find() API, aggregation can compute totals, averages, transformations, and joins — making it the backbone of analytics and reporting.
// Basic pipeline structure
db.orders.aggregate([
// Stage 1: Filter
{ $match: { status: "completed", createdAt: { $gte: ISODate("2026-01-01") } } },
// Stage 2: Group and compute totals
{ $group: {
_id: "$customerId",
totalSpent: { $sum: "$amount" },
orderCount: { $sum: 1 },
avgOrder: { $avg: "$amount" }
}},
// Stage 3: Add computed field
{ $addFields: {
tier: { $cond: { if: { $gte: ["$totalSpent", 10000] }, then: "VIP", else: "Regular" } }
}},
// Stage 4: Sort
{ $sort: { totalSpent: -1 } },
// Stage 5: Limit top 10
{ $limit: 10 }
]);
Why it matters: The aggregation pipeline is the most-tested advanced MongoDB topic. Interviewers assess your ability to chain stages logically and use the right operator for each transformation.
Real applications: Sales dashboards (total revenue per customer), user analytics (active users per day), inventory reports (low-stock products by category).
Common mistakes: Not putting $match first — placing it early reduces data flowing through all subsequent stages. Late $match stages miss index optimization opportunities.
_id expression (the grouping key) and applies accumulator operators to compute values. Each unique grouping key value produces one output document. Setting _id: null groups all documents together for a total. Key accumulators: $sum (total), $avg (average), $min/$max (extremes), $count (document count, MongoDB 5.0+), $push (array of values), $addToSet (unique array), $first/$last (first/last value in group), $stdDevPop/$stdDevSamp (standard deviation).
// Group by product category
db.products.aggregate([{
$group: {
_id: "$category", // group by category field
count: { $sum: 1 }, // count documents
totalStock: { $sum: "$stock" },
avgPrice: { $avg: "$price" },
maxPrice: { $max: "$price" },
minPrice: { $min: "$price" },
allBrands: { $addToSet: "$brand" }, // unique brands array
firstAdded: { $first: "$createdAt" }
}
}]);
// Group all documents (grand total)
db.orders.aggregate([{
$group: {
_id: null, // null = all documents
grandTotal: { $sum: "$amount" },
orderCount: { $count: {} } // MongoDB 5.0+
}
}]);
// Multi-field grouping key
db.sales.aggregate([{
$group: {
_id: { year: { $year: "$saleDate" }, category: "$category" },
revenue: { $sum: "$amount" }
}
}]);
Why it matters: $group is the core of all aggregation queries. Mastering accumulators and grouping key expressions is essential for any data analytics or reporting task.
Real applications: Monthly revenue reports, user activity summaries (login count per user per month), inventory valuation (total stock value per warehouse).
Common mistakes: Forgetting that $group does not guarantee output order — always add $sort after $group if order matters.
from (foreign collection), localField, foreignField, and as (output array field). MongoDB 3.6+ added expressive $lookup using a pipeline, enabling complex multi-condition joins, computed fields, and nested lookups. Non-matching documents get an empty array.
// Basic $lookup — join orders with users
db.orders.aggregate([{
$lookup: {
from: "users", // foreign collection
localField: "userId", // field in orders
foreignField: "_id", // field in users
as: "userDetails" // output field name (array)
}
}]);
// Result: each order document gets "userDetails": [{ ... user doc ... }]
// Unwind the array (if one-to-one)
db.orders.aggregate([
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
{ $unwind: "$user" }, // flatten array to single object
{ $project: { "user.password": 0 } } // exclude sensitive fields
]);
// Expressive $lookup (pipeline) — join with conditions
db.orders.aggregate([{
$lookup: {
from: "products",
let: { orderItems: "$items" },
pipeline: [
{ $match: { $expr: { $in: ["$_id", "$$orderItems"] } } },
{ $project: { name: 1, price: 1 } }
],
as: "productDetails"
}
}]);
Why it matters: $lookup is critical for working with normalized data models. Tests knowledge of MongoDB's join capabilities and when to denormalize vs join.
Real applications: Order detail views joining order items with product info, user activity feeds joining events with user profiles, comment threads joining comments with author details.
Common mistakes: Not indexing foreignField — $lookup performs a collection scan on the foreign collection per input document without an index, making it extremely slow.
$unwind produces 5 documents, each with one array element. This is essential before performing $group or $sort on array contents. By default, documents with missing or empty array fields are excluded — use preserveNullAndEmptyArrays: true to keep them. The includeArrayIndex option adds a field with the array element's original index position.
// Input: { _id: 1, tags: ["mongodb", "database", "nosql"] }
db.posts.aggregate([{ $unwind: "$tags" }]);
// Output:
// { _id: 1, tags: "mongodb" }
// { _id: 1, tags: "database" }
// { _id: 1, tags: "nosql" }
// Count tags across all posts
db.posts.aggregate([
{ $unwind: "$tags" },
{ $group: { _id: "$tags", count: { $sum: 1 } } },
{ $sort: { count: -1 } }
]);
// Preserve documents with missing/empty arrays
db.products.aggregate([{
$unwind: {
path: "$reviews",
preserveNullAndEmptyArrays: true, // keep products with no reviews
includeArrayIndex: "reviewIndex" // add index field
}
}]);
// Unwind nested array
db.orders.aggregate([
{ $unwind: "$items" },
{ $group: { _id: "$items.productId", totalSold: { $sum: "$items.qty" } } }
]);
Why it matters: Understanding array deconstruction is crucial for analytics on array fields. $unwind enables per-element aggregations that would otherwise be impossible.
Real applications: Tag frequency analysis, sales per product SKU from order item arrays, skill distribution analysis from user skill arrays.
Common mistakes: Unwinding without preserveNullAndEmptyArrays silently drops documents with missing arrays, causing incorrect totals in reports.
find() projections, $project in aggregation supports computed fields using expressions, string operations, arithmetic, date functions, and conditional logic. Use 1 to include, 0 to exclude. Computed fields use aggregation expressions with the $ prefix for field references. $addFields is a variant that only adds new fields without removing existing ones.
db.employees.aggregate([{
$project: {
// Include fields
name: 1,
email: 1,
// Exclude sensitive fields
password: 0,
ssn: 0,
// Rename field
fullName: "$name",
// Computed: annual salary
annualSalary: { $multiply: ["$monthlySalary", 12] },
// String concatenation
displayName: { $concat: ["$firstName", " ", "$lastName"] },
// Date parts
joinYear: { $year: "$joinedAt" },
// Conditional
status: {
$cond: { if: { $gte: ["$experience", 5] }, then: "Senior", else: "Junior" }
},
// Nested field rename
"contact.primary": "$phone"
}
}]);
// $addFields — add without removing existing
db.users.aggregate([{
$addFields: {
fullName: { $concat: ["$firstName", " ", "$lastName"] },
ageGroup: {
$switch: {
branches: [
{ case: { $lt: ["$age", 25] }, then: "Young" },
{ case: { $lt: ["$age", 40] }, then: "Middle" }
],
default: "Senior"
}
}
}
}]);
Why it matters: $project is used in virtually every pipeline. Tests ability to compute derived fields and reshape documents for downstream stages or API responses.
Real applications: Computing profit margins from revenue/cost fields, creating display names from firstName/lastName, formatting dates into readable strings.
Common mistakes: Using $project to add fields while also excluding others — mixing inclusion/exclusion in aggregation $project is allowed, unlike find() projections.
// $facet — multiple pipelines in one query
db.products.aggregate([{
$facet: {
// Pipeline 1: Category breakdown
"byCategory": [
{ $group: { _id: "$category", count: { $sum: 1 } } },
{ $sort: { count: -1 } }
],
// Pipeline 2: Price distribution
"priceRanges": [
{ $bucket: {
groupBy: "$price",
boundaries: [0, 1000, 5000, 20000, 100000],
default: "Other",
output: { count: { $sum: 1 }, avgPrice: { $avg: "$price" } }
}}
],
// Pipeline 3: Rating stats
"ratingStats": [
{ $group: { _id: null, avgRating: { $avg: "$rating" }, total: { $sum: 1 } } }
]
}
}]);
// $bucketAuto — auto-generate 5 equal buckets
db.products.aggregate([{
$bucketAuto: {
groupBy: "$price",
buckets: 5,
output: { count: { $sum: 1 }, avgPrice: { $avg: "$price" } }
}
}]);
Why it matters: $facet is the MongoDB way to build e-commerce-style faceted search. This is a high-value interview topic showing advanced aggregation knowledge.
Real applications: E-commerce product search with simultaneous category counts, price ranges, and brand filters — all computed in one database round-trip.
Common mistakes: Running separate aggregation queries for each facet — $facet does all of them in a single pass through data, which is significantly more efficient.
let option exposes local document fields as variables (prefixed with $$) inside the pipeline. The pipeline runs against the foreign collection with access to both the local variables and pipeline operators. This enables filtered joins, computed joins, and nested lookups that are impossible with basic $lookup.
// Expressive lookup — join with multiple conditions
db.orders.aggregate([{
$lookup: {
from: "promotions",
let: { orderTotal: "$total", orderDate: "$createdAt", custId: "$customerId" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $lte: ["$minOrderValue", "$$orderTotal"] }, // promo threshold met
{ $lte: ["$startDate", "$$orderDate"] }, // promo active
{ $gte: ["$endDate", "$$orderDate"] }
]
}
}
},
{ $project: { code: 1, discount: 1 } } // only return needed fields
],
as: "eligiblePromos"
}
}]);
// Self-join — employees with their managers
db.employees.aggregate([{
$lookup: {
from: "employees",
let: { managerId: "$managerId" },
pipeline: [
{ $match: { $expr: { $eq: ["$_id", "$$managerId"] } } },
{ $project: { name: 1, email: 1 } }
],
as: "manager"
}
}, {
$unwind: { path: "$manager", preserveNullAndEmptyArrays: true }
}]);
Why it matters: Expressive $lookup is a senior-level feature. It demonstrates mastery of MongoDB's join capabilities and shows you can handle complex data relationships without application-level joins.
Real applications: Eligibility checks (which promotions apply to an order), hierarchical data (employee-manager trees), product recommendations based on order history and category.
Common mistakes: Not indexing the foreign collection's match fields within the pipeline — each input document triggers a pipeline scan on the foreign collection without proper indexes.
$match and find(). It enables comparing two fields in the same document, applying aggregation expressions in queries, and using variables. Without $expr, MongoDB query operators only compare a field against a literal value. With $expr, you can write { $gt: ["$fieldA", "$fieldB"] } — comparing two document fields. It's especially powerful in $lookup pipelines for correlated queries.
// Find orders where actual amount exceeds budget
db.orders.find({
$expr: { $gt: ["$actualCost", "$budgetedCost"] }
});
// Use in $match stage
db.products.aggregate([
{ $match: {
$expr: {
$and: [
{ $gt: ["$price", "$cost"] }, // profitable
{ $lt: ["$stock", "$reorderPoint"] ] // needs reorder
]
}
}}
]);
// Compute with $expr
db.employees.find({
$expr: {
$gte: [
{ $multiply: ["$yearsExp", 10000] }, // computed value
"$salary"
]
}
});
// Variables in $expr ($$ROOT, $$CURRENT, $$NOW)
db.events.aggregate([{
$match: {
$expr: { $lt: ["$eventDate", "$$NOW"] } // already occurred
}
}]);
Why it matters: $expr bridges the gap between query language and aggregation expressions. It's essential for field-to-field comparisons, which are impossible with regular query operators.
Real applications: Finding overdue tasks (due date < current date), profitable products (price > cost), over-budget projects (actual > estimated).
Common mistakes: Trying to compare two document fields with regular operators: { $gt: ["$a", "$b"] } doesn't work outside $expr — MongoDB treats "$b" as a literal string.
$out (MongoDB 2.6+) completely replaces the target collection — if it exists, all existing documents are deleted and replaced with pipeline results. $merge (MongoDB 4.2+) is more flexible: it can insert, replace, merge, fail, or keep existing documents based on matching criteria. $merge supports upsert-like behavior for incrementally updating materialized views. Both must be the last stage in the pipeline and cannot be used in transactions.
// $out — replace collection entirely
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
{ $out: "customer_lifetime_value" } // replaces entire collection
]);
// $merge — upsert into existing collection
db.daily_sales.aggregate([
{ $group: { _id: "$date", revenue: { $sum: "$amount" }, count: { $sum: 1 } } },
{
$merge: {
into: "sales_summary", // target collection
on: "_id", // match key
whenMatched: "merge", // merge fields if doc exists
whenNotMatched: "insert" // insert if doc doesn't exist
}
}
]);
// whenMatched options:
// "replace" — replace entire document
// "merge" — merge incoming with existing
// "keepExisting" — keep existing, discard new
// "fail" — throw error on conflict
// pipeline — custom merge logic
Why it matters: $merge enables building materialized views and incremental aggregations — a critical pattern for pre-computing expensive analytics without full recalculation.
Real applications: Daily revenue summaries updated incrementally each night, product popularity scores updated as new orders arrive, user analytics tables for reporting dashboards.
Common mistakes: Using $out for incremental updates — it deletes all existing records first. Use $merge with whenMatched: "merge" for incremental views.
$project, they only specify what to add — all existing fields are retained automatically. Multiple fields can be added in a single stage. They support all aggregation expressions for computing values. $unset (alias: $project with 0) removes fields. Using $set/$unset is the idiomatic MongoDB 4.2+ style for field manipulation.
// $addFields / $set — add computed fields
db.products.aggregate([{
$set: {
// Add profit margin
margin: {
$multiply: [
{ $divide: [{ $subtract: ["$price", "$cost"] }, "$price"] },
100
]
},
// Add full name from parts
fullName: { $concat: ["$brand", " - ", "$model"] },
// Overwrite existing field with formatted version
price: { $round: ["$price", 2] },
// Conditional tag
inStock: { $gt: ["$stock", 0] },
// Current timestamp
processedAt: "$$NOW"
}
}]);
// $unset — remove fields
db.users.aggregate([
{ $unset: ["password", "ssn", "creditCard"] }
]);
// Chain $set stages for readability
db.employees.aggregate([
{ $set: { annualSalary: { $multiply: ["$monthlySalary", 12] } } },
{ $set: { taxBracket: { $switch: {
branches: [
{ case: { $gte: ["$annualSalary", 1000000] }, then: "30%" },
{ case: { $gte: ["$annualSalary", 500000] }, then: "20%" }
],
default: "10%"
}}}},
{ $unset: "monthlySalary" }
]);
Why it matters: $set/$addFields is the most readable way to enrich documents in pipelines. Shows awareness of MongoDB 4.2+ modernized syntax and the preference for readable stage names.
Real applications: Computing derived metrics (profit margins, engagement rates), adding display fields for API responses, and normalizing data during migration pipelines.
Common mistakes: Using $project when only adding fields — this requires explicitly listing all fields you want to keep, making pipelines verbose and fragile when schema changes.
$lookup followed by $unwind, the combined behavior is like an INNER JOIN — documents without matches are dropped because $lookup produces an empty array and $unwind removes documents with empty arrays. To simulate a LEFT JOIN (keep unmatched documents), use $unwind with preserveNullAndEmptyArrays: true. This is one of the most common patterns in MongoDB aggregation and is critical for understanding when documents might be accidentally lost in pipelines.
// INNER JOIN behavior (drops unmatched orders)
db.orders.aggregate([
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
{ $unwind: "$user" } // drops orders with no matching user!
]);
// LEFT JOIN behavior (keeps orders without user)
db.orders.aggregate([
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
{ $unwind: { path: "$user", preserveNullAndEmptyArrays: true } }
// unmatched orders: user = null
]);
// Check for orphaned records
db.orders.aggregate([
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
{ $match: { user: { $size: 0 } } } // find orders with no user
]);
// Best practice: filter early, join late
db.orders.aggregate([
{ $match: { status: "pending", createdAt: { $gte: ISODate("2026-01-01") } } },
{ $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
{ $unwind: { path: "$user", preserveNullAndEmptyArrays: true } },
{ $project: { "user.password": 0 } }
]);
Why it matters: This is the most common data integrity issue in MongoDB aggregations. Accidentally dropping documents by not using preserveNullAndEmptyArrays causes silent incorrect report totals.
Real applications: Order reports must include all orders even if the user account was deleted. Product listings should show all products even those without reviews.
Common mistakes: In analytics, missing the preserveNullAndEmptyArrays flag causes totals to silently undercount — the most common cause of data discrepancies in MongoDB reports.
$graphLookup does this in a single aggregation pipeline stage.
// Employee hierarchy — find all reports under a manager
db.employees.aggregate([{
$graphLookup: {
from: "employees", // same collection (self-join)
startWith: "$_id", // start from this employee's ID
connectFromField: "_id", // match outgoing field
connectToField: "managerId",// field in other docs that points back
as: "allReports", // output array field
maxDepth: 5, // max recursion depth
depthField: "level" // add depth level to each result
}
}]);
// Category tree — find all subcategories
db.categories.aggregate([
{ $match: { name: "Electronics" } },
{
$graphLookup: {
from: "categories",
startWith: "$_id",
connectFromField: "_id",
connectToField: "parentId",
as: "allSubcategories",
maxDepth: 4
}
}
]);
// Friend-of-friend network (social graph)
db.users.aggregate([
{ $match: { _id: userId } },
{ $graphLookup: {
from: "users",
startWith: "$friends",
connectFromField: "friends",
connectToField: "_id",
as: "network",
maxDepth: 2 // friends of friends
}}
]);
Why it matters: Shows mastery of graph/tree data problems in MongoDB. This stage replaces multiple recursive application-level queries with a single efficient database operation.
Real applications: Org chart traversal, e-commerce category trees, bill of materials (product components), and social network friend graphs.
Common mistakes: Not setting maxDepth on circular graphs — this causes infinite loops. Also not indexing connectToField, making recursive lookups extremely slow on large collections.
$group which reduces N documents to 1, $setWindowFields adds computed fields to each document while considering surrounding documents. Key operators: $rank, $denseRank (ranking), $sum, $avg, $count (cumulative/windowed), $shift (access previous/next document), $derivative, $integral (rate of change), $first/$last (window boundaries).
// Running total (cumulative sum)
db.sales.aggregate([{
$setWindowFields: {
partitionBy: "$region", // group by region
sortBy: { saleDate: 1 }, // order within partition
output: {
runningRevenue: {
$sum: "$revenue",
window: { documents: ["unbounded", "current"] } // from start to current
},
rank: { $rank: {} }, // rank within region by saleDate
prevDayRevenue: {
$shift: { output: "$revenue", by: -1, default: 0 } // previous day
}
}
}
}]);
// Moving average (last 7 days)
db.metrics.aggregate([{
$setWindowFields: {
sortBy: { date: 1 },
output: {
movingAvg: {
$avg: "$value",
window: { documents: [-6, 0] } // current + 6 previous docs
}
}
}
}]);
// Dense rank for leaderboard
db.scores.aggregate([{
$setWindowFields: {
sortBy: { score: -1 },
output: { leaderboardRank: { $denseRank: {} } }
}
}]);
Why it matters: Window functions are a major MongoDB 5.0 feature that brings MongoDB on par with SQL analytics. Shows awareness of modern MongoDB capabilities for time-series and analytics workloads.
Real applications: Sales leaderboards with ranks, 7-day moving average metrics, cumulative revenue by region, day-over-day growth rates.
Common mistakes: Using $group + application-level ranking instead of $setWindowFields — this is far less efficient and requires multiple queries or complex application code.
$match and $limit as early as possible to reduce data volume flowing through stages. (2) $match + $sort at the start uses an index. (3) Use $project early to reduce document size. (4) Avoid $unwind before $group when possible (use $group with $push). (5) Use allowDiskUse: true for large aggregations exceeding 100 MB memory limit. (6) Use the explain() method to analyze pipeline performance.
// ❌ BAD: $match at end (processes all docs)
db.orders.aggregate([
{ $lookup: { ... } },
{ $unwind: "$items" },
{ $group: { ... } },
{ $match: { status: "completed" } } // too late!
]);
// ✅ GOOD: $match first (reduces docs early)
db.orders.aggregate([
{ $match: { status: "completed", createdAt: { $gte: startDate } } }, // uses index
{ $lookup: { ... } },
{ $project: { "items.price": 1, customerId: 1 } }, // reduce doc size
{ $unwind: "$items" },
{ $group: { ... } }
]);
// Allow disk use for large aggregations (>100MB)
db.bigCollection.aggregate([...], { allowDiskUse: true });
// Explain aggregation pipeline
db.orders.aggregate([...], { explain: true });
// or
db.orders.explain("executionStats").aggregate([...]);
// Use $indexStats to find unused indexes
db.orders.aggregate([{ $indexStats: {} }]);
Why it matters: Pipeline optimization is a key topic for senior roles and system design. Poorly ordered pipelines on large collections can be 100× slower than optimized ones.
Real applications: Nightly analytics jobs processing millions of documents must use allowDiskUse and carefully ordered stages to complete in reasonable time.
Common mistakes: Forgetting allowDiskUse: true for aggregations on large collections — MongoDB throws a memory exceeded error at 100 MB without it.
// Group sales by month and year
db.sales.aggregate([{
$group: {
_id: {
year: { $year: "$saleDate" },
month: { $month: "$saleDate" }
},
revenue: { $sum: "$amount" },
count: { $sum: 1 }
}
}, { $sort: { "_id.year": 1, "_id.month": 1 } }]);
// $dateToString — custom format
db.events.aggregate([{
$project: {
formattedDate: { $dateToString: { format: "%Y-%m-%d", date: "$createdAt" } },
dayOfWeek: { $dayOfWeek: "$createdAt" }, // 1=Sun, 7=Sat
hour: { $hour: "$createdAt" }
}
}]);
// $dateDiff — time-since-signup in days
db.users.aggregate([{
$addFields: {
accountAgeDays: {
$dateDiff: {
startDate: "$createdAt",
endDate: "$$NOW",
unit: "day"
}
}
}
}]);
// Group by week
db.orders.aggregate([{
$group: {
_id: { week: { $isoWeek: "$createdAt" }, year: { $isoWeekYear: "$createdAt" } },
total: { $sum: "$amount" }
}
}]);
Why it matters: Date aggregations are fundamental for business intelligence and analytics. Every production application needs date-based reporting — by day, week, month, hour.
Real applications: Daily active user charts, monthly revenue reports, peak hour analysis for capacity planning, and cohort analysis by signup week.
Common mistakes: Not accounting for timezone offsets — using $dateToString without a timezone option returns UTC dates, which may not match the user's local dates for business reporting.