MongoDB

Indexes & Performance

13 Questions

An index in MongoDB is a data structure (B-tree based) that stores a subset of a collection's data in an ordered form, enabling the database engine to quickly locate documents without scanning every document in the collection. Without an index, MongoDB performs a Collection Scan (COLLSCAN) — examining every document. With an index it performs an Index Scan (IXSCAN) — traversing the B-tree. For a collection with 1 million documents, a good index reduces query time from seconds to milliseconds. However, indexes consume disk space and slow down write operations (insert/update/delete must update the index too).
// Collection scan (no index) — reads ALL documents
db.users.find({ email: "mayur@example.com" });
// Execution plan: COLLSCAN

// Create index on email field
db.users.createIndex({ email: 1 }); // 1 = ascending

// Now uses index — reads only matching entry
db.users.find({ email: "mayur@example.com" });
// Execution plan: IXSCAN → FETCH

// View all indexes on a collection
db.users.getIndexes();

// Check if query uses index with explain()
db.users.find({ email: "mayur@example.com" }).explain("executionStats");
// Look for: "winningPlan.stage": "IXSCAN" (good)
//           "winningPlan.stage": "COLLSCAN" (bad)
// totalDocsExamined should equal nReturned for perfect index use

Why it matters: Indexes are the single most impactful performance optimization in MongoDB. Every backend interview includes at least one question about indexing strategy.

Real applications: Without an email index, login queries on a 10 million user database take seconds. With the index, the same query takes under 1ms.

Common mistakes: Adding indexes on every field "just in case" — each index increases write latency and memory usage. Only index fields used in frequent queries.
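
To spot "just in case" indexes that never get used, the $indexStats aggregation stage reports per-index access counters. A minimal sketch, reusing the users collection from above:

// Per-index usage counters since the server last started
db.users.aggregate([{ $indexStats: {} }]);
// Each result has "name", "key", and "accesses.ops" —
// an index whose accesses.ops stays at 0 is a candidate for removal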

A compound index covers multiple fields in a single index, enabling efficient queries that filter or sort by those fields together. The order of fields matters — compound indexes follow the leftmost prefix rule: an index on {a, b, c} supports queries on a, a+b, or a+b+c, but not b alone or b+c. The ESR Rule is MongoDB's best practice for field ordering: Equality fields first, then Sort fields, then Range fields. This ordering maximizes index selectivity and avoids in-memory sorting.
// ESR Rule: Equality → Sort → Range

// Query: by status (equality), sort by date, filter date range
db.orders.find({
  status: "completed",          // E: equality
  createdAt: { $gte: startDate } // R: range
}).sort({ createdAt: -1 });      // S: sort

// CORRECT compound index order (ESR)
db.orders.createIndex({ status: 1, createdAt: -1 });
//                         E          S+R (same field)

// Another example
// Query: userId=X, sort by date, amount range
db.orders.find({ userId: uid, amount: { $gte: 100 } }).sort({ createdAt: -1 });

// ESR compound index
db.orders.createIndex({ userId: 1, createdAt: -1, amount: 1 });
//                         E           S              R

// Leftmost prefix support:
// { a: 1, b: 1, c: 1 } supports:
// - { a: x }               ✅
// - { a: x, b: y }         ✅
// - { a: x, b: y, c: z }   ✅
// - { b: y }               ❌ (no leftmost prefix coverage)
// - { a: x, c: z }         ✅ (but not ideal for c)

Why it matters: The ESR rule is a critical MongoDB interview topic. Understanding compound index field ordering separates developers who create useful indexes from those who accidentally create unused ones.

Real applications: E-commerce order filtering (by customer + status + date range) benefits massively from a correctly ordered compound index vs three separate single-field indexes.

Common mistakes: Putting range fields before sort fields in compound indexes — this causes MongoDB to use the range portion of the index but then do an in-memory sort, negating the sort optimization.
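
A short sketch of that anti-pattern and its fix, reusing the orders collection from the examples above:

// ❌ Range before sort — index scan, then an in-memory SORT stage
db.orders.createIndex({ status: 1, amount: 1, createdAt: -1 });
db.orders.find({ status: "completed", amount: { $gte: 100 } })
  .sort({ createdAt: -1 })
  .explain("executionStats");
// winningPlan contains a SORT stage (blocking, memory-limited)

// ✅ ESR order — results come back already sorted by the index
db.orders.createIndex({ status: 1, createdAt: -1, amount: 1 });
// Same query: IXSCAN only, no SORT stage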

A multikey index is automatically created when you index a field that contains an array value. MongoDB creates index entries for each element of the array, allowing efficient queries for documents where the array contains a specific value. MongoDB detects array fields and builds multikey indexes transparently — no special syntax is needed. You cannot create a compound multikey index where more than one indexed field is an array. Queries on array fields using operators like $in, $all, and $elemMatch all benefit from multikey indexes.
// Document with array field
{ _id: 1, tags: ["mongodb", "database", "nosql"], name: "Post 1" }
{ _id: 2, tags: ["mongodb", "aggregation"],       name: "Post 2" }
{ _id: 3, tags: ["sql", "database"],              name: "Post 3" }

// Create index on array field → automatically becomes multikey
db.posts.createIndex({ tags: 1 });
// MongoDB creates entries: "aggregation"→[2], "database"→[1,3], ...

// Queries that use multikey index
db.posts.find({ tags: "mongodb" });           // array contains "mongodb"
db.posts.find({ tags: { $in: ["mongodb", "sql"] } }); // $in with array
db.posts.find({ tags: { $all: ["mongodb", "nosql"] } }); // must contain both

// $elemMatch on array of objects
db.orders.createIndex({ "items.productId": 1, "items.qty": 1 });
db.orders.find({ items: { $elemMatch: { productId: "abc", qty: { $gte: 5 } } } });

// ❌ Cannot compound two array fields (parallel arrays)
db.posts.createIndex({ tags: 1, categories: 1 });
// Fails with "cannot index parallel arrays" if any document has arrays in both fields

Why it matters: Multikey indexes are the foundation of efficient array field queries. Tests understanding that array indexing is automatic but has compound limitations.

Real applications: Tag-based post search, product attribute filtering (find products where colors array contains "red"), role-based access queries on user roles array.

Common mistakes: Trying to create a compound index on two array fields — MongoDB throws an error. The workaround is to denormalize one array field or use separate queries.
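
A hedged sketch of the separate-indexes workaround (the categories field here is illustrative):

// ✅ Two single-field multikey indexes instead of one compound index
db.posts.createIndex({ tags: 1 });
db.posts.createIndex({ categories: 1 });
db.posts.find({ tags: "mongodb", categories: "tutorials" });
// The planner scans one index (or intersects both) and filters the remainder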

A text index enables full-text search on string fields, supporting language-specific stemming (plural → base form), stop word removal (ignoring "the", "is"), and word tokenization. Only one text index is allowed per collection, but it can cover multiple fields. Use the $text operator with $search to query text indexes. Results come with a textScore (relevance score) that can be used for sorting. Text indexes support phrase search (quoted strings), term exclusion (minus prefix), and language specification (roughly 15 languages are supported for stemming).
// Create text index on multiple fields
db.articles.createIndex({
  title: "text",
  body: "text",
  tags: "text"
}, {
  weights: { title: 10, tags: 5, body: 1 },  // title words rank higher
  name: "articles_text_idx"
});

// Full-text search query
db.articles.find(
  { $text: { $search: "mongodb aggregation performance" } },
  { score: { $meta: "textScore" } }  // add relevance score
).sort({ score: { $meta: "textScore" } }); // sort by relevance

// Phrase search (quotes)
db.articles.find({ $text: { $search: '"aggregation pipeline"' } });

// Exclude term (minus prefix)
db.articles.find({ $text: { $search: "mongodb -sql" } }); // mongodb but not sql

// Language-specific search
db.articles.find({ $text: { $search: "datos", $language: "es" } }); // Spanish

// Check text index via $text explain
db.articles.find({ $text: { $search: "mongodb" } }).explain("executionStats");

Why it matters: Text search is essential for search features. Tests understanding of MongoDB's native search capabilities and when to use them vs dedicated search engines like Elasticsearch.

Real applications: Blog post search, product catalog search by name/description, documentation search for internal tools.

Common mistakes: Using $regex for full-text search — regex does not use text indexes, doesn't support stemming, and causes full collection scans on large datasets.
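
A quick sketch contrasting the two, assuming the articles text index created above:

// ❌ Unanchored case-insensitive regex — cannot use the text index, no stemming
db.articles.find({ body: /aggregation/i }).explain("executionStats");
// winningPlan.stage: COLLSCAN

// ✅ $text hits the text index and stems terms ("aggregations" matches too)
db.articles.find({ $text: { $search: "aggregation" } }).explain("executionStats");
// winningPlan includes a TEXT / TEXT_MATCH stage instead of a collection scan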

Unique indexes enforce that no two documents have the same value for the indexed field(s). Attempting to insert a duplicate value throws a DuplicateKey error. Sparse indexes only include documents where the indexed field exists — documents missing the field are skipped, though an explicit null value is still indexed. Partial indexes (MongoDB 3.2+) index only documents matching a specified filter expression — more powerful and flexible than sparse indexes. Partial indexes can dramatically reduce index size for selective queries (e.g., only index documents where status = "active").
// Unique index — no duplicate emails
db.users.createIndex({ email: 1 }, { unique: true });

// Unique compound index (combo must be unique)
db.sessions.createIndex({ userId: 1, deviceId: 1 }, { unique: true });

// Sparse index — only index docs where phone field exists
db.users.createIndex({ phone: 1 }, { sparse: true });

// Partial index — only index active users
db.users.createIndex(
  { email: 1 },
  {
    partialFilterExpression: { isActive: true },
    unique: true
  }
);
// Only active users are checked for uniqueness — inactive users don't conflict

// Partial index on high-volume collection (index only recent orders)
db.orders.createIndex(
  { createdAt: -1, status: 1 },
  {
    partialFilterExpression: {
      status: { $in: ["pending", "processing"] }
    }
  }
);
// Much smaller index — only indexes 5% of orders that are still active

// Unique partial index — unique email among verified users only
db.users.createIndex(
  { email: 1 },
  { unique: true, partialFilterExpression: { isVerified: true } }
);

Why it matters: Partial indexes are a major optimization for large collections with skewed data distributions. They show advanced index design knowledge that separates senior from mid-level engineers.

Real applications: Order processing — index only pending/active orders (5% of collection) for dispatch system queries. User lookup — index only non-deleted users for login.

Common mistakes: Creating full collection indexes on fields queried only for a small subset of documents. A partial index on just those documents is orders of magnitude smaller and faster.
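
The size difference is easy to verify: per-index sizes (in bytes) are available from collection stats. A minimal sketch:

// Compare index sizes before and after switching to a partial index
db.orders.stats().indexSizes;
// Returns a map of index name → size in bytes; the partial index should be
// a small fraction of a full-collection equivalent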

A TTL (Time-To-Live) index is a special single-field index on a Date type field that automatically deletes documents after a specified number of seconds have passed since the date stored in the field. MongoDB runs a background process every 60 seconds to find and delete expired documents. TTL indexes are perfect for session management, cache records, temporary tokens, and log data with automatic cleanup. You cannot create compound TTL indexes. The expiry logic is: document is deleted when current_time > document_date + expireAfterSeconds.
// Create TTL index — expire after 1 hour (3600 seconds)
db.sessions.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }
);

// Documents are auto-deleted when current time > createdAt + 1 hour
db.sessions.insertOne({ userId: "123", token: "abc", createdAt: new Date() });
// ↑ Will be automatically deleted after 1 hour

// Expire at specific time (set expireAfterSeconds: 0)
db.scheduledJobs.createIndex(
  { executeAt: 1 },
  { expireAfterSeconds: 0 }
);
// ↑ Document is deleted at the exact time stored in executeAt

// TTL for logs — keep only last 30 days
db.app_logs.createIndex(
  { timestamp: 1 },
  { expireAfterSeconds: 30 * 24 * 3600 } // 30 days
);

// Update a TTL index's expiry period with collMod (no index rebuild needed)
db.runCommand({
  collMod: "sessions",
  index: { keyPattern: { createdAt: 1 }, expireAfterSeconds: 7200 }  // change to 2 hours
});

// TTL background process runs every 60 seconds (not real-time)

Why it matters: TTL indexes are the correct way to handle data expiry in MongoDB. They replace custom cleanup jobs with a built-in, zero-maintenance mechanism tested in most MongoDB interviews.

Real applications: Session tokens expiry (1 hour), OTP codes expiry (5 minutes), password reset links (24 hours), cache records (variable TTL), rate limiting records (1 minute).

Common mistakes: Expecting real-time deletion — TTL clean-up runs every 60 seconds, so documents may persist up to 60 seconds past expiry. Also using non-Date fields — TTL only works with Date type fields.
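
A sketch of the non-Date pitfall — Date.now() returns a Number (epoch milliseconds), not a BSON Date, so TTL silently never fires:

// ❌ Number stored — the TTL monitor ignores non-Date values, doc lives forever
db.sessions.insertOne({ userId: "123", createdAt: Date.now() });

// ✅ BSON Date stored — eligible for TTL deletion
db.sessions.insertOne({ userId: "123", createdAt: new Date() });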

The explain() method returns detailed information about how MongoDB executes a query, including the winning plan (index chosen), stages traversed, documents examined, and execution time. It has three verbosity levels: "queryPlanner" (plan only, no execution), "executionStats" (runs query, returns stats), "allPlansExecution" (tests all candidate plans). Key metrics to check: IXSCAN vs COLLSCAN, totalDocsExamined/nReturned ratio (ideal = 1), totalKeysExamined, and executionTimeMillis.
// Basic explain
db.users.find({ age: { $gt: 25 }, city: "Mumbai" }).explain("executionStats");

// Key fields to inspect:
// executionStats.nReturned             — docs returned
// executionStats.totalDocsExamined     — docs scanned (should ≈ nReturned)
// executionStats.totalKeysExamined     — index entries checked
// executionStats.executionTimeMillis   — total time in ms
// queryPlanner.winningPlan.stage       — IXSCAN (good) or COLLSCAN (bad)

// Check aggregation pipeline performance
db.orders.explain("executionStats").aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
]);

// Covered query — all fields in index, no document fetch needed
db.users.createIndex({ email: 1, name: 1 });
db.users.find({ email: "test@example.com" }, { name: 1, _id: 0 })
  .explain("executionStats");
// Look for: "stage": "PROJECTION_COVERED" — never touches the document!

// "hint" — force a specific index
db.users.find({ email: "test@example.com" })
  .hint({ email: 1 })
  .explain("executionStats");

Why it matters: explain() is the primary tool for diagnosing slow queries. Every MongoDB performance optimization starts with explain(). Understanding how to read explain output is mandatory for backend roles.

Real applications: Identifying missing indexes on high-traffic API endpoints, detecting index bloat (too many keys examined vs docs returned), finding accidental full collection scans.

Common mistakes: Not checking the totalDocsExamined / nReturned ratio — a ratio of 1000:1 means examining 1000 docs to return 1, indicating a poor or missing index.
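
In mongosh, the ratio check can be scripted directly off the explain output (a sketch, reusing the query above):

// Compute the examined-to-returned ratio from executionStats
const stats = db.users.find({ age: { $gt: 25 }, city: "Mumbai" })
  .explain("executionStats").executionStats;
const ratio = stats.nReturned > 0
  ? stats.totalDocsExamined / stats.nReturned
  : stats.totalDocsExamined;
print(ratio); // ~1 is ideal; 100+ suggests a missing or poor index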

A covered query is one where ALL fields required by the query (filter and projection) are present in the index, allowing MongoDB to satisfy the entire query from the index alone without accessing any documents. The index "covers" the query completely. This is the most efficient query type because: (1) No document I/O — the index is much smaller than the collection, (2) The index is typically in RAM, (3) No document deserialization is needed. To achieve covered queries, create a compound index that includes all filter fields and all projected fields. The _id field must also be excluded from the projection (or included in the index) for the query to be covered.
// Create compound index covering filter + projection fields
db.users.createIndex({ email: 1, name: 1, role: 1 });

// Covered query — everything resolved from index (no document reads!)
db.users.find(
  { email: "mayur@example.com" },        // filter field: in index ✅
  { name: 1, role: 1, _id: 0 }          // projection fields: in index ✅, _id excluded ✅
);
// explain: "winningPlan.stage": "PROJECTION_COVERED"

// NOT covered — _id not in index and not excluded
db.users.find(
  { email: "mayur@example.com" },
  { name: 1, role: 1 }   // _id is returned by default but not in index → fetches doc
);

// NOT covered — 'age' not in index
db.users.find(
  { email: "mayur@example.com" },
  { name: 1, age: 1, _id: 0 }  // age not in index → fetches doc
);

// High-performance user lookup (auth check)
db.users.createIndex({ email: 1, passwordHash: 1, role: 1 });
db.users.find({ email: "user@example.com" }, { passwordHash: 1, role: 1, _id: 0 });
// ✅ Covered — login check never touches the document store!

Why it matters: Covered queries are the gold standard for query optimization. Showing knowledge of covered queries in interviews demonstrates deep understanding of index mechanics and performance engineering.

Real applications: Authentication queries on high-traffic login endpoints, autocomplete suggestions (name/email lookup), count queries on filtered subsets.

Common mistakes: Returning _id (default when not excluded) breaks covered queries because _id must be in the index or explicitly excluded with _id: 0.
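
Counts are another covered pattern worth knowing: when the filter is fully answered by index keys, the planner can count without fetching documents. A sketch (the role index here is hypothetical):

// Count satisfied from index keys alone
db.users.createIndex({ role: 1 });
db.users.countDocuments({ role: "admin" });
// explain on the equivalent query may show a COUNT_SCAN stage — zero documents fetched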

Index selectivity measures how uniquely an index identifies documents — high selectivity means the index quickly narrows down to very few matching documents. Selectivity is measured by cardinality: unique values / total documents. High-cardinality fields (email, phone, userId) have excellent selectivity. Low-cardinality fields (status: "active"/"inactive", boolean, gender) have poor selectivity. MongoDB's query optimizer uses index selectivity to choose which index to use when multiple indexes are available. For compound indexes, high-selectivity fields should come first (after equality fields following the ESR rule).
// HIGH selectivity (good index candidates)
// email: 1M unique values in 1M docs → selectivity = 1.0
// userId: unique per document
// orderId: unique identifier
db.users.createIndex({ email: 1 }); // excellent! 

// LOW selectivity (poor standalone indexes)
// status: only "active", "inactive" → selectivity = 0.000002
// isVerified: only true/false → selectivity = 0.000001
db.users.createIndex({ isVerified: 1 }); // poor! 50% true, 50% false

// Better: partial index instead of low-selectivity full index
db.users.createIndex({ email: 1 }, {
  partialFilterExpression: { isActive: true }
});

// Check index effectiveness
db.users.find({ isVerified: true })
  .explain("executionStats");
// totalDocsExamined may still be 500,000 (half the collection!)

// MongoDB may prefer COLLSCAN over low-selectivity index
// for large result sets (collection scan is sometimes faster
// due to sequential I/O vs random index I/O)

Why it matters: Shows deep understanding of how the query optimizer makes decisions. Interviewers ask this to differentiate engineers who understand index mechanics vs those who blindly add indexes.

Real applications: Creating an index on status alone for 1M orders where 800K are "completed" provides minimal benefit — MongoDB scans 80% of the collection anyway.

Common mistakes: Creating indexes on boolean or enum fields with only 2-5 unique values — MongoDB often opts for a collection scan instead because the index has poor selectivity.
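
Low-cardinality fields aren't useless, though — they work well as the equality prefix of a compound index where a later, more selective field does the narrowing. A sketch:

// ❌ Standalone low-selectivity index: still scans most of the collection
// db.orders.createIndex({ status: 1 });

// ✅ Same field as the equality prefix of a compound index
db.orders.createIndex({ status: 1, customerId: 1, createdAt: -1 });
db.orders.find({ status: "completed", customerId: cid }).sort({ createdAt: -1 });
// status narrows by value, customerId narrows to a handful of documents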

Wildcard indexes (MongoDB 4.2+) index all fields (or fields matching a pattern) in a collection, enabling ad-hoc queries on unpredictable or dynamic fields without knowing field names in advance. Perfect for collections with user-defined custom fields, JSON metadata, or flexible schemas. Create with "$**": 1 for all fields or "fieldPath.$**": 1 for a sub-document. Wildcard indexes cannot be unique, TTL, text, or hashed. They support equality, range, and sort operations, but cannot answer queries for documents where a field is missing (e.g., { field: { $exists: false } }).
// Wildcard index on all fields
db.products.createIndex({ "$**": 1 });
// Now any field query uses this index:
db.products.find({ "specs.screenSize": "15 inch" });  // ✅ uses wildcard index
db.products.find({ "attributes.color": "red" });       // ✅ uses wildcard index
db.products.find({ "customFields.delivery": "express" }); // ✅ uses wildcard index

// Wildcard index on specific sub-document
db.products.createIndex({ "metadata.$**": 1 });
// Indexes all fields within metadata subdocument

// Wildcard index with exclusions
db.products.createIndex(
  { "$**": 1 },
  {
    wildcardProjection: {
      "price": 0,       // exclude price field
      "inventory": 0    // exclude inventory field
    }
  }
);

// Use case: user-defined product attributes
{
  "_id": 1,
  "name": "Custom Laptop",
  "attributes": {
    "cpu": "Intel i7",
    "RAM": "16GB",
    "storage": "512GB SSD",
    "customProp_xyz": "some_value"  // any custom field
  }
}

Why it matters: Wildcard indexes are essential for schema-flexible designs — a key MongoDB strength. Shows knowledge of a relatively newer feature that solves real-world problems with dynamic data.

Real applications: E-commerce product with custom attributes, user-defined form fields, IoT device telemetry where each device type has different metrics.

Common mistakes: Using wildcard indexes on large collections without understanding they index every field in every document — the index can become larger than the collection itself.

Before MongoDB 4.2, foreground index builds locked the entire database (blocking all reads and writes) for the duration of the build — fast but causing downtime — while background index builds ran concurrently with read/write operations but were slower. MongoDB 4.2 eliminated this distinction: all index builds are now hybrid (optimized concurrent builds that hold a brief exclusive lock only at the start and end). Build status can be monitored with db.currentOp(). For very large collections on replica sets (including Atlas), rolling index builds (build on each secondary first, then the primary) minimize production impact.
// MongoDB 4.2+: just createIndex (always hybrid/optimized)
db.users.createIndex({ email: 1 });  // brief exclusive locks only at build start/end

// Monitor index build progress
db.currentOp({
  "command.createIndexes": { $exists: true }
});
// Shows: progress percentage, active phase, elapsed time

// MongoDB 4.0 and earlier — background flag
db.users.createIndex({ email: 1 }, { background: true });
// background: true = no writes blocked (slower, deprecated in 4.2)

// Rolling index build on replica set (best practice for large collections)
// 1. Take each secondary out of the set in turn (restart as a standalone),
//    build the index, then rejoin the replica set
// 2. rs.stepDown() on the primary → a new primary is elected
// 3. Build on the old primary (now a secondary)
// Result: cluster stays available throughout, minimal performance impact

// Drop and rebuild index
db.users.dropIndex({ email: 1 });
db.users.createIndex({ email: 1 });

// Create multiple indexes at once (faster than creating one by one)
db.users.createIndexes([
  { email: 1 },
  { createdAt: -1 },
  { status: 1, role: 1 }
]);

Why it matters: Production index management is a critical operational concern. Interviewers for senior/DevOps roles ask this to verify you can add indexes to live production systems without causing outages.

Real applications: Adding a missing index to a production collection with 100M documents requires rolling index build on each replica set member to avoid query performance degradation.

Common mistakes: Building indexes on the primary in MongoDB 4.0 or earlier without background: true — this locks the entire database, causing downtime for all users.
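
On MongoDB 4.4+, a safer path than dropping an index outright is to hide it first — a hidden index is still maintained on writes but ignored by the planner, so the change is instantly reversible:

// Hide the index from the query planner (MongoDB 4.4+)
db.users.hideIndex({ email: 1 });
// ...watch production query latency; if nothing regresses:
db.users.dropIndex({ email: 1 });
// If something does regress, unhide — no rebuild required:
db.users.unhideIndex({ email: 1 });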

A hashed index stores hashed values of the indexed field instead of the actual values. MongoDB computes a hash of the field value using a consistent hash function and stores that in the index. Hashed indexes support only equality queries (range queries cannot use them). Their primary use case is as a shard key for sharded collections — hashed shard keys distribute data evenly across shards regardless of the original value's distribution pattern (e.g., monotonically increasing timestamps). Note that hashing here is not a security feature: the original values remain stored in the documents themselves.
// Create hashed index
db.users.createIndex({ userId: "hashed" });

// Supports equality queries only
db.users.find({ userId: "user123" }); // ✅ uses hashed index
db.users.find({ userId: { $gt: "user100" } }); // ❌ range not supported

// Primary use: hashed shard key for even distribution
sh.shardCollection("ecommerce.orders", { _id: "hashed" });
// ObjectId values are hashed → even distribution across shards
// vs { createdAt: 1 } (range shard key) which creates write hotspots

// Comparison: Range vs Hashed shard key
// Range key (serial): [0,100) → shard1, [100,200) → shard2
// New orders all go to highest range shard → HOTSPOT ❌

// Hashed key: hash("order001") → shard3, hash("order002") → shard1
// Orders distributed evenly → NO hotspot ✅

// Compound hashed index (MongoDB 4.4+)
db.events.createIndex({ userId: 1, eventId: "hashed" });
// ↑ userId is range prefix, eventId is hashed

Why it matters: Hashed indexes are critical for sharding strategy — a key senior MongoDB topic. Understanding when to use hashed vs range shard keys prevents the classic write hotspot problem.

Real applications: Sharding user data by userId hash for even distribution, sharding event logs by _id hash to prevent sequential write bottlenecks.

Common mistakes: Using hashed shard keys when range queries on the shard key are common — hashed sharding requires broadcast queries for range scans, hitting all shards.

Index intersection is MongoDB's ability to use two or more indexes simultaneously to satisfy a single query, merging results from multiple index scans. Instead of scanning one compound index, MongoDB can intersect results from two single-field indexes. However, index intersection is generally less efficient than a well-designed compound index due to the overhead of merging index scan results. MongoDB's query optimizer compares compound index vs intersection cost and chooses the optimal plan. Understanding when MongoDB chooses intersection vs compound index helps design better index strategies.
// Two separate indexes
db.users.createIndex({ age: 1 });
db.users.createIndex({ city: 1 });

// Query filtering both fields
db.users.find({ age: { $gt: 25 }, city: "Mumbai" });
// MongoDB can:
// Option 1: Use compound index { age: 1, city: 1 } (best)
// Option 2: Intersect results from { age: 1 } and { city: 1 }

// Check explain to see if intersection is used
db.users.find({ age: { $gt: 25 }, city: "Mumbai" }).explain("executionStats");
// "stage": "AND_HASH" or "AND_SORTED" → index intersection happening

// When intersection helps:
// - No compound index exists
// - Multiple queries with different field combinations

// When compound index is better:
// - Known query patterns
// - Sort operations (compound supports sort, intersection doesn't)
// - Better selectivity when fields ordered correctly

// Best practice: prefer compound indexes for known query patterns
db.users.createIndex({ city: 1, age: 1 }); // better than intersection
db.users.find({ city: "Mumbai", age: { $gt: 25 } }).explain();
// "stage": "IXSCAN" (single compound scan, faster than intersection)

Why it matters: Index intersection knowledge shows understanding of query planner internals. Most production optimizations favor compound indexes over relying on intersection.

Real applications: When you inherit a database without compound indexes and cannot change them immediately — knowing MongoDB intersects indexes reassures you queries aren't doing full scans.

Common mistakes: Relying on index intersection as a design strategy — it works but is less efficient than compound indexes, especially for queries with sort operations.