// Relational DB (SQL)
SELECT * FROM users WHERE age > 25;
// MongoDB
db.users.find({ age: { $gt: 25 } });
// A MongoDB document (BSON/JSON)
{
"_id": ObjectId("64a1b2c3d4e5f6789012345"),
"name": "Mayur",
"age": 30,
"skills": ["MongoDB", "Node.js", "React"],
"address": {
"city": "Mumbai",
"pincode": "400001"
}
}
Why it matters: This is always the first question to gauge your understanding of NoSQL fundamentals. Interviewers want to know if you can articulate key differences like schema flexibility, horizontal scaling, and document model.
Real applications: Used by companies like LinkedIn, Forbes, and Uber for user profile systems, event logging, real-time analytics, and catalog management where data shapes vary per document.
Common mistakes: Developers often assume MongoDB always outperforms SQL. MongoDB is not ideal for complex multi-table joins or strict transactional workflows — relational databases handle those better.
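The structural difference can be sketched in plain JavaScript: what SQL keeps in a separate joined table becomes an embedded sub-document. The `toDocument` helper and its row shapes below are hypothetical, purely to illustrate the mapping.

```javascript
// Hypothetical helper: fold a joined SQL result (one users row plus one
// addresses row) into a single MongoDB-style document.
function toDocument(userRow, addressRow) {
  return {
    name: userRow.name,
    age: userRow.age,
    // The separate addresses table becomes an embedded sub-document
    address: { city: addressRow.city, pincode: addressRow.pincode }
  };
}

const doc = toDocument(
  { id: 1, name: "Mayur", age: 30 },
  { user_id: 1, city: "Mumbai", pincode: "400001" }
);
// doc.address.city is "Mumbai": one document read replaces a JOIN
```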
// BSON supports richer types than JSON
{
"_id": ObjectId("64a1b2c3d4e5f6789012345"), // ObjectId type
"name": "Mayur",
"createdAt": ISODate("2026-04-05T00:00:00Z"), // Date type
"salary": NumberDecimal("75000.50"), // Decimal128
"profilePic": BinData(0, "base64encodeddata"), // Binary
"isActive": true
}
// Size limit: 16 MB per document
// Use GridFS for files larger than 16 MB
Why it matters: Understanding BSON shows depth of knowledge beyond surface-level MongoDB usage. Interviewers assess whether you know the 16 MB limit, supported data types, and how MongoDB handles dates and binary data.
Real applications: When storing timestamps, GridFS chunking for large files, geospatial coordinates (GeoJSON), and financial data using Decimal128 for precision.
Common mistakes: Storing dates as plain strings instead of ISODate breaks date range queries and sorting. Also forgetting the 16 MB document limit when embedding large blobs.
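The date-as-string mistake is easy to reproduce in plain JavaScript, no database required: string comparison is lexicographic, not chronological.

```javascript
// Strings compare character by character, so formatted dates misorder.
const a = "05/04/2026"; // 5 April 2026 (DD/MM/YYYY)
const b = "12/01/2020"; // 12 January 2020
console.log(a < b); // true: 2026 sorts "before" 2020

// Date objects compare by their underlying timestamp, so ordering is correct.
const d1 = new Date("2026-04-05");
const d2 = new Date("2020-01-12");
console.log(d1 > d2); // true
```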
// Database: "ecommerce"
// Collection: "products"
// Document 1
{
"_id": ObjectId("aaa"),
"name": "Laptop",
"price": 75000,
"brand": "Dell",
"specs": { "ram": "16GB", "storage": "512GB SSD" },
"tags": ["electronics", "computers"]
}
// Document 2 — different shape is OK!
{
"_id": ObjectId("bbb"),
"name": "T-Shirt",
"price": 499,
"sizes": ["S", "M", "L", "XL"],
"color": "blue"
}
Why it matters: Fundamental concept tested in every MongoDB interview. Understanding the schema-less nature is key to designing MongoDB data models.
Real applications: E-commerce catalogs where different product categories (electronics vs apparel) have completely different attributes but live in the same collection.
Common mistakes: Over-applying SQL thinking — trying to keep documents in a collection strictly uniform when MongoDB's flexibility is a feature, not a bug.
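Flexibility shifts responsibility to application code: anything category-specific must be treated as optional. A minimal sketch (the `describeProduct` helper is hypothetical):

```javascript
// Reading a mixed collection: branch on whichever optional fields exist.
function describeProduct(doc) {
  if (doc.specs) return `${doc.name}: ${doc.specs.ram} RAM, ${doc.specs.storage}`;
  if (doc.sizes) return `${doc.name}: sizes ${doc.sizes.join("/")}`;
  return doc.name;
}

describeProduct({ name: "Laptop", specs: { ram: "16GB", storage: "512GB SSD" } });
// "Laptop: 16GB RAM, 512GB SSD"
describeProduct({ name: "T-Shirt", sizes: ["S", "M", "L", "XL"] });
// "T-Shirt: sizes S/M/L/XL"
```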
// ObjectId structure (12 bytes): 4-byte timestamp, 5-byte random, 3-byte counter
ObjectId("64a1b2c3d4e5f6789012345a")
//        └──────┘└────────┘└────┘
//       timestamp  random  counter
// Extract timestamp from ObjectId
const id = new ObjectId("64a1b2c3d4e5f6789012345a");
console.log(id.getTimestamp()); // 2023-07-02T... (decoded from the leading 4 bytes)
// Custom _id
db.users.insertOne({ _id: "user_mayur", name: "Mayur" });
// Find by _id (most efficient query)
db.users.findOne({ _id: ObjectId("64a1b2c3d4e5f6789012345a") });
Why it matters: Shows understanding of MongoDB's distributed identity generation and primary key mechanics. Critical for pagination, sorting by insertion order, and cross-shard uniqueness.
Real applications: Extracting creation time from ObjectId for time-series analysis without a separate timestamp field, and using ObjectId for cursor-based pagination.
Common mistakes: Passing ObjectId as a plain string in queries — you must wrap it: ObjectId("..."). Also comparing ObjectIds with == instead of .equals() in application code.
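The timestamp extraction works because the first 4 bytes are big-endian seconds since the Unix epoch. A dependency-free sketch of what getTimestamp() does under the hood:

```javascript
// Decode the creation time embedded in an ObjectId's first 8 hex characters.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-f]{24}$/i.test(hexId)) throw new Error("invalid ObjectId");
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

objectIdTimestamp("64a1b2c3d4e5f6789012345a"); // a Date in July 2023
```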
find() is used when you need multiple results, while findOne() is optimized for single-document retrieval. Using findOne() is more efficient when you only need one document because MongoDB stops scanning after the first match. Both support projection as a second argument to include/exclude fields.
// find() — returns cursor, iterate all matches
const cursor = db.users.find({ city: "Mumbai" });
cursor.forEach(doc => console.log(doc));
// find() with projection (include only name, email)
db.users.find({ city: "Mumbai" }, { name: 1, email: 1, _id: 0 });
// findOne() — returns first match as object
const user = db.users.findOne({ email: "mayur@example.com" });
console.log(user.name); // Direct property access
// find() with limit, skip, sort
db.users.find({}).sort({ age: -1 }).skip(10).limit(5);
// Count matching documents
db.users.countDocuments({ city: "Mumbai" });
Why it matters: Tests understanding of cursor mechanics and performance implications. findOne() applies an implicit limit of 1, so the server can stop after the first match; a plain find() has no such limit unless you add .limit(1).
Real applications: findOne() for auth lookups by email, find() for listing all orders for a user with pagination.
Common mistakes: Using find() expecting an array — it returns a cursor. Call .toArray() or iterate with .forEach(). Also not using projections, fetching entire documents when only 2 fields are needed.
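Cursor laziness is the key idea here, and a generator gives a reasonable mental model (an analogy only, not the driver's actual implementation):

```javascript
// A cursor yields matches on demand; nothing is materialized up front.
function* findMatches(docs, predicate) {
  for (const doc of docs) {
    if (predicate(doc)) yield doc; // produced when requested, not in advance
  }
}

const users = [
  { name: "A", city: "Mumbai" },
  { name: "B", city: "Pune" },
  { name: "C", city: "Mumbai" }
];

// find()-style: drain the cursor (like calling .toArray())
const all = [...findMatches(users, u => u.city === "Mumbai")];
// findOne()-style: take the first match and stop iterating
const first = findMatches(users, u => u.city === "Mumbai").next().value;
```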
MongoDB supports schema validation through a $jsonSchema validator applied at the collection level. You define rules like required fields, data types, string patterns, and value ranges. validationLevel can be "strict" (validate every insert and update) or "moderate" (skip updates to existing documents that already violate the rules). $jsonSchema validation was added in MongoDB 3.6 and is enforced by the server itself, providing a middle ground between full schema-less freedom and strict SQL-like enforcement. The Mongoose ODM provides application-level schema validation for Node.js applications.
// Create collection with validation
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "email", "age"],
properties: {
name: { bsonType: "string", minLength: 2 },
email: {
bsonType: "string",
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
},
age: { bsonType: "int", minimum: 18, maximum: 120 },
role: { enum: ["user", "admin", "moderator"] }
}
}
},
validationLevel: "strict", // "strict" | "moderate"
validationAction: "error" // "error" | "warn"
});
// Insert violating document — throws error
db.users.insertOne({ name: "A", email: "bad", age: 15 });
Why it matters: Shows you understand MongoDB's evolution toward data integrity guarantees. This bridges the gap between pure NoSQL freedom and data reliability needed in production.
Real applications: Enforcing email format and age range for user registration services, ensuring required fields in financial transaction records.
Common mistakes: Forgetting that validationLevel: "moderate" skips documents that already violate the rules; existing invalid documents stay invalid until they are rewritten, which matters during schema migrations.
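What the server enforces can be mirrored in a plain function, which is handy for unit-testing rules before deploying them. The `validateUser` helper below is illustrative only, not part of any driver:

```javascript
// Re-implementation of the $jsonSchema rules above as plain JavaScript.
function validateUser(doc) {
  const errors = [];
  for (const field of ["name", "email", "age"]) {
    if (!(field in doc)) errors.push(`${field} is required`);
  }
  if (typeof doc.name === "string" && doc.name.length < 2)
    errors.push("name too short");
  if (typeof doc.email === "string" &&
      !/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(doc.email))
    errors.push("invalid email");
  if (typeof doc.age === "number" && (doc.age < 18 || doc.age > 120))
    errors.push("age out of range");
  return errors;
}

validateUser({ name: "A", email: "bad", age: 15 });
// ["name too short", "invalid email", "age out of range"]
```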
{
"_id": ObjectId("..."), // ObjectId
"name": "Mayur", // String
"age": NumberInt(30), // Int32
"salary": NumberDecimal("75000.50"), // Decimal128
"rating": 4.8, // Double
"isActive": true, // Boolean
"joinedAt": ISODate("2026-01-01"), // Date
"tags": ["dev", "mongodb"], // Array
"address": { "city": "Mumbai" }, // Embedded Doc
"photo": BinData(0, "base64..."), // Binary
"pattern": /^mongo/i, // Regex
"deletedAt": null // Null
}
Why it matters: Type choice affects query correctness, index efficiency, and storage size. Storing numbers as strings breaks range queries and sorting.
Real applications: Use Decimal128 for financial amounts, Date for timestamps (not string), Array for tags or multi-value fields.
Common mistakes: Storing dates as strings breaks $gte/$lte date range queries. Using Double for currency causes floating-point precision errors — always use Decimal128 for money.
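The Double-for-currency pitfall needs no database to reproduce: binary floating point cannot represent most decimal fractions exactly.

```javascript
// Doubles drift on decimal fractions...
const total = 0.1 + 0.2;
console.log(total === 0.3); // false (total is 0.30000000000000004)

// ...so store money as Decimal128 in MongoDB, or as integer minor units.
const paise = 10 + 20; // 0.10 + 0.20 rupees held as paise
console.log(paise === 30); // true: integer arithmetic is exact
```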
// Connecting to Atlas (Node.js / Mongoose)
const mongoose = require('mongoose');
const uri = process.env.MONGODB_URI;
// Atlas URI format:
// mongodb+srv://username:password@cluster.abc.mongodb.net/dbname
mongoose.connect(uri, {
serverSelectionTimeoutMS: 5000
// useNewUrlParser / useUnifiedTopology are deprecated no-ops since Mongoose 6
}).then(() => console.log('Connected to Atlas'))
.catch(err => console.error('Atlas connection error:', err));
// Atlas free tier: M0 — 512 MB storage, shared cluster
// Production: M10+ dedicated clusters
Why it matters: Most modern projects use Atlas for deployment. Understanding Atlas vs self-hosted shows production deployment awareness and knowledge of managed services trade-offs.
Real applications: Startups use Atlas M0 free tier for development, enterprises scale to M30+ dedicated clusters with automated backups, VPC peering, and PrivateLink.
Common mistakes: Storing Atlas connection strings in plain text in code instead of environment variables. Also forgetting to whitelist IP addresses in Atlas Network Access settings.
// Enable sharding on a database
sh.enableSharding("ecommerce");
// Shard a collection by a shard key
sh.shardCollection("ecommerce.orders", { userId: 1 });
// Hashed sharding — better distribution
sh.shardCollection("ecommerce.events", { _id: "hashed" });
// Architecture:
// Client App
// ↓
// mongos (router) — 1+
// ↓
// Config Servers (3 replica set nodes)
// ↓
// Shard 1 RS | Shard 2 RS | Shard 3 RS
// (each shard is its own replica set: one primary plus secondaries)
Why it matters: Sharding is a senior/advanced topic that tests your understanding of distributed database architecture. An examiner wants to know if you can design for scale from the start.
Real applications: MongoDB at eBay, Foursquare, LinkedIn uses sharding for billions of documents across hundreds of shards.
Common mistakes: Choosing a monotonically increasing shard key (like ObjectId) creates a "hotspot" — all writes go to one shard. Use hashed sharding or a high-cardinality key for even distribution.
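The hotspot effect is easy to simulate. The hash below is a toy (Knuth's multiplicative hash, not MongoDB's real MD5-based hashed index), but it shows why hashing scatters sequential keys while range-based routing piles them all onto one shard:

```javascript
const SHARDS = 3;

// Toy hash: good enough to scatter sequential integers.
function toyHash(n) {
  return (n * 2654435761) % 2 ** 32; // Knuth multiplicative hash
}

const sequentialIds = [1001, 1002, 1003, 1004, 1005, 1006];

// Ranged sharding: all recent (high) keys fall in the last range,
// so one shard takes every write.
const ranged = sequentialIds.map(id => (id < 400 ? 0 : id < 800 ? 1 : 2));

// Hashed sharding: sequential keys spread across shards.
const hashed = sequentialIds.map(id => toyHash(id) % SHARDS);
```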
// Replica set architecture (3-node)
// Primary: p1.example.com:27017 (handles writes)
// Secondary: p2.example.com:27017 (replicates data)
// Secondary: p3.example.com:27017 (replicates data)
// Connecting to a replica set
const uri = "mongodb://p1,p2,p3/?replicaSet=myRS&readPreference=primaryPreferred";
// Read preferences:
// primary — reads from primary only (default)
// primaryPreferred — primary if available, else secondary
// secondary — reads from secondaries only
// nearest — lowest latency member
// Check replica set status
rs.status()
rs.hello() // isMaster() is deprecated in favor of hello()
Why it matters: Replica sets are the foundation of MongoDB's fault tolerance. Every production MongoDB setup must use replica sets, and understanding elections and read preferences is key for system design interviews.
Real applications: Netflix uses replica sets to ensure zero-downtime failover. Read-heavy analytics dashboards use secondary reads to offload the primary.
Common mistakes: Running a standalone MongoDB in production (no replica set) means no failover. Also not understanding that secondary reads may return stale data due to replication lag.
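Driver read-preference resolution can be sketched as a selection function. This is simplified: real drivers also weigh tag sets, max staleness, and a latency window.

```javascript
// Pick a target node for a read, given the replica set state.
function pickReadTarget(members, preference) {
  const primary = members.find(m => m.state === "PRIMARY");
  const secondaries = members.filter(m => m.state === "SECONDARY");
  switch (preference) {
    case "primary":            return primary || null;
    case "primaryPreferred":   return primary || secondaries[0] || null;
    case "secondary":          return secondaries[0] || null;
    case "secondaryPreferred": return secondaries[0] || primary || null;
    case "nearest":
      return [...members].sort((a, b) => a.pingMs - b.pingMs)[0] || null;
    default: throw new Error("unknown read preference");
  }
}

const members = [
  { host: "p1", state: "PRIMARY", pingMs: 20 },
  { host: "p2", state: "SECONDARY", pingMs: 5 },
  { host: "p3", state: "SECONDARY", pingMs: 12 }
];
```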
// Check storage engine
db.serverStatus().storageEngine;
// { name: "wiredTiger", ... }
// WiredTiger collection options
db.createCollection("events", {
storageEngine: {
wiredTiger: {
configString: "block_compressor=zlib" // zlib for better compression
}
}
});
// WiredTiger cache size: defaults to the larger of 50% of (RAM - 1 GB) or 256 MB
// mongod --wiredTigerCacheSizeGB 4
// Cache efficiency: compare "pages read into cache" (misses) against
// "pages requested from the cache"; hit ratio should stay above ~95%
db.serverStatus().wiredTiger.cache["pages read into cache"]
Why it matters: Shows deep operational knowledge. Interviewers for senior/DBA roles ask about storage engines to assess your ability to tune MongoDB at the infrastructure level.
Real applications: Tuning WiredTiger cache size is the first optimization step for high-traffic applications, ensuring the working set fits in RAM.
Common mistakes: Not monitoring cache eviction rates — if evictions are high, your working set doesn't fit in cache and performance degrades significantly.
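The hit-ratio arithmetic itself is simple. Assuming two counters, total page requests and pages read into cache (misses), the ratio is:

```javascript
// Hit ratio = (requests - misses) / requests
function cacheHitRatio(pagesRequested, pagesReadIntoCache) {
  return (pagesRequested - pagesReadIntoCache) / pagesRequested;
}

cacheHitRatio(1000000, 30000);  // 0.97, comfortably above the ~0.95 target
cacheHitRatio(1000000, 200000); // 0.8, working set likely exceeds the cache
```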
# Enterprise-only: LDAP Authentication (mongod.conf, YAML)
security:
  authorization: enabled
  ldap:
    servers: "ldap.company.com"
    transportSecurity: tls
    authz:
      queryTemplate: "..."
// Enterprise-only: Field-Level Encryption
const clientEncryption = new ClientEncryption(mongoClient, {
keyVaultNamespace: "encryption.__keyVault",
kmsProviders: { aws: awsCredentials }
});
# Enterprise-only: Audit log (mongod.conf, YAML)
auditLog:
  destination: file
  format: JSON
  path: "/var/log/mongodb/audit.json"
  filter: '{ atype: { $in: ["authenticate", "find"] } }'
Why it matters: Shows awareness of licensing, compliance, and security features that matter in enterprise environments. A common question in enterprise job interviews.
Real applications: Healthcare companies use Enterprise's field-level encryption for HIPAA compliance. Financial institutions use Enterprise auditing for SOC2 certification.
Common mistakes: Attempting to implement LDAP auth with Community Edition — those security features are strictly Enterprise-only.
// Create capped collection (1MB, max 1000 docs)
db.createCollection("app_logs", {
capped: true,
size: 1048576, // 1 MB (required, in bytes)
max: 1000 // optional max document count
});
// Insert works normally
db.app_logs.insertOne({
level: "ERROR",
message: "DB connection failed",
timestamp: new Date()
});
// Tailable cursor: streams new documents in real-time
// (Node.js driver form: find(filter, options); in mongosh the second argument is a projection)
const cursor = db.collection("app_logs").find({}, { tailable: true, awaitData: true });
cursor.forEach(doc => console.log(doc)); // stays open, waits for new docs
// Check if collection is capped
db.app_logs.isCapped(); // true
Why it matters: Tests knowledge of specialized collection types. Capped collections are a go-to solution for logging use cases and show understanding of trade-offs between functionality and performance.
Real applications: Activity feeds keeping last 500 events, application error logs, audit trails with bounded storage, and real-time dashboards using tailable cursors.
Common mistakes: Trying to delete documents from a capped collection (not allowed except drop()). Also not pre-allocating enough size — once the limit is hit, old data is gone forever.
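The insertion-order overwrite behavior is essentially a ring buffer. A count-capped sketch (real capped collections cap on bytes first, document count second):

```javascript
// Toy model of a capped collection: oldest entries are silently evicted.
class CappedLog {
  constructor(max) { this.max = max; this.docs = []; }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift(); // evict oldest
  }
}

const log = new CappedLog(3);
["a", "b", "c", "d"].forEach(m => log.insert({ msg: m }));
// log.docs now holds b, c, d; "a" was silently overwritten
```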
// Node.js GridFS with Mongoose (gridfs-stream, now deprecated; prefer GridFSBucket below)
const mongoose = require('mongoose');
const Grid = require('gridfs-stream');
const conn = mongoose.connection;
let gfs;
conn.once('open', () => {
gfs = Grid(conn.db, mongoose.mongo);
gfs.collection('uploads');
});
// Upload using GridFSBucket (modern API)
const { GridFSBucket } = require('mongodb');
const bucket = new GridFSBucket(db, { bucketName: 'images' });
// Upload a file
const fs = require('fs');
fs.createReadStream('./photo.jpg')
.pipe(bucket.openUploadStream('photo.jpg', {
metadata: { userId: '123', contentType: 'image/jpeg' }
}))
.on('finish', () => console.log('Upload complete'));
// Download a file
bucket.openDownloadStreamByName('photo.jpg')
.pipe(res); // pipe to HTTP response
Why it matters: Shows knowledge of MongoDB's file handling capabilities. Commonly asked when discussing file upload architectures and whether to store files in MongoDB vs S3/CDN.
Real applications: Storing user-uploaded profile images, PDF reports, and video recordings when a separate object storage service is not desired.
Common mistakes: Using GridFS for small files (< 1 MB) when BinData embedded in a document is simpler and faster. GridFS shines for large files that need streaming.
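Under the hood, GridFS splits file bytes into fixed-size chunk documents (255 KiB each by default) plus one metadata document. The chunking step itself can be sketched as:

```javascript
// Split a buffer into GridFS-style fixed-size chunks.
function toChunks(buffer, chunkSize = 255 * 1024) {
  const chunks = [];
  for (let i = 0; i < buffer.length; i += chunkSize) {
    chunks.push({ n: chunks.length, data: buffer.subarray(i, i + chunkSize) });
  }
  return chunks;
}

// A 600 KiB "file" becomes 3 chunks: 255 KiB + 255 KiB + 90 KiB
const chunks = toChunks(Buffer.alloc(600 * 1024));
```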
With writeConcern: { j: true }, a write is acknowledged only after it has been persisted to the journal on disk, providing the highest level of single-node durability. WiredTiger combines checkpoints (60-second intervals by default) with the journal for full crash recovery.
// Write concern options
// j: true — wait until write is journaled (durable)
// j: false — acknowledge when in memory (faster, risky)
db.orders.insertOne(
{ item: "laptop", qty: 5 },
{ writeConcern: { w: 1, j: true, wtimeout: 5000 } }
);
// w: 1 — acknowledged by primary
// w: "majority" — acknowledged by majority of replica set
// j: true — persisted to journal on disk
// Journaling is always enabled with WiredTiger
// (the serverStatus().dur section reported on the legacy MMAPv1 journal)
// Checkpoint interval (default 60s)
// mongod --syncdelay 60
Why it matters: Critical for understanding MongoDB's ACID guarantees and durability. Interviewers ask this to test your understanding of data safety in production systems.
Real applications: Financial transaction systems use w: "majority", j: true to guarantee writes are durable across replica set members before confirmation.
Common mistakes: Using j: false in production for performance without understanding the risk — a server crash before the journal flush can silently lose confirmed writes.
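The trade-off can be modeled with a toy storage engine in which only the journal survives a crash. This illustrates the semantics, not how WiredTiger is actually implemented:

```javascript
// Toy model: "memory" is volatile, "journal" is the only durable state.
class ToyStorage {
  constructor() { this.memory = []; this.journal = []; }
  write(doc, { j }) {
    this.memory.push(doc);
    if (j) this.journal.push(doc); // j: true flushes before acknowledging
    return { acknowledged: true };
  }
  crashAndRecover() { this.memory = [...this.journal]; }
}

const store = new ToyStorage();
store.write({ id: 1 }, { j: true });
store.write({ id: 2 }, { j: false }); // acknowledged, but memory-only
store.crashAndRecover();
// Only { id: 1 } survives; the acknowledged j: false write is gone
```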