// Relational DB (SQL)
SELECT * FROM users WHERE age > 25;
// MongoDB
db.users.find({ age: { $gt: 25 } });
// A MongoDB document (BSON/JSON)
{
"_id": ObjectId("64a1b2c3d4e5f6789012345"),
"name": "Mayur",
"age": 30,
"skills": ["MongoDB", "Node.js", "React"],
"address": {
"city": "Mumbai",
"pincode": "400001"
}
}
Why it matters: This is always the first question to gauge your understanding of NoSQL fundamentals. Interviewers want to know if you can articulate key differences like schema flexibility, horizontal scaling, and document model.
Real applications: Used by companies like LinkedIn, Forbes, and Uber for user profile systems, event logging, real-time analytics, and catalog management where data shapes vary per document.
Common mistakes: Developers often assume MongoDB always outperforms SQL. MongoDB is not ideal for complex multi-table joins or strict transactional workflows — relational databases handle those better.
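The structural difference can be sketched in plain JavaScript: what SQL keeps in a separate joined table becomes an embedded sub-document. The `toDocument` helper and its row shapes below are hypothetical, purely to illustrate the mapping.

```javascript
// Hypothetical helper: fold a joined SQL result (one users row plus one
// addresses row) into a single MongoDB-style document.
function toDocument(userRow, addressRow) {
  return {
    name: userRow.name,
    age: userRow.age,
    // The separate addresses table becomes an embedded sub-document
    address: { city: addressRow.city, pincode: addressRow.pincode }
  };
}

const doc = toDocument(
  { id: 1, name: "Mayur", age: 30 },
  { user_id: 1, city: "Mumbai", pincode: "400001" }
);
// doc.address.city is "Mumbai": one document read replaces a JOIN
```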
// BSON supports richer types than JSON
{
"_id": ObjectId("64a1b2c3d4e5f6789012345"), // ObjectId type
"name": "Mayur",
"createdAt": ISODate("2026-04-05T00:00:00Z"), // Date type
"salary": NumberDecimal("75000.50"), // Decimal128
"profilePic": BinData(0, "base64encodeddata"), // Binary
"isActive": true
}
// Size limit: 16 MB per document
// Use GridFS for files larger than 16 MB
Why it matters: Understanding BSON shows depth of knowledge beyond surface-level MongoDB usage. Interviewers assess whether you know the 16 MB limit, supported data types, and how MongoDB handles dates and binary data.
Real applications: When storing timestamps, GridFS chunking for large files, geospatial coordinates (GeoJSON), and financial data using Decimal128 for precision.
Common mistakes: Storing dates as plain strings instead of ISODate breaks date range queries and sorting. Also forgetting the 16 MB document limit when embedding large blobs.
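The date-as-string mistake is easy to reproduce in plain JavaScript, no database required: string comparison is lexicographic, not chronological.

```javascript
// Strings compare character by character, so formatted dates misorder.
const a = "05/04/2026"; // 5 April 2026 (DD/MM/YYYY)
const b = "12/01/2020"; // 12 January 2020
console.log(a < b); // true: 2026 sorts "before" 2020

// Date objects compare by their underlying timestamp, so ordering is correct.
const d1 = new Date("2026-04-05");
const d2 = new Date("2020-01-12");
console.log(d1 > d2); // true
```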
// Database: "ecommerce"
// Collection: "products"
// Document 1
{
"_id": ObjectId("aaa"),
"name": "Laptop",
"price": 75000,
"brand": "Dell",
"specs": { "ram": "16GB", "storage": "512GB SSD" },
"tags": ["electronics", "computers"]
}
// Document 2 — different shape is OK!
{
"_id": ObjectId("bbb"),
"name": "T-Shirt",
"price": 499,
"sizes": ["S", "M", "L", "XL"],
"color": "blue"
}
Why it matters: Fundamental concept tested in every MongoDB interview. Understanding the schema-less nature is key to designing MongoDB data models.
Real applications: E-commerce catalogs where different product categories (electronics vs apparel) have completely different attributes but live in the same collection.
Common mistakes: Over-applying SQL thinking — trying to keep documents in a collection strictly uniform when MongoDB's flexibility is a feature, not a bug.
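Flexibility shifts responsibility to application code: anything category-specific must be treated as optional. A minimal sketch (the `describeProduct` helper is hypothetical):

```javascript
// Reading a mixed collection: branch on whichever optional fields exist.
function describeProduct(doc) {
  if (doc.specs) return `${doc.name}: ${doc.specs.ram} RAM, ${doc.specs.storage}`;
  if (doc.sizes) return `${doc.name}: sizes ${doc.sizes.join("/")}`;
  return doc.name;
}

describeProduct({ name: "Laptop", specs: { ram: "16GB", storage: "512GB SSD" } });
// "Laptop: 16GB RAM, 512GB SSD"
describeProduct({ name: "T-Shirt", sizes: ["S", "M", "L", "XL"] });
// "T-Shirt: sizes S/M/L/XL"
```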
// ObjectId structure (12 bytes): 4-byte timestamp, 5-byte random, 3-byte counter
ObjectId("64a1b2c3d4e5f6789012345a")
//        └──────┘└────────┘└────┘
//       timestamp  random  counter
// Extract timestamp from ObjectId
const id = new ObjectId("64a1b2c3d4e5f6789012345a");
console.log(id.getTimestamp()); // 2023-07-02T... (decoded from the leading 4 bytes)
// Custom _id
db.users.insertOne({ _id: "user_mayur", name: "Mayur" });
// Find by _id (most efficient query)
db.users.findOne({ _id: ObjectId("64a1b2c3d4e5f6789012345a") });
Why it matters: Shows understanding of MongoDB's distributed identity generation and primary key mechanics. Critical for pagination, sorting by insertion order, and cross-shard uniqueness.
Real applications: Extracting creation time from ObjectId for time-series analysis without a separate timestamp field, and using ObjectId for cursor-based pagination.
Common mistakes: Passing ObjectId as a plain string in queries — you must wrap it: ObjectId("..."). Also comparing ObjectIds with == instead of .equals() in application code.
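The timestamp extraction works because the first 4 bytes are big-endian seconds since the Unix epoch. A dependency-free sketch of what getTimestamp() does under the hood:

```javascript
// Decode the creation time embedded in an ObjectId's first 8 hex characters.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-f]{24}$/i.test(hexId)) throw new Error("invalid ObjectId");
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

objectIdTimestamp("64a1b2c3d4e5f6789012345a"); // a Date in July 2023
```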
find() is used when you need multiple results, while findOne() is optimized for single-document retrieval. Using findOne() is more efficient when you only need one document because MongoDB stops scanning after the first match. Both support projection as a second argument to include/exclude fields.
// find() — returns cursor, iterate all matches
const cursor = db.users.find({ city: "Mumbai" });
cursor.forEach(doc => console.log(doc));
// find() with projection (include only name, email)
db.users.find({ city: "Mumbai" }, { name: 1, email: 1, _id: 0 });
// findOne() — returns first match as object
const user = db.users.findOne({ email: "mayur@example.com" });
console.log(user.name); // Direct property access
// find() with limit, skip, sort
db.users.find({}).sort({ age: -1 }).skip(10).limit(5);
// Count matching documents
db.users.countDocuments({ city: "Mumbai" });
Why it matters: Tests understanding of cursor mechanics and performance implications. findOne() applies an implicit limit of 1, so the server can stop after the first match; a plain find() has no such limit unless you add .limit(1).
Real applications: findOne() for auth lookups by email, find() for listing all orders for a user with pagination.
Common mistakes: Using find() expecting an array — it returns a cursor. Call .toArray() or iterate with .forEach(). Also not using projections, fetching entire documents when only 2 fields are needed.
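Cursor laziness is the key idea here, and a generator gives a reasonable mental model (an analogy only, not the driver's actual implementation):

```javascript
// A cursor yields matches on demand; nothing is materialized up front.
function* findMatches(docs, predicate) {
  for (const doc of docs) {
    if (predicate(doc)) yield doc; // produced when requested, not in advance
  }
}

const users = [
  { name: "A", city: "Mumbai" },
  { name: "B", city: "Pune" },
  { name: "C", city: "Mumbai" }
];

// find()-style: drain the cursor (like calling .toArray())
const all = [...findMatches(users, u => u.city === "Mumbai")];
// findOne()-style: take the first match and stop iterating
const first = findMatches(users, u => u.city === "Mumbai").next().value;
```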
MongoDB supports schema validation through a $jsonSchema validator applied at the collection level. You define rules like required fields, data types, string patterns, and value ranges. validationLevel can be "strict" (validate every insert and update) or "moderate" (skip updates to existing documents that already violate the rules). $jsonSchema validation was added in MongoDB 3.6 and is enforced by the server itself, providing a middle ground between full schema-less freedom and strict SQL-like enforcement. The Mongoose ODM provides application-level schema validation for Node.js applications.
// Create collection with validation
db.createCollection("users", {
validator: {
$jsonSchema: {
bsonType: "object",
required: ["name", "email", "age"],
properties: {
name: { bsonType: "string", minLength: 2 },
email: {
bsonType: "string",
pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
},
age: { bsonType: "int", minimum: 18, maximum: 120 },
role: { enum: ["user", "admin", "moderator"] }
}
}
},
validationLevel: "strict", // "strict" | "moderate"
validationAction: "error" // "error" | "warn"
});
// Insert violating document — throws error
db.users.insertOne({ name: "A", email: "bad", age: 15 });
Why it matters: Shows you understand MongoDB's evolution toward data integrity guarantees. This bridges the gap between pure NoSQL freedom and data reliability needed in production.
Real applications: Enforcing email format and age range for user registration services, ensuring required fields in financial transaction records.
Common mistakes: Forgetting that validationLevel: "moderate" skips documents that already violate the rules; existing invalid documents stay invalid until they are rewritten, which matters during schema migrations.
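What the server enforces can be mirrored in a plain function, which is handy for unit-testing rules before deploying them. The `validateUser` helper below is illustrative only, not part of any driver:

```javascript
// Re-implementation of the $jsonSchema rules above as plain JavaScript.
function validateUser(doc) {
  const errors = [];
  for (const field of ["name", "email", "age"]) {
    if (!(field in doc)) errors.push(`${field} is required`);
  }
  if (typeof doc.name === "string" && doc.name.length < 2)
    errors.push("name too short");
  if (typeof doc.email === "string" &&
      !/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/.test(doc.email))
    errors.push("invalid email");
  if (typeof doc.age === "number" && (doc.age < 18 || doc.age > 120))
    errors.push("age out of range");
  return errors;
}

validateUser({ name: "A", email: "bad", age: 15 });
// ["name too short", "invalid email", "age out of range"]
```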
{
"_id": ObjectId("..."), // ObjectId
"name": "Mayur", // String
"age": NumberInt(30), // Int32
"salary": NumberDecimal("75000.50"), // Decimal128
"rating": 4.8, // Double
"isActive": true, // Boolean
"joinedAt": ISODate("2026-01-01"), // Date
"tags": ["dev", "mongodb"], // Array
"address": { "city": "Mumbai" }, // Embedded Doc
"photo": BinData(0, "base64..."), // Binary
"pattern": /^mongo/i, // Regex
"deletedAt": null // Null
}
Why it matters: Type choice affects query correctness, index efficiency, and storage size. Storing numbers as strings breaks range queries and sorting.
Real applications: Use Decimal128 for financial amounts, Date for timestamps (not string), Array for tags or multi-value fields.
Common mistakes: Storing dates as strings breaks $gte/$lte date range queries. Using Double for currency causes floating-point precision errors — always use Decimal128 for money.
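The Double-for-currency pitfall needs no database to reproduce: binary floating point cannot represent most decimal fractions exactly.

```javascript
// Doubles drift on decimal fractions...
const total = 0.1 + 0.2;
console.log(total === 0.3); // false (total is 0.30000000000000004)

// ...so store money as Decimal128 in MongoDB, or as integer minor units.
const paise = 10 + 20; // 0.10 + 0.20 rupees held as paise
console.log(paise === 30); // true: integer arithmetic is exact
```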
// Connecting to Atlas (Node.js / Mongoose)
const mongoose = require('mongoose');
const uri = process.env.MONGODB_URI;
// Atlas URI format:
// mongodb+srv://username:password@cluster.abc.mongodb.net/dbname
mongoose.connect(uri, {
serverSelectionTimeoutMS: 5000
// useNewUrlParser / useUnifiedTopology are deprecated no-ops since Mongoose 6
}).then(() => console.log('Connected to Atlas'))
.catch(err => console.error('Atlas connection error:', err));
// Atlas free tier: M0 — 512 MB storage, shared cluster
// Production: M10+ dedicated clusters
Why it matters: Most modern projects use Atlas for deployment. Understanding Atlas vs self-hosted shows production deployment awareness and knowledge of managed services trade-offs.
Real applications: Startups use Atlas M0 free tier for development, enterprises scale to M30+ dedicated clusters with automated backups, VPC peering, and PrivateLink.
Common mistakes: Storing Atlas connection strings in plain text in code instead of environment variables. Also forgetting to whitelist IP addresses in Atlas Network Access settings.
// Enable sharding on a database
sh.enableSharding("ecommerce");
// Shard a collection by a shard key
sh.shardCollection("ecommerce.orders", { userId: 1 });
// Hashed sharding — better distribution
sh.shardCollection("ecommerce.events", { _id: "hashed" });
// Architecture:
// Client App
// ↓
// mongos (router) — 1+
// ↓
// Config Servers (3 replica set nodes)
// ↓
// Shard 1 RS | Shard 2 RS | Shard 3 RS
// (each shard is its own replica set: one primary plus secondaries)
Why it matters: Sharding is a senior/advanced topic that tests your understanding of distributed database architecture. An examiner wants to know if you can design for scale from the start.
Real applications: MongoDB at eBay, Foursquare, LinkedIn uses sharding for billions of documents across hundreds of shards.
Common mistakes: Choosing a monotonically increasing shard key (like ObjectId) creates a "hotspot" — all writes go to one shard. Use hashed sharding or a high-cardinality key for even distribution.
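The hotspot effect is easy to simulate. The hash below is a toy (Knuth's multiplicative hash, not MongoDB's real MD5-based hashed index), but it shows why hashing scatters sequential keys while range-based routing piles them all onto one shard:

```javascript
const SHARDS = 3;

// Toy hash: good enough to scatter sequential integers.
function toyHash(n) {
  return (n * 2654435761) % 2 ** 32; // Knuth multiplicative hash
}

const sequentialIds = [1001, 1002, 1003, 1004, 1005, 1006];

// Ranged sharding: all recent (high) keys fall in the last range,
// so one shard takes every write.
const ranged = sequentialIds.map(id => (id < 400 ? 0 : id < 800 ? 1 : 2));

// Hashed sharding: sequential keys spread across shards.
const hashed = sequentialIds.map(id => toyHash(id) % SHARDS);
```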
// Replica set architecture (3-node)
// Primary: p1.example.com:27017 (handles writes)
// Secondary: p2.example.com:27017 (replicates data)
// Secondary: p3.example.com:27017 (replicates data)
// Connecting to a replica set
const uri = "mongodb://p1,p2,p3/?replicaSet=myRS&readPreference=primaryPreferred";
// Read preferences:
// primary — reads from primary only (default)
// primaryPreferred — primary if available, else secondary
// secondary — reads from secondaries only
// nearest — lowest latency member
// Check replica set status
rs.status()
rs.hello() // isMaster() is deprecated in favor of hello()
Why it matters: Replica sets are the foundation of MongoDB's fault tolerance. Every production MongoDB setup must use replica sets, and understanding elections and read preferences is key for system design interviews.
Real applications: Netflix uses replica sets to ensure zero-downtime failover. Read-heavy analytics dashboards use secondary reads to offload the primary.
Common mistakes: Running a standalone MongoDB in production (no replica set) means no failover. Also not understanding that secondary reads may return stale data due to replication lag.
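Driver read-preference resolution can be sketched as a selection function. This is simplified: real drivers also weigh tag sets, max staleness, and a latency window.

```javascript
// Pick a target node for a read, given the replica set state.
function pickReadTarget(members, preference) {
  const primary = members.find(m => m.state === "PRIMARY");
  const secondaries = members.filter(m => m.state === "SECONDARY");
  switch (preference) {
    case "primary":            return primary || null;
    case "primaryPreferred":   return primary || secondaries[0] || null;
    case "secondary":          return secondaries[0] || null;
    case "secondaryPreferred": return secondaries[0] || primary || null;
    case "nearest":
      return [...members].sort((a, b) => a.pingMs - b.pingMs)[0] || null;
    default: throw new Error("unknown read preference");
  }
}

const members = [
  { host: "p1", state: "PRIMARY", pingMs: 20 },
  { host: "p2", state: "SECONDARY", pingMs: 5 },
  { host: "p3", state: "SECONDARY", pingMs: 12 }
];
```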
// Check storage engine
db.serverStatus().storageEngine;
// { name: "wiredTiger", ... }
// WiredTiger collection options
db.createCollection("events", {
storageEngine: {
wiredTiger: {
configString: "block_compressor=zlib" // zlib for better compression
}
}
});
// WiredTiger cache size: defaults to the larger of 50% of (RAM - 1 GB) or 256 MB
// mongod --wiredTigerCacheSizeGB 4
// Cache efficiency: compare "pages read into cache" (misses) against
// "pages requested from the cache"; hit ratio should stay above ~95%
db.serverStatus().wiredTiger.cache["pages read into cache"]
Why it matters: Shows deep operational knowledge. Interviewers for senior/DBA roles ask about storage engines to assess your ability to tune MongoDB at the infrastructure level.
Real applications: Tuning WiredTiger cache size is the first optimization step for high-traffic applications, ensuring the working set fits in RAM.
Common mistakes: Not monitoring cache eviction rates — if evictions are high, your working set doesn't fit in cache and performance degrades significantly.
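The hit-ratio arithmetic itself is simple. Assuming two counters, total page requests and pages read into cache (misses), the ratio is:

```javascript
// Hit ratio = (requests - misses) / requests
function cacheHitRatio(pagesRequested, pagesReadIntoCache) {
  return (pagesRequested - pagesReadIntoCache) / pagesRequested;
}

cacheHitRatio(1000000, 30000);  // 0.97, comfortably above the ~0.95 target
cacheHitRatio(1000000, 200000); // 0.8, working set likely exceeds the cache
```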
# Enterprise-only: LDAP Authentication (mongod.conf, YAML)
security:
  authorization: enabled
  ldap:
    servers: "ldap.company.com"
    transportSecurity: tls
    authz:
      queryTemplate: "..."
// Enterprise-only: Field-Level Encryption
const clientEncryption = new ClientEncryption(mongoClient, {
keyVaultNamespace: "encryption.__keyVault",
kmsProviders: { aws: awsCredentials }
});
# Enterprise-only: Audit log (mongod.conf, YAML)
auditLog:
  destination: file
  format: JSON
  path: "/var/log/mongodb/audit.json"
  filter: '{ atype: { $in: ["authenticate", "find"] } }'
Why it matters: Shows awareness of licensing, compliance, and security features that matter in enterprise environments. A common question in enterprise job interviews.
Real applications: Healthcare companies use Enterprise's field-level encryption for HIPAA compliance. Financial institutions use Enterprise auditing for SOC2 certification.
Common mistakes: Attempting to implement LDAP auth with Community Edition — those security features are strictly Enterprise-only.
// Create capped collection (1MB, max 1000 docs)
db.createCollection("app_logs", {
capped: true,
size: 1048576, // 1 MB (required, in bytes)
max: 1000 // optional max document count
});
// Insert works normally
db.app_logs.insertOne({
level: "ERROR",
message: "DB connection failed",
timestamp: new Date()
});
// Tailable cursor: streams new documents in real-time
// (Node.js driver form: find(filter, options); in mongosh the second argument is a projection)
const cursor = db.collection("app_logs").find({}, { tailable: true, awaitData: true });
cursor.forEach(doc => console.log(doc)); // stays open, waits for new docs
// Check if collection is capped
db.app_logs.isCapped(); // true
Why it matters: Tests knowledge of specialized collection types. Capped collections are a go-to solution for logging use cases and show understanding of trade-offs between functionality and performance.
Real applications: Activity feeds keeping last 500 events, application error logs, audit trails with bounded storage, and real-time dashboards using tailable cursors.
Common mistakes: Trying to delete documents from a capped collection (not allowed except drop()). Also not pre-allocating enough size — once the limit is hit, old data is gone forever.
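The insertion-order overwrite behavior is essentially a ring buffer. A count-capped sketch (real capped collections cap on bytes first, document count second):

```javascript
// Toy model of a capped collection: oldest entries are silently evicted.
class CappedLog {
  constructor(max) { this.max = max; this.docs = []; }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.max) this.docs.shift(); // evict oldest
  }
}

const log = new CappedLog(3);
["a", "b", "c", "d"].forEach(m => log.insert({ msg: m }));
// log.docs now holds b, c, d; "a" was silently overwritten
```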
// Node.js GridFS with Mongoose (gridfs-stream, now deprecated; prefer GridFSBucket below)
const mongoose = require('mongoose');
const Grid = require('gridfs-stream');
const conn = mongoose.connection;
let gfs;
conn.once('open', () => {
gfs = Grid(conn.db, mongoose.mongo);
gfs.collection('uploads');
});
// Upload using GridFSBucket (modern API)
const { GridFSBucket } = require('mongodb');
const bucket = new GridFSBucket(db, { bucketName: 'images' });
// Upload a file
const fs = require('fs');
fs.createReadStream('./photo.jpg')
.pipe(bucket.openUploadStream('photo.jpg', {
metadata: { userId: '123', contentType: 'image/jpeg' }
}))
.on('finish', () => console.log('Upload complete'));
// Download a file
bucket.openDownloadStreamByName('photo.jpg')
.pipe(res); // pipe to HTTP response
Why it matters: Shows knowledge of MongoDB's file handling capabilities. Commonly asked when discussing file upload architectures and whether to store files in MongoDB vs S3/CDN.
Real applications: Storing user-uploaded profile images, PDF reports, and video recordings when a separate object storage service is not desired.
Common mistakes: Using GridFS for small files (< 1 MB) when BinData embedded in a document is simpler and faster. GridFS shines for large files that need streaming.
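Under the hood, GridFS splits file bytes into fixed-size chunk documents (255 KiB each by default) plus one metadata document. The chunking step itself can be sketched as:

```javascript
// Split a buffer into GridFS-style fixed-size chunks.
function toChunks(buffer, chunkSize = 255 * 1024) {
  const chunks = [];
  for (let i = 0; i < buffer.length; i += chunkSize) {
    chunks.push({ n: chunks.length, data: buffer.subarray(i, i + chunkSize) });
  }
  return chunks;
}

// A 600 KiB "file" becomes 3 chunks: 255 KiB + 255 KiB + 90 KiB
const chunks = toChunks(Buffer.alloc(600 * 1024));
```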
With writeConcern: { j: true }, a write is acknowledged only after it has been persisted to the journal on disk, providing the highest level of single-node durability. WiredTiger combines checkpoints (60-second intervals by default) with the journal for full crash recovery.
// Write concern options
// j: true — wait until write is journaled (durable)
// j: false — acknowledge when in memory (faster, risky)
db.orders.insertOne(
{ item: "laptop", qty: 5 },
{ writeConcern: { w: 1, j: true, wtimeout: 5000 } }
);
// w: 1 — acknowledged by primary
// w: "majority" — acknowledged by majority of replica set
// j: true — persisted to journal on disk
// Journaling is always enabled with WiredTiger
// (the serverStatus().dur section reported on the legacy MMAPv1 journal)
// Checkpoint interval (default 60s)
// mongod --syncdelay 60
Why it matters: Critical for understanding MongoDB's ACID guarantees and durability. Interviewers ask this to test your understanding of data safety in production systems.
Real applications: Financial transaction systems use w: "majority", j: true to guarantee writes are durable across replica set members before confirmation.
Common mistakes: Using j: false in production for performance without understanding the risk — a server crash before the journal flush can silently lose confirmed writes.
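The trade-off can be modeled with a toy storage engine in which only the journal survives a crash. This illustrates the semantics, not how WiredTiger is actually implemented:

```javascript
// Toy model: "memory" is volatile, "journal" is the only durable state.
class ToyStorage {
  constructor() { this.memory = []; this.journal = []; }
  write(doc, { j }) {
    this.memory.push(doc);
    if (j) this.journal.push(doc); // j: true flushes before acknowledging
    return { acknowledged: true };
  }
  crashAndRecover() { this.memory = [...this.journal]; }
}

const store = new ToyStorage();
store.write({ id: 1 }, { j: true });
store.write({ id: 2 }, { j: false }); // acknowledged, but memory-only
store.crashAndRecover();
// Only { id: 1 } survives; the acknowledged j: false write is gone
```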