MongoDB

Text Search & Geospatial

15 Questions

MongoDB's $text operator performs full-text search on string fields that have a text index. The text index tokenizes strings, removes stop words, and applies stemming for the specified language. MongoDB creates an inverted index of root word stems, so searching for "running" also matches "run" and "runs". A collection can have only one text index, but it can cover multiple fields with different weights. The $meta: "textScore" projection adds relevance scores to results. Text search supports phrase matching with quotes and exclusion with minus prefix.
// Create text index on multiple fields
db.articles.createIndex({
  title: "text",
  content: "text",
  tags: "text"
}, {
  weights: { title: 10, tags: 5, content: 1 },
  default_language: "english"
})

// Basic search
db.articles.find({ $text: { $search: "mongodb performance" } })

// Phrase search
db.articles.find({ $text: { $search: '"replica set"' } })

// Exclude term (search mongodb but not sharding)
db.articles.find({ $text: { $search: "mongodb -sharding" } })

// Sort by relevance
db.articles.find(
  { $text: { $search: "mongodb index" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
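
The phrase and exclusion syntax above is just string formatting, so it can be assembled from structured input. The following is a minimal sketch; `buildTextSearch` and its parameter names are hypothetical helpers for illustration, not part of MongoDB or its drivers.

```javascript
// Hypothetical helper: builds the string passed to $text's $search.
// Phrases are wrapped in double quotes; excluded terms get a "-" prefix.
function buildTextSearch({ terms = [], phrases = [], exclude = [] }) {
  const parts = [
    ...terms,
    ...phrases.map((p) => `"${p}"`),
    ...exclude.map((t) => `-${t}`),
  ];
  return parts.join(" ");
}

const search = buildTextSearch({
  terms: ["mongodb"],
  phrases: ["replica set"],
  exclude: ["sharding"],
});
console.log(search); // mongodb "replica set" -sharding
```

The resulting string would be passed as `{ $text: { $search: search } }`. The sketch does not escape quotes or leading minus signs inside user input, which a real application would need to handle.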

MongoDB's built-in text search has significant limitations compared to dedicated search engines. It does not support fuzzy matching (typo tolerance), autocomplete/prefix search, faceted search, vector/semantic search, or advanced relevance tuning; field weights are the only boosting control. Only one text index per collection is allowed. Combining text search with range filters is inefficient: the query has to use the text index, so a range predicate such as a price filter is applied to the text matches instead of narrowing the index scan. Text indexes are updated synchronously with each write, but they can be large and they slow writes down, because every unique stemmed word in each indexed field adds an index entry. For production search features, use MongoDB Atlas Search (built on Apache Lucene) or integrate with Elasticsearch.
// Built-in $text limitations:
db.products.find({
  $text: { $search: "laptop" },
  price: { $lt: 1000 } // NOT efficient — text search dominates
})

// Atlas Search handles this properly with compound operator
{
  $search: {
    compound: {
      must: [{ text: { query: "laptop", path: "name" } }],
      filter: [{ range: { path: "price", lt: 1000 } }]
    }
  }
}

// No fuzzy matching in $text:
db.products.find({ $text: { $search: "labtop" } }) // no results for typo

// Atlas Search with fuzzy:
{ $search: { text: { query: "labtop", path: "name", fuzzy: { maxEdits: 1 } } } }

MongoDB Atlas Search is a fully managed full-text search built on Apache Lucene, integrated directly into Atlas clusters. Unlike the basic $text operator, Atlas Search supports fuzzy search (typo tolerance), autocomplete, faceted search, vector search (semantic/AI), synonyms, highlighting, and advanced relevance scoring. Queries use the $search aggregation stage. Atlas Search indexes are separate from regular MongoDB indexes and are updated asynchronously, so search results are eventually consistent with the collection.
// Create an Atlas Search index (via Atlas UI or Atlas CLI)
// Index definition:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [{ "type": "string", "analyzer": "lucene.standard" },
               { "type": "autocomplete" }],
      "description": { "type": "string" },
      "price": { "type": "number" }
    }
  }
}

// Atlas Search query
db.products.aggregate([
  {
    $search: {
      index: "default",
      compound: {
        should: [
          { text: { query: "wireless headphones", path: "name", score: { boost: { value: 3 } } } },
          { text: { query: "wireless headphones", path: "description" } }
        ]
      }
    }
  },
  { $limit: 10 },
  { $project: { name: 1, price: 1, score: { $meta: "searchScore" } } }
])

Atlas Search's autocomplete index type uses edge n-gram tokenization to build prefix tokens for efficient prefix-based search. When a user types "mon", autocomplete returns results matching "mongodb", "monitor", "money", etc. The index definition must explicitly set field type to autocomplete. The $search query uses the autocomplete operator. You can configure tokenOrder (sequential or any) and minimum/maximum gram sizes to balance index size against search quality. This is real-time search-as-you-type and is used in search bars, tag inputs, and filter dropdowns.
// Atlas Search index with autocomplete
{
  "mappings": {
    "fields": {
      "title": [
        { "type": "autocomplete", "analyzer": "lucene.standard",
          "tokenization": "edgeGram", "minGrams": 2, "maxGrams": 15 }
      ]
    }
  }
}

// Autocomplete query — called on every keystroke
db.movies.aggregate([
  {
    $search: {
      autocomplete: {
        query: req.query.q,  // "mon" from user
        path: "title",
        tokenOrder: "sequential"
      }
    }
  },
  { $limit: 10 },
  { $project: { title: 1, _id: 1 } }
])
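
To make the edge n-gram idea concrete, here is a toy generator that produces the prefixes of a single token between a minimum and maximum length. It is only an illustration of the concept; `edgeGrams` is a hypothetical name and this is not how Lucene implements its tokenizer.

```javascript
// Toy edge n-gram generator (illustrative only, not Lucene's implementation).
// Returns every prefix of `token` whose length is between minGrams and maxGrams.
function edgeGrams(token, minGrams = 2, maxGrams = 15) {
  const grams = [];
  const upper = Math.min(maxGrams, token.length);
  for (let len = minGrams; len <= upper; len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

console.log(edgeGrams("mongodb"));
// [ 'mo', 'mon', 'mong', 'mongo', 'mongod', 'mongodb' ]
```

Because "mon" is one of the stored grams, a query for "mon" can match "mongodb" with a direct lookup. A lower `minGrams` matches earlier in the user's typing at the cost of more index entries per token.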

MongoDB stores geospatial data as GeoJSON objects (Point, LineString, Polygon, MultiPolygon) or as legacy coordinate pairs [longitude, latitude]. Two index types support geospatial queries: 2dsphere (spherical Earth model, WGS84 coordinates, supports GeoJSON) and 2d (flat coordinate plane, legacy, for non-Earth coordinates like game maps). The 2dsphere index is the standard for location-based features. Geospatial queries enable: proximity search, within shape, intersecting shapes, and distance calculations. Store coordinates as [longitude, latitude] — note MongoDB uses longitude first, unlike most mapping libraries.
// GeoJSON point structure (longitude FIRST, then latitude)
{
  name: "Coffee Shop",
  location: {
    type: "Point",
    coordinates: [72.8777, 19.0760]  // [longitude, latitude]
  }
}

// Create 2dsphere index
db.places.createIndex({ location: "2dsphere" })

// 2d index for flat plane (legacy)
db.gameObjects.createIndex({ position: "2d" })

// Store polygon for delivery zone
{
  name: "Mumbai Central Zone",
  area: {
    type: "Polygon",
    coordinates: [[
      [72.82, 18.97], [72.88, 18.97],
      [72.88, 19.05], [72.82, 19.05], [72.82, 18.97]
    ]]
  }
}
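
Since many mapping APIs hand back latitude first, a small conversion helper can reduce ordering mistakes. This is a sketch under the assumption that input arrives as an object with `lat` and `lng` properties; `toGeoJSONPoint` is a hypothetical name.

```javascript
// Hypothetical helper: converts { lat, lng } into a GeoJSON Point.
// GeoJSON order is [longitude, latitude].
function toGeoJSONPoint({ lat, lng }) {
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError(`latitude out of range: ${lat}`);
  }
  if (!Number.isFinite(lng) || lng < -180 || lng > 180) {
    throw new RangeError(`longitude out of range: ${lng}`);
  }
  return { type: "Point", coordinates: [lng, lat] };
}

const shop = toGeoJSONPoint({ lat: 19.076, lng: 72.8777 });
console.log(shop.coordinates); // [ 72.8777, 19.076 ]
```

The range checks only catch some swapped inputs. For a location such as Mumbai both values are below 90, so a swapped pair would still pass validation and end up as a point somewhere else on the globe.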

The $near and $nearSphere operators find documents closest to a given point, returned in order of distance. $geoNear in the aggregation pipeline also adds distance fields. The $maxDistance and $minDistance options limit the search radius. For flat coordinates use $near with a 2d index; for spherical Earth coordinates use $near with a 2dsphere index. Distance in $near with GeoJSON is in meters. The aggregation $geoNear stage is more flexible — it can calculate distance, filter, and sort in one stage and works in sharded collections.
// Find restaurants within 2km of user location
db.restaurants.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [72.8777, 19.0760] },
      $maxDistance: 2000  // meters
    }
  }
})

// Aggregation with $geoNear (adds distance field, works in sharded clusters)
db.restaurants.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [72.8777, 19.0760] },
      distanceField: "dist.calculated",
      maxDistance: 5000,
      spherical: true,
      query: { category: "coffee" }  // combine with filter
    }
  },
  { $limit: 20 },
  { $project: { name: 1, "dist.calculated": 1 } }
])
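
For intuition about the distances these operators return, the haversine formula gives a great-circle distance in meters on a sphere. The sketch below assumes an Earth radius of 6378.1 km; `haversineMeters` is a hypothetical helper, and its results may differ slightly from the values MongoDB computes.

```javascript
// Assumed Earth radius in meters (6378.1 km); an approximation.
const EARTH_RADIUS_M = 6378100;

// Great-circle distance between two [longitude, latitude] pairs, in meters.
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Example: is a restaurant within 2000 m of the user?
const user = [72.8777, 19.076];
const restaurant = [72.88, 19.07];
console.log(haversineMeters(user, restaurant) <= 2000); // true
```

This kind of helper is handy in unit tests or for sanity-checking a `$maxDistance` value; the database should still do the actual filtering so the geospatial index is used.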

The $geoWithin operator finds all documents whose location is completely within a specified shape. Shapes include $polygon (legacy flat coordinates), $box, $center (circle with flat coordinates), $centerSphere (circle with the radius in radians), and GeoJSON Polygon/MultiPolygon. Unlike $near, $geoWithin does not sort by distance and does not require a geospatial index, although a 2dsphere index (or a 2d index for the legacy shapes) improves performance considerably. Use it for zone-based queries like "all delivery addresses in this service area" or "all stores within city boundaries". GeoJSON geometry is preferred for accuracy.
// Find all users within a predefined delivery zone (polygon)
const deliveryZone = {
  type: "Polygon",
  coordinates: [[
    [72.82, 18.97], [72.95, 18.97],
    [72.95, 19.10], [72.82, 19.10],
    [72.82, 18.97]  // close the polygon
  ]]
};

db.users.find({
  location: {
    $geoWithin: { $geometry: deliveryZone }
  }
})

// Simple circle using $centerSphere (radius in radians: km / 6378.1)
db.stores.find({
  location: {
    $geoWithin: {
      $centerSphere: [[72.8777, 19.0760], 10 / 6378.1] // 10km radius
    }
  }
})
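
The radians conversion for $centerSphere is easy to get wrong when it is written inline, so it can help to name it. A small sketch, assuming the commonly quoted equatorial radius of 6378.1 km (3963.2 miles); the helper names are hypothetical.

```javascript
// Assumed equatorial Earth radius, in kilometers and miles.
const EARTH_RADIUS_KM = 6378.1;
const EARTH_RADIUS_MILES = 3963.2;

// $centerSphere expects its radius in radians: distance / Earth radius.
const kmToRadians = (km) => km / EARTH_RADIUS_KM;
const milesToRadians = (miles) => miles / EARTH_RADIUS_MILES;

// Hypothetical helper: builds the $centerSphere clause for a radius in km.
function centerSphere([lng, lat], radiusKm) {
  return { $centerSphere: [[lng, lat], kmToRadians(radiusKm)] };
}

const tenKmCircle = centerSphere([72.8777, 19.076], 10);
console.log(tenKmCircle.$centerSphere[1] === 10 / 6378.1); // true
```

The result would be used as `{ location: { $geoWithin: tenKmCircle } }`.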

$geoIntersects finds documents whose geometry intersects with a specified GeoJSON object. Unlike $geoWithin (completely inside), $geoIntersects returns documents that partially overlap, touch, or are inside the given shape. This is ideal for finding delivery routes that pass through a city, finding service areas that overlap with a user's region, or finding geofences triggered by a moving location. It does not require a geospatial index, but a 2dsphere index (the only geospatial index type that supports it) improves performance. Common use case: a logistics app checking if a truck route intersects any restricted zones.
// Find all delivery routes that pass through Mumbai city limits
const mumbaiCity = {
  type: "Polygon",
  coordinates: [[
    [72.77, 18.89], [72.99, 18.89],
    [72.99, 19.27], [72.77, 19.27],
    [72.77, 18.89]
  ]]
};

db.deliveryRoutes.find({
  path: { $geoIntersects: { $geometry: mumbaiCity } }
})

// Find geofences that cover a user's current location (point in polygon)
db.geofences.find({
  boundary: {
    $geoIntersects: {
      $geometry: { type: "Point", coordinates: [72.8777, 19.0760] }
    }
  }
})

A "stores near me" feature combines a 2dsphere index with $geoNear aggregation to return sorted, paginated results with distance. The key considerations are: store coordinates as GeoJSON Points, create a 2dsphere index, use $geoNear with spherical: true, and add distance to each result. On the frontend, get user coordinates via the Geolocation API and pass them to your Node.js/Express API which queries MongoDB. Cache results for commonly queried locations (city centers, malls) to reduce database load. Combine with $match to filter by category, rating, or business hours.
// Express route for "stores near me"
app.get('/api/stores/nearby', async (req, res) => {
  const { lng, lat, radius = 5000, category } = req.query;
  
  const matchFilter = {};
  if (category) matchFilter.category = category;
  
  const stores = await db.collection('stores').aggregate([
    {
      $geoNear: {
        near: { type: "Point", coordinates: [parseFloat(lng), parseFloat(lat)] },
        distanceField: "distance",
        maxDistance: parseInt(radius),
        spherical: true,
        query: matchFilter
      }
    },
    { $limit: 20 },
    {
      $project: {
        name: 1, address: 1, rating: 1,
        distanceKm: { $divide: ["$distance", 1000] }
      }
    }
  ]).toArray();
  
  res.json(stores);
});
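
The route above passes query-string values straight into the pipeline and does not paginate. One possible way to handle both is to build the pipeline in a separate, testable function. This is a sketch under assumptions: `buildNearbyPipeline`, the `page` parameter, the fixed page size of 20, and the 50 km radius cap are all illustrative choices rather than anything required by MongoDB.

```javascript
const MAX_RADIUS_M = 50000; // assumed upper bound on the search radius
const PAGE_SIZE = 20;       // assumed fixed page size

// Hypothetical helper: validates raw query-string values and returns
// a $geoNear pipeline with skip/limit pagination.
function buildNearbyPipeline(query) {
  const lng = Number.parseFloat(query.lng);
  const lat = Number.parseFloat(query.lat);
  if (!Number.isFinite(lng) || lng < -180 || lng > 180 ||
      !Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError("lng/lat missing or out of range");
  }

  const radiusRaw = Number.parseFloat(query.radius);
  const radius = Number.isFinite(radiusRaw) && radiusRaw > 0
    ? Math.min(radiusRaw, MAX_RADIUS_M)
    : 5000;

  const pageRaw = Number.parseInt(query.page, 10);
  const page = Number.isFinite(pageRaw) && pageRaw >= 1 ? pageRaw : 1;

  // Accept only plain strings so an object such as { $ne: ... } is ignored.
  const filter = {};
  if (typeof query.category === "string") filter.category = query.category;

  return [
    {
      $geoNear: {
        near: { type: "Point", coordinates: [lng, lat] },
        distanceField: "distance",
        maxDistance: radius,
        spherical: true,
        query: filter,
      },
    },
    { $skip: (page - 1) * PAGE_SIZE },
    { $limit: PAGE_SIZE },
    {
      $project: {
        name: 1, address: 1, rating: 1,
        distanceKm: { $divide: ["$distance", 1000] },
      },
    },
  ];
}
```

In the route handler, the call would be wrapped in a try/catch that returns a 400 response on `RangeError`, and the returned array passed to `aggregate()`.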

MongoDB Atlas Vector Search enables semantic search using machine learning embeddings: it finds documents that are conceptually similar rather than keyword-matched. Store document embeddings (arrays of floats from models like OpenAI's text-embedding-ada-002 or sentence transformers) in a field, create a vectorSearch index, and query with the $vectorSearch aggregation stage. With numCandidates set, as in the query below, the stage performs an approximate nearest neighbor (ANN) search: it considers numCandidates candidates and returns the top limit matches. This powers AI-driven features like semantic document search, recommendation systems, image similarity, and RAG (Retrieval Augmented Generation) for LLMs. Combine with $search (lexical) for hybrid search.
// Store document with embedding
await db.collection('articles').insertOne({
  title: "MongoDB Performance Tips",
  content: "...",
  embedding: [0.021, -0.134, 0.876, ...] // vector from ML model
});

// Atlas Vector Search index definition
{
  "fields": [{
    "type": "vector",
    "path": "embedding",
    "numDimensions": 1536,
    "similarity": "cosine"
  }]
}

// Semantic search query
db.articles.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",
      path: "embedding",
      queryVector: await getEmbedding("how to speed up MongoDB?"),
      numCandidates: 100,
      limit: 5
    }
  },
  { $project: { title: 1, score: { $meta: "vectorSearchScore" } } }
])
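
The `"similarity": "cosine"` setting in the index definition refers to cosine similarity, which compares the direction of two vectors and ignores their length. A plain implementation is sketched below for intuition; `cosineSimilarity` is a hypothetical helper, and Atlas reports a normalized score, so the raw value computed here is not expected to equal `vectorSearchScore`.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; 1 means the vectors point the same way.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1
```

The dimension check mirrors the `numDimensions` requirement in the index: query vectors and stored vectors must come from the same embedding model.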

Faceted search returns search results along with aggregated counts by category, price range, rating, etc., like the filters on an e-commerce site. Atlas Search computes these counts with the facet collector. Run in the $searchMeta stage, it returns facet counts without returning documents, which suits sidebar filter counts; run inside $search, it returns documents and exposes the counts through the $$SEARCH_META variable. String facets count by distinct value; numeric facets bucket by ranges. This requires the field to have a facet type in the Atlas Search index definition. Faceted search is not possible with MongoDB's basic $text operator and is one of the primary reasons to use Atlas Search.
// Atlas Search index with facet fields
{
  "mappings": {
    "fields": {
      "category": [{"type": "stringFacet"}, {"type": "string"}],
      "price": [{"type": "numberFacet"}, {"type": "number"}],
      "name": [{"type": "string"}]
    }
  }
}

// Faceted search: $searchMeta returns facet counts only (no documents)
db.products.aggregate([
  {
    $searchMeta: {
      facet: {
        operator: { text: { query: "laptop", path: "name" } },
        facets: {
          categoryFacet: { type: "string", path: "category", numBuckets: 10 },
          priceRanges: {
            type: "number", path: "price",
            boundaries: [0, 500, 1000, 2000, 5000]
          }
        }
      }
    }
  }
])
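
To show what the numeric `boundaries` mean, the sketch below counts values into buckets in memory, treating each adjacent pair of boundaries as an inclusive lower bound and an exclusive upper bound. `numberFacetCounts` is a hypothetical helper, and the output shape (bucket `_id` set to the lower bound) is only loosely modeled on the facet results.

```javascript
// In-memory illustration of numeric facet bucketing.
// Bucket i covers [boundaries[i], boundaries[i + 1]).
// Values outside all boundaries are not counted in any bucket.
function numberFacetCounts(values, boundaries) {
  const counts = new Array(boundaries.length - 1).fill(0);
  for (const v of values) {
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (v >= boundaries[i] && v < boundaries[i + 1]) {
        counts[i]++;
        break;
      }
    }
  }
  return boundaries
    .slice(0, -1)
    .map((lower, i) => ({ _id: lower, count: counts[i] }));
}

const prices = [100, 499, 500, 1500, 5000, 7000];
console.log(numberFacetCounts(prices, [0, 500, 1000, 2000, 5000]));
// [ { _id: 0, count: 2 }, { _id: 500, count: 1 },
//   { _id: 1000, count: 1 }, { _id: 2000, count: 0 } ]
```

Note that 500 lands in the second bucket and that 5000 and 7000 fall outside the last boundary, so they are not counted.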

Location tracking requires storing time-series coordinates efficiently. The bucket pattern is ideal — embed multiple location readings in one document per time period instead of one document per ping. Use GeoJSON LineString for route paths. MongoDB 5.0+ Time Series collections natively optimize time-stamped data with compression and time-based clustering. Geospatial indexes on location arrays let you query all pings within a zone. For real-time tracking, consider using MongoDB Change Streams to publish location updates to connected clients via WebSockets.
// Bucket pattern: store 100 location pings per document
{
  _id: ObjectId(),
  userId: "driver123",
  date: ISODate("2024-01-15"),
  bucketNum: 1,
  count: 100,
  pings: [
    { ts: ISODate("2024-01-15T09:00:00Z"), loc: { type: "Point", coordinates: [72.88, 19.07] } },
    { ts: ISODate("2024-01-15T09:01:00Z"), loc: { type: "Point", coordinates: [72.89, 19.08] } },
    // ... up to 100 pings
  ]
}

// Time series collection (MongoDB 5.0+)
db.createCollection("gpsPings", {
  timeseries: {
    timeField: "timestamp",
    metaField: "driverId",
    granularity: "minutes"
  }
})
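
Writing into buckets is usually done with a single upsert that targets a bucket which still has room. The sketch below only builds the arguments for such a call; `buildPingUpsert` is a hypothetical helper, and it omits the `bucketNum` field shown in the document above for brevity.

```javascript
// Hypothetical helper: builds filter, update, and options for appending
// one ping to a bucket that holds fewer than maxPings entries.
// If no such bucket exists, the upsert creates a new one with count = 1.
function buildPingUpsert(userId, day, ping, maxPings = 100) {
  return {
    filter: { userId, date: day, count: { $lt: maxPings } },
    update: {
      $push: { pings: ping },
      $inc: { count: 1 },
    },
    options: { upsert: true },
  };
}

const { filter, update, options } = buildPingUpsert(
  "driver123",
  new Date("2024-01-15T00:00:00Z"),
  {
    ts: new Date("2024-01-15T09:02:00Z"),
    loc: { type: "Point", coordinates: [72.9, 19.09] },
  }
);
console.log(JSON.stringify(filter.count)); // {"$lt":100}
```

The three values would be passed to `updateOne(filter, update, options)` on the bucket collection. Concurrent upserts for the same driver can occasionally create an extra bucket, which is generally harmless for this pattern.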

$near is a query operator used in find() that sorts results by distance but cannot add a distance field to the results. Older MongoDB versions did not support $near on sharded collections; support was added in MongoDB 4.0. $geoNear is an aggregation pipeline stage that adds a computed distance field, supports sharded collections, and accepts additional filters through its query option. It must be the first stage in the pipeline. Prefer $geoNear when you need the distance in the output or further pipeline processing. Both require a geospatial index (2dsphere or 2d). The distanceMultiplier option in $geoNear scales the computed distance, for example 0.001 to convert meters to kilometers.
// $near — query operator in find()
db.stores.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [72.88, 19.07] },
      $maxDistance: 3000
    }
  }
}).limit(10)

// $geoNear — aggregation stage (preferred)
db.stores.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [72.88, 19.07] },
      distanceField: "distanceKm",
      maxDistance: 3000,         // still specified in meters
      spherical: true,
      distanceMultiplier: 0.001  // distanceKm is reported in km
    }
  },
  { $match: { isOpen: true } },  // additional filters after
  { $limit: 10 },
  { $project: { name: 1, distanceKm: 1 } }
])

A ride-sharing driver matching system requires real-time location updates and geospatial proximity queries. Drivers continuously update their location document; a passenger request triggers a geospatial query to find available drivers nearby, sorted by distance. Use $geoNear to find and rank nearby available drivers, and filter by status (available), vehicle type, and rating through its query option or a following $match. A compound index on { status: 1, location: "2dsphere" } supports filtering on status together with location. Alternatively, a partial index can cover only available drivers ({ status: "available" }) for a smaller, faster index; the query must then include the same status condition so that the planner can use it.
// Driver document
{ _id: "d1", name: "Ravi", status: "available", rating: 4.8,
  location: { type: "Point", coordinates: [72.88, 19.07] },
  updatedAt: new Date() }

// Index — partial index only on available drivers
db.drivers.createIndex(
  { location: "2dsphere" },
  { partialFilterExpression: { status: "available" } }
)

// Passenger request: find nearest 5 available drivers
const nearbyDrivers = await db.collection('drivers').aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [passengerLng, passengerLat] },
      distanceField: "dist",
      maxDistance: 10000, // 10km
      query: { status: "available", rating: { $gte: 4.0 } },
      spherical: true
    }
  },
  { $limit: 5 }
]).toArray();

Combining text search and geospatial queries in a single pipeline requires either Atlas Search (the cleanest approach) or a workaround with core MongoDB operators. With core operators, $text cannot be combined with $near, and in an aggregation both $geoNear and a $match containing $text must be the first stage, so they cannot appear in the same pipeline. One workaround is to filter geographically first using $geoNear, then match keywords on the reduced result set with a regular expression or an exact field match instead of $text. Atlas Search's compound operator with geoWithin or near geo filters handles this natively in one stage. For production location-based search (e.g., "coffee shops near me"), Atlas Search is strongly recommended.
// Approach 1: $geoNear, then keyword filter on the reduced set
// ($text is not allowed here: it must be in the first pipeline stage)
db.restaurants.aggregate([
  {
    $geoNear: {
      near: { type: "Point", coordinates: [72.88, 19.07] },
      distanceField: "dist",
      maxDistance: 5000,
      spherical: true
    }
  },
  { $match: { cuisine: { $regex: "pizza", $options: "i" } } }, // keyword match on reduced set
  { $limit: 10 } // $geoNear already returns nearest first
])

// Approach 2 (Atlas Search — recommended)
db.restaurants.aggregate([
  {
    $search: {
      compound: {
        must: [{ text: { query: "pizza", path: "cuisine" } }],
        filter: [{
          geoWithin: {
            path: "location",
            circle: {
              center: { type: "Point", coordinates: [72.88, 19.07] },
              radius: 5000
            }
          }
        }]
      }
    }
  }
])