A MongoDB Vector Search query takes the form of an aggregation pipeline that uses $vectorSearch as the first
stage. This page explains the syntax, options, and behavior of the
$vectorSearch stage.
Supported Clients
Syntax
Fields
The $vectorSearch stage takes a document with the following fields:
Vector Search Types
When you define a $vectorSearch stage, you can use the
exact field to specify whether to run an ANN or ENN search.
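As a rough illustration of how the exact field switches between the two search types, the following sketch builds a $vectorSearch stage for either ANN or ENN. The helper name buildVectorSearchStage, the index name vector_index, the path embedding, and the short queryVector are hypothetical placeholders. The check that numCandidates is at least limit is an assumption based on MongoDB's field documentation, not something stated on this page.

```javascript
// Hypothetical helper: returns a $vectorSearch stage for ANN or ENN search.
function buildVectorSearchStage({ index, path, queryVector, limit, exact = false, numCandidates }) {
  const stage = { index, path, queryVector, limit };
  if (exact) {
    // ENN: exhaustive search over all indexed vectors; numCandidates is omitted.
    stage.exact = true;
  } else {
    // ANN: numCandidates is required (assumed here to be at least `limit`).
    if (!Number.isInteger(numCandidates) || numCandidates < limit) {
      throw new Error("ANN search requires an integer numCandidates >= limit");
    }
    stage.numCandidates = numCandidates;
  }
  return { $vectorSearch: stage };
}

// Placeholder values; a real queryVector has as many dimensions as the index.
const common = { index: "vector_index", path: "embedding", queryVector: [0.1, -0.2, 0.3], limit: 5 };

const annStage = buildVectorSearchStage({ ...common, numCandidates: 100 });
const ennStage = buildVectorSearchStage({ ...common, exact: true });
```

Either stage would then be placed first in the array passed to aggregate().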
For Approximate Nearest Neighbors (ANN) search, MongoDB Vector Search finds vector embeddings in your data that are closest to the vector embedding in your query based on their proximity in multi-dimensional space and based on the number of neighbors that it considers. It uses the Hierarchical Navigable Small Worlds algorithm and finds the vector embeddings most similar to the vector embedding in your query without scanning every vector. Therefore, ANN search is ideal for querying large datasets without significant filtering.
Note
Optimal recall for ANN search is typically considered to
be around 90-95% overlap in results with ENN search but
with significantly lower latency. This provides a good balance
between accuracy and performance. To achieve this with MongoDB Vector Search,
tune the numCandidates parameter
at query time.
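One way to read the overlap figure above is as the fraction of ENN results that also appear in the ANN results for the same query and limit. The sketch below computes that fraction. The function name recallAgainstExact and the sample IDs are hypothetical, and this is only an illustrative measurement, not MongoDB's own tooling.

```javascript
// Fraction of exact (ENN) result IDs that the approximate (ANN) query also returned.
function recallAgainstExact(annIds, ennIds) {
  const exact = new Set(ennIds.map(String));
  if (exact.size === 0) return 0; // nothing to compare against
  const approx = new Set(annIds.map(String));
  let hits = 0;
  for (const id of exact) {
    if (approx.has(id)) hits += 1;
  }
  return hits / exact.size;
}

// Hypothetical _id values from running the same query with exact: true and with numCandidates.
const ennIds = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
const annIds = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "z"];

const recall = recallAgainstExact(annIds, ennIds);
console.log(recall); // 9 of the 10 exact results were found
```

If the measured value falls below your target, raising numCandidates and measuring again is the adjustment this note describes.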
numCandidates Selection
You must specify the numCandidates field to run ANN search.
This field determines how many nearest neighbors MongoDB Vector Search considers
during the search.
We recommend that you specify a numCandidates number at least 20
times higher than the number of documents to return (limit) to
increase accuracy and reduce discrepancies between your ENN and
ANN query results. For example, if you set limit to return
5 results, consider setting numCandidates to 100 as a
starting point. To learn more, see How to Measure the Accuracy of Your Query Results.
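The arithmetic behind this starting point can be sketched as a small helper. The function name suggestNumCandidates and the configurable factor argument are hypothetical; only the 20x multiplier comes from the recommendation above.

```javascript
// Starting-point heuristic: numCandidates = factor * limit (factor defaults to 20).
function suggestNumCandidates(limit, factor = 20) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  return Math.ceil(limit * factor);
}

console.log(suggestNumCandidates(5));  // 100, matching the example above
console.log(suggestNumCandidates(10)); // 200
```

Treat the result as an initial value to tune, not a fixed rule.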
This overrequest pattern is the recommended way to trade off latency
and recall in your ANN searches. However, we recommend
tuning the numCandidates parameter based on your specific dataset
size and query requirements. To ensure that you get accurate results,
consider the following variables:
For an Exact Nearest Neighbors (ENN) search, MongoDB Vector Search exhaustively searches all the indexed vector embeddings by calculating the distance between all the embeddings and finds the exact nearest neighbor for the vector embedding in your query. This is computationally intensive and might negatively impact query latency. Therefore, we recommend ENN searches for the following use-cases:
Behavior
$vectorSearch must be the first stage of any pipeline where it
appears.
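A client-side check for this rule might look like the following sketch. The function vectorSearchIsFirst is a hypothetical helper, not part of any MongoDB driver, and the stage contents are abbreviated placeholders.

```javascript
// Returns true when no stage other than the first one is a $vectorSearch stage.
function vectorSearchIsFirst(pipeline) {
  return pipeline.every((stage, i) => i === 0 || !("$vectorSearch" in stage));
}

const search = { $vectorSearch: { index: "vector_index", path: "embedding", queryVector: [0.1, 0.2], numCandidates: 100, limit: 5 } };

const validPipeline = [search, { $project: { title: 1 } }];
const invalidPipeline = [{ $match: { year: 1999 } }, search];

console.log(vectorSearchIsFirst(validPipeline));   // true
console.log(vectorSearchIsFirst(invalidPipeline)); // false
```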
Limitations
$vectorSearch can't be used in a view definition or in the following pipeline
stages:
[1] You can pass the results of $vectorSearch to this stage.
MongoDB Vector Search Indexing
To learn more about these MongoDB Vector Search field types, see How to Index Fields for Vector Search.
MongoDB Vector Search Scoring
MongoDB Vector Search assigns a score, in a fixed range from 0 to 1
(where 0 indicates low similarity and 1 indicates high
similarity), to every document that it returns.
Each returned document includes the score as metadata. To return each
document's score along with the result set, use a
$project stage in your aggregation pipeline and configure
the score as a field to project. In the score field, specify a
$meta expression
with the value vectorSearchScore. The syntax is as follows:
```javascript
db.<collection>.aggregate([
  {
    "$vectorSearch": {
      <query-syntax>
    }
  },
  {
    "$project": {
      "<field-to-include>": 1,
      "<field-to-exclude>": 0,
      "score": { "$meta": "vectorSearchScore" }
    }
  }
])
```
Note
You can use vectorSearchScore as a score $meta expression only after the
$vectorSearch pipeline stage. If you use
vectorSearchScore after any other query, MongoDB logs a warning
starting in MongoDB v8.2.
Note
Pre-filtering your data doesn't affect the score that MongoDB Vector Search returns
using vectorSearchScore for $vectorSearch queries.
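To make the projection syntax concrete, the sketch below assembles a two-stage pipeline that returns selected fields together with the score. The helper withScore, the field names title and plot, and the index and path values are hypothetical placeholders.

```javascript
// Appends a $project stage that returns the chosen fields plus the vector search score.
function withScore(vectorSearchStage, includeFields = []) {
  const projection = {};
  for (const field of includeFields) {
    projection[field] = 1;
  }
  projection.score = { $meta: "vectorSearchScore" };
  return [vectorSearchStage, { $project: projection }];
}

const pipeline = withScore(
  { $vectorSearch: { index: "vector_index", path: "embedding", queryVector: [0.1, 0.2], numCandidates: 100, limit: 5 } },
  ["title", "plot"]
);

console.log(JSON.stringify(pipeline[1]));
```

In mongosh, the resulting array could be passed to db.<collection>.aggregate().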
MongoDB Vector Search Pre-Filtering
The $vectorSearch filter option matches BSON
boolean, date, objectId, numeric, string, and UUID values, including arrays of these types.
You must index the fields that you want to filter your data by as the filter type in a vectorSearch type index definition. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison.
MongoDB Vector Search supports the $vectorSearch filter option for
the following MQL operators:
| Type | MQL operator |
|---|---|
| Equality | $eq, $ne |
| Range | $gt, $gte, $lt, $lte |
| In set | $in, $nin |
| Existence | $exists |
| Logical | $not, $nor, $and, $or |
Note
The $vectorSearch filter option doesn't support
other query operators,
aggregation pipeline operators, or MongoDB Search operators.
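A filter can be screened for unsupported operators before it is sent to the server. The sketch below walks a filter document and reports any $-prefixed key outside an allowed set. The function findUnsupportedOperators is hypothetical, and the hard-coded operator set is an assumption drawn from MongoDB's documentation of the equality, range, set, existence, and logical operators; verify it against the version you run.

```javascript
// Assumed set of MQL operators accepted by the $vectorSearch filter option.
const SUPPORTED_FILTER_OPERATORS = new Set([
  "$eq", "$ne", "$gt", "$gte", "$lt", "$lte",
  "$in", "$nin", "$exists", "$not", "$nor", "$and", "$or",
]);

// Recursively collects operator keys that are not in the supported set.
function findUnsupportedOperators(filter) {
  const unsupported = new Set();
  const visit = (value) => {
    if (Array.isArray(value)) {
      value.forEach(visit);
      return;
    }
    if (value === null || typeof value !== "object") return;
    for (const [key, child] of Object.entries(value)) {
      if (key.startsWith("$") && !SUPPORTED_FILTER_OPERATORS.has(key)) {
        unsupported.add(key);
      }
      visit(child);
    }
  };
  visit(filter);
  return [...unsupported];
}

const okFilter = { $and: [{ genres: "Action" }, { year: { $in: [1999, 2000, 2001] } }] };
const badFilter = { title: { $regex: "^The" } };

console.log(findUnsupportedOperators(okFilter));  // []
console.log(findUnsupportedOperators(badFilter)); // [ '$regex' ]
```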
Filtering Considerations
MongoDB Vector Search supports the short form of $eq. In the short form, you don't need to specify $eq in the query.

For example, consider the following filter with $eq:

```javascript
"filter": { "_id": { "$eq": ObjectId("5a9427648b0beebeb69537a5") } }
```

This is equivalent to the following filter, which uses the short form of $eq:

```javascript
"filter": { "_id": ObjectId("5a9427648b0beebeb69537a5") }
```

You can use the $and MQL operator to specify an array of filters in a single query.

For example, consider the following pre-filter for documents with a genres field equal to Action and a year field with the value 1999, 2000, or 2001:

```javascript
"filter": {
  "$and": [
    { "genres": "Action" },
    { "year": { "$in": [ 1999, 2000, 2001 ] } }
  ]
}
```

For advanced filtering capabilities such as fuzzy search, phrase matching, location filtering, and other analyzed text, use the vectorSearch operator in a $search stage.
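The following sketch shows where a pre-filter like the genres and year example sits inside a complete $vectorSearch stage. The helper andFilters, the index name vector_index, the path embedding, and the short queryVector are hypothetical placeholders. The genres and year fields are assumed to be indexed as the filter type.

```javascript
// Combines several filter documents; a single filter is returned unchanged.
function andFilters(...filters) {
  return filters.length === 1 ? filters[0] : { $and: filters };
}

const filter = andFilters(
  { genres: "Action" },                 // short form of $eq
  { year: { $in: [1999, 2000, 2001] } }
);

const stage = {
  $vectorSearch: {
    index: "vector_index",        // placeholder index name
    path: "embedding",            // placeholder vector field
    queryVector: [0.1, -0.2, 0.3], // placeholder; use your real embedding
    numCandidates: 100,
    limit: 5,
    filter,
  },
};

console.log(JSON.stringify(stage.$vectorSearch.filter));
```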
Examples
Prerequisites
Before you run these examples, perform the following actions:
Add the dataset to your cluster.
Create MongoDB Vector Search indexes for the collection. For instructions, see the Create a MongoDB Vector Search Index procedure and copy the configurations for the basic or filter examples in your desired language.
Note
If you use mongosh, pasting the queryVector from the sample code
into your terminal might take a while depending on your machine.