Generate Embeddings Automatically Using MongoDB Vector Search

Important

MongoDB Vector Search indexes of the text type are deprecated as we prepare to transition from Private Preview to Public Preview. To learn more, see Preview Features.

You can configure MongoDB Vector Search to automatically generate and manage vector embeddings for the text data in your cluster. You can create a one-click AI semantic search index in your M10 or higher Atlas cluster and use Voyage AI embedding models, simplifying indexing, updating, and querying with vectors.

When you enable Automated Embedding, MongoDB Vector Search uses the embedding model that you specify to generate embeddings at two points: at index time, for the specified text field in your Atlas collection, and at query time, for the text string in a query against the field indexed for automated embeddings.

Considerations

Important

You can use MongoDB Vector Search Automated Embedding on any M10 or higher cluster on any cloud provider. However, the service that handles the inference process for generating vector embeddings runs on Google Cloud. This means that your data is sent to Google Cloud for embedding generation and retrieval, regardless of your cluster's cloud provider. We provide enterprise-grade security, and your data is stored only in your cluster.

The embedding models run on a shared, multi-tenant inference platform. Therefore, during the preview period, you must use datasets with fewer than 100,000 documents and run queries only to evaluate the feature, not for load testing. Contact your account team if your use case requires higher limits.

Although there are no hard rate limits for your workload, there are global limits. If your queries return a rate limit error (error 429), back off and retry in your application code. This allows your application to handle rate limits gracefully and continue to function.
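The backoff-and-retry guidance above can be sketched as follows. This is a minimal illustration that uses only the Python standard library, not an official client feature. The helper name `retry_with_backoff`, its parameters, and the `is_rate_limited` predicate are hypothetical, and how your driver surfaces a rate limit error is an assumption that you must verify for your client.

```python
import random
import time


def retry_with_backoff(operation, is_rate_limited, max_attempts=5,
                       base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on rate limit errors.

    is_rate_limited is a caller-supplied predicate (hypothetical) that returns
    True when an exception represents a rate limit error.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise errors that are not rate limits, and give up after
            # the final attempt.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with random jitter so
            # that many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

To use it, wrap the call that runs your query, for example `retry_with_backoff(lambda: list(collection.aggregate(pipeline)), is_rate_limited)`, where `collection`, `pipeline`, and `is_rate_limited` come from your own application.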

Prerequisites

To enable vector search using Automated Embedding, you must have the following:

An M10 or higher cluster.
A collection with a text field that you want to index for automated embeddings.
One of the following clients:
- Atlas UI for creating indexes
- mongosh for creating indexes and running queries
- Node Driver 6.6.0 or higher for creating indexes and running queries
- Python Driver 4.7 or higher for creating indexes and running queries

MongoDB Vector Search Index for Automated Embedding

The following sections describe the MongoDB Vector Search index syntax and fields for enabling automatic generation of embeddings for text fields and walk you through the steps for configuring your index for Automated Embedding.

Required Access

You need the Project Data Access Admin or higher role to create and manage MongoDB Vector Search indexes.

Index Syntax

The following is the syntax for enabling automatic generation of embeddings:

{
  "fields": [
    {
      "type": "text",
      "path": "<field-name>",
      "model": "voyage-3-large | voyage-3.5 | voyage-3.5-lite"
    }
  ]
}

Index Fields

The following fields are required in the index definition:

Field	Type	Description
`type`	string	The type of the field. For Automated Embedding, this must be `text`.
`path`	string	The name of the field in the collection that you want to index for Automated Embedding.
`model`	string	The Voyage AI embedding model to use to generate the embeddings for the index. You can specify one of the following models: `voyage-3-large` (highest-quality retrieval across languages and domains), `voyage-3.5` (balanced model for multilingual use and general-purpose retrieval accuracy), or `voyage-3.5-lite` (lightweight, faster model optimized for latency and lower cost). If you change the embedding model after you create the index, MongoDB Vector Search generates new embeddings for the dataset. While MongoDB Vector Search generates the new embeddings, you can continue to query by using the old embeddings. After the new embeddings replace the old embeddings, MongoDB Vector Search removes the old embeddings.
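As an illustration of the three required fields, the following sketch builds the index definition as a Python dictionary and rejects model names that the table does not list. The helper name `build_auto_embed_definition` is hypothetical, and the `fullplot` path and `voyage-3.5` model are example values only.

```python
# The model names listed in the index fields table.
SUPPORTED_MODELS = ("voyage-3-large", "voyage-3.5", "voyage-3.5-lite")


def build_auto_embed_definition(path, model):
    """Return an index definition that enables Automated Embedding for one field."""
    if not path:
        raise ValueError("path must be a non-empty field name")
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    # type, path, and model are the three required fields.
    return {"fields": [{"type": "text", "path": path, "model": model}]}


definition = build_auto_embed_definition("fullplot", "voyage-3.5")
```

Validating the model name on the client is a design choice for faster feedback. The server remains the authority on which models it accepts.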

Considerations

The index fields for Automated Embedding are mutually exclusive with the following vector type index fields:

numDimensions
similarity
quantization

If your collection already has embeddings, you must use the vector type fields to index the embeddings. To learn more about indexing fields with embeddings, see How to Index Fields for Vector Search.

You can create an index with both the text and vector types if you want to index a text field for automatically generating embeddings and also index a field with your own embeddings. MongoDB Vector Search automatically generates embeddings only for queries against the field indexed as the text type. To search the field indexed as the vector type, you must specify embeddings in the query.

You can also index fields to pre-filter your data by using the MongoDB Vector Search filter type.

Important

Filtered queries are typically slower than an otherwise equivalent unfiltered query.

To learn more about pre-filtering your data, see About the filter Type.
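The considerations in this section can be sketched as one index definition that combines a text field, a vector field, and a filter field. The field names `plot_embedding` and `year`, the `numDimensions` value of 1024, and the `cosine` similarity are assumptions chosen for illustration, and `check_index_fields` is a hypothetical helper that only enforces the mutual-exclusion rule described above.

```python
# Options that belong only to fields of the vector type.
VECTOR_ONLY_OPTIONS = {"numDimensions", "similarity", "quantization"}


def check_index_fields(definition):
    """Raise ValueError if a text field sets options reserved for vector fields."""
    for field in definition["fields"]:
        if field["type"] == "text":
            clash = VECTOR_ONLY_OPTIONS.intersection(field)
            if clash:
                raise ValueError(
                    f"text field {field['path']!r} cannot set {sorted(clash)}"
                )
    return definition


mixed_definition = check_index_fields({
    "fields": [
        # Embeddings are generated automatically for this field.
        {"type": "text", "path": "fullplot", "model": "voyage-3.5"},
        # Embeddings that you generated yourself (hypothetical field and values).
        {"type": "vector", "path": "plot_embedding",
         "numDimensions": 1024, "similarity": "cosine"},
        # Field used to pre-filter documents (hypothetical field).
        {"type": "filter", "path": "year"},
    ]
})
```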

Create an Index for Automated Embedding

The following procedure walks through the steps for enabling automated embeddings in your MongoDB Vector Search index. If you loaded the sample_mflix.movies dataset, the example in the procedure demonstrates how to enable Automated Embedding for the fullplot field in the collection.
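For the Python Driver, creating the index might look like the following sketch. It assumes PyMongo 4.7 or higher, where `SearchIndexModel` accepts a `type` argument. The index name `movies_auto_embed`, the `voyage-3.5` model choice, and both helper names are hypothetical, and the connection string is a placeholder.

```python
def auto_embed_index_spec(name, path, model):
    """Return keyword arguments for pymongo.operations.SearchIndexModel."""
    return {
        "name": name,
        "type": "vectorSearch",
        "definition": {
            "fields": [{"type": "text", "path": path, "model": model}]
        },
    }


def create_auto_embed_index(collection, name, path, model):
    """Create the index on a PyMongo collection and return the index name."""
    # Imported here so the spec helper above works without PyMongo installed.
    from pymongo.operations import SearchIndexModel

    index_model = SearchIndexModel(**auto_embed_index_spec(name, path, model))
    return collection.create_search_index(index_model)


# Usage (requires a reachable cluster; "<connection-string>" is a placeholder):
#   from pymongo import MongoClient
#   movies = MongoClient("<connection-string>")["sample_mflix"]["movies"]
#   create_auto_embed_index(movies, "movies_auto_embed", "fullplot", "voyage-3.5")
```

Keeping the specification separate from the driver call makes it possible to inspect or log the definition before you send it to the cluster.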

MongoDB Vector Search Query with Automated Embedding

After you create an index with Automated Embedding, you can run text queries against the indexed field. MongoDB Vector Search automatically generates embeddings for the text string in your query using the same embedding model specified in the index. It uses the embeddings to search the index for documents that are semantically similar to the specified query text.

The following sections describe the $vectorSearch pipeline syntax and fields for automatically generating embeddings for your query text against the field indexed for Automated Embedding and demonstrate how to run semantic search queries against the fields indexed for Automated Embedding.

Query Syntax

The following syntax demonstrates how to run a query against a field indexed for Automated Embedding:

{
  "$vectorSearch": {
    "index": "<index-name>",
    "limit": <number-of-results>,
    "numCandidates": <number-of-candidates>,
    "path": "<field-to-search>",
    "query": "<query-string>"
  }
}

Query Fields

The following table describes the required and conditional fields for a MongoDB Vector Search query that uses automated embeddings:

Field	Type	Necessity	Description
`exact`	boolean	Conditional	This field is required if `numCandidates` is omitted. Mutually exclusive with `numCandidates`. Flag that specifies whether to run ENN or ANN search. Value can be one of the following: `false` to run ANN search, or `true` to run ENN search. If omitted, defaults to `false`.
`index`	string	Required	Name of the MongoDB Vector Search index to use. MongoDB Vector Search doesn't return results if you misspell the index name or if the specified index doesn't already exist on the cluster.
`limit`	number	Required	Number (of type `int` only) of documents to return in the results. This value can't exceed the value of `numCandidates` if you specify `numCandidates`.
`numCandidates`	number	Conditional	This field is required if `exact` is `false` or omitted. Mutually exclusive with `exact`. Number of nearest neighbors to use during the search. Value must be less than or equal to (`<=`) `10000`. You can't specify a number less than the number of documents to return (`limit`).
`path`	string	Required	Indexed text type field to search.
`query`	string	Required	Text for which to automatically generate embeddings and perform the semantic search.
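The constraints in the table can be sketched as a small builder that assembles the `$vectorSearch` stage and checks the rules for `limit`, `numCandidates`, and `exact` before the query is sent. The function name `build_vector_search_stage` and its parameter names are hypothetical. The server performs its own validation regardless.

```python
def build_vector_search_stage(index, path, query, limit,
                              num_candidates=None, exact=False):
    """Return a $vectorSearch stage for a field indexed for Automated Embedding."""
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive int")
    stage = {"index": index, "path": path, "query": query, "limit": limit}
    if exact:
        # ENN search: exact and numCandidates are mutually exclusive.
        if num_candidates is not None:
            raise ValueError("numCandidates cannot be combined with exact")
        stage["exact"] = True
    else:
        # ANN search: numCandidates is required.
        if num_candidates is None:
            raise ValueError("numCandidates is required for ANN search")
        if num_candidates > 10000:
            raise ValueError("numCandidates must be <= 10000")
        if num_candidates < limit:
            raise ValueError("numCandidates cannot be less than limit")
        stage["numCandidates"] = num_candidates
    return {"$vectorSearch": stage}


ann_stage = build_vector_search_stage(
    index="movies_auto_embed",  # hypothetical index name
    path="fullplot",
    query="a heist that goes wrong",
    limit=5,
    num_candidates=100,
)
```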

Considerations

You can run an ANN or ENN query against the indexed field. To learn more, see ANN Search and ENN Search.

You can't specify vector embeddings in your query against fields indexed for Automated Embedding. Instead, you must run a natural language query against the field. When you run a natural language query against the field indexed for Automated Embedding, MongoDB Vector Search automatically generates the embeddings for the query text using the same embedding model as the indexed field. It then uses the generated embeddings to perform a semantic search against the indexed field.

You can optionally specify filter fields in your query to pre-filter the documents against which MongoDB Vector Search performs the semantic search. To learn more, see MongoDB Vector Search Pre-Filtering.

You can also optionally retrieve the score of the documents in the results. To learn more, see MongoDB Vector Search Scoring.
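To illustrate the optional pre-filter and score, the following sketch builds an aggregation pipeline that passes a natural language query, narrows the candidates with a filter, and projects the score of each result. The index name, the query text, the `year` filter field (which must be indexed with the filter type), and the projected `title` field are assumptions for illustration, and `build_scored_pipeline` is a hypothetical helper.

```python
def build_scored_pipeline(index, path, query, limit, num_candidates,
                          pre_filter=None):
    """Return a pipeline: $vectorSearch, then a projection that adds the score."""
    search = {
        "index": index,
        "path": path,
        "query": query,  # plain text; embeddings are generated for you
        "limit": limit,
        "numCandidates": num_candidates,
    }
    if pre_filter is not None:
        # The filtered field must be indexed with the filter type.
        search["filter"] = pre_filter
    return [
        {"$vectorSearch": search},
        {"$project": {"_id": 0, "title": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_scored_pipeline(
    index="movies_auto_embed",
    path="fullplot",
    query="young heroes caught in epic struggles between light and darkness",
    limit=10,
    num_candidates=100,
    pre_filter={"year": {"$gte": 2000}},
)
# With PyMongo, you could run it as: results = list(collection.aggregate(pipeline))
```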
