You can store vector embeddings alongside your other MongoDB data. These embeddings capture meaningful relationships in your data and allow you to perform semantic search and implement retrieval-augmented generation (RAG).
Get Started
Use the following tutorial to learn how to create vector embeddings and query them using vector search. Specifically, you perform the following actions:
Define a function that uses an embedding model to generate vector embeddings.
Select whether you want to use a proprietary or open-source model. For state-of-the-art embeddings, use Voyage AI.
Create embeddings from your data and store them in MongoDB.
Select whether you want to create embeddings from new data or from existing data that you already have in MongoDB.
Create embeddings from your search terms and run a vector search query.
For production applications, you typically write a script to generate vector embeddings. You can start with the sample code on this page and customize it for your use case.
Prerequisites
To complete this tutorial, you must have the following:
Use an Embedding Model
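As a minimal sketch of this step, the following assumes the open-source sentence-transformers library; the tutorial also supports proprietary models such as Voyage AI. The model choice and helper name are illustrative, not prescribed by this tutorial.

```python
# A minimal sketch of an embedding function, assuming the open-source
# sentence-transformers library (pip install sentence-transformers).
# The model choice and helper name are illustrative.
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dimensional embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")

def get_embedding(text: str) -> list[float]:
    """Generate a vector embedding for the given text."""
    return model.encode(text).tolist()
```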
Create Embeddings from Data
In this section, you create vector embeddings from your data using the function that you defined, and then you store these embeddings in a MongoDB collection.
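For example, a minimal sketch using PyMongo, assuming the `get_embedding` function defined above; the connection string, namespace, and sample texts are placeholders:

```python
# A minimal sketch of creating and storing embeddings, assuming the
# get_embedding function above. The connection string, namespace, and
# sample texts are placeholders.
from pymongo import MongoClient

client = MongoClient("<connection-string>")
collection = client["sample_db"]["embeddings"]

texts = [
    "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
    "The Lion King: Lion cub and future king Simba searches for his identity",
    "Avatar: A marine is dispatched to the moon Pandora on a unique mission",
]

# Store each text alongside its vector embedding in a single document.
collection.insert_many([{"text": t, "embedding": get_embedding(t)} for t in texts])
```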
Create Embeddings for Queries
In this section, you index the vector embeddings in your collection and create an embedding that you use to run a sample vector search query.
The vector search returns the documents whose embeddings are closest in distance to your query embedding, which indicates that they are similar in meaning.
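For example, a minimal sketch of this step, assuming the collection and `get_embedding` function from the previous sketches and a 384-dimensional model; the index name and query text are illustrative:

```python
# A minimal sketch: index the embeddings, then run a vector search query.
# Assumes the collection and get_embedding function from the previous
# sketches; the index name and query text are illustrative.
from pymongo.operations import SearchIndexModel

index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 384,  # must match your embedding model's output
                "similarity": "cosine",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)
# Atlas builds the index asynchronously; wait for it to become queryable.

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": get_embedding("ocean tragedy"),  # embed the search term
            "numCandidates": 100,
            "limit": 5,
        }
    },
    {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
]
for result in collection.aggregate(pipeline):
    print(result)
```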
Considerations
Consider the following factors when creating vector embeddings:
Choosing a Method to Create Embeddings
To create vector embeddings, you must use an embedding model: an algorithm that generates numerical representations of your data. Choose one of the following ways to access an embedding model:
| Method | Description |
| --- | --- |
| Load an open-source model | If you don't have an API key for a proprietary embedding model, load an open-source embedding model locally from your application. |
| Use a proprietary model | Most AI providers offer APIs for their proprietary embedding models that you can use to create vector embeddings. For state-of-the-art embeddings, use Voyage AI. |
| Leverage an integration | You can integrate Atlas Vector Search with open-source frameworks and AI services to quickly connect to both open-source and proprietary embedding models and generate vector embeddings. To learn more, see Integrate MongoDB Atlas with AI Technologies. |
Choosing an Embedding Model
The embedding model you choose affects your query results and determines the number of dimensions you specify in your Atlas Vector Search index. Each model offers different advantages depending on your data and use case. For state-of-the-art embeddings, including multi-modal and domain-specific embedding models, use Voyage AI.
When choosing an embedding model for Atlas Vector Search, consider the following metrics:
Embedding Dimensions: The length of the vector embedding.
Smaller embeddings are more storage efficient, while larger embeddings can capture more nuanced relationships in your data. The model you choose should strike a balance between efficiency and complexity.
Max Tokens: The number of tokens that can be compressed in a single embedding.
Model Size: The size of the model in gigabytes.
While larger models often perform better, they require more computational resources as you scale Atlas Vector Search to production.
Retrieval Average: A score that measures the performance of retrieval systems.
A higher score indicates that the model is better at ranking relevant documents higher in the list of retrieved results. This score is important when choosing a model for RAG applications.
Vector Compression
If you have a large number of float vectors and want to reduce the storage and WiredTiger footprint (such as disk and memory usage) in `mongod`, compress your embeddings by converting them to `binData` vectors. `BinData` is a BSON data type that stores binary data.

The default type for vector embeddings is an array of 32-bit floats (`float32`). Binary data is more storage efficient than the default array format, requiring three times less disk space. Storing `binData` vectors also improves query performance, because fewer resources are needed to load a document into the working set. This can significantly improve query speed for vector queries that return more than 20 documents. If you compress your `float32` embeddings, you can query them with either `float32` or `binData` vectors.

The tutorial on this page includes an example function that you can use to convert your `float32` vectors to `binData` vectors.
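For instance, a minimal conversion sketch, assuming PyMongo v4.10 or later (see the supported drivers below) and the `get_embedding` function from the earlier sketches:

```python
# A minimal sketch of converting float32 embeddings to BSON binData
# vectors, assuming PyMongo v4.10 or later.
from bson.binary import Binary, BinaryVectorDtype

def to_bindata_vector(embedding: list[float]) -> Binary:
    """Compress a float32 array into a binData vector with subtype float32."""
    return Binary.from_vector(embedding, BinaryVectorDtype.FLOAT32)

# Store the compressed form in place of the raw float array.
doc = {"text": "ocean tragedy", "embedding": to_bindata_vector(get_embedding("ocean tragedy"))}
```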
Supported Drivers
BSON BinData vectors are supported by the following drivers:
C++ Driver v4.1.0 or later
C#/.NET Driver v3.2.0 or later
Go Driver v2.1.0 or later
PyMongo Driver v4.10 or later
Node.js Driver v6.11 or later
Java Driver v5.3.1 or later
Background
Float vectors are typically difficult to compress because each element in the array has its own type (despite most vectors being uniformly typed). For this reason, converting the float vector output of an embedding model to a `binData` vector with subtype `float32` is a more efficient serialization scheme: `binData` vectors store a single type descriptor for the entire vector, which reduces storage overhead.
Validating Your Embeddings
To ensure that your embeddings are correct and optimal, consider the following best practices when generating and querying them:
Test your functions and scripts.
Generating embeddings takes time and computational resources. Before you create embeddings from large datasets or collections, test that your embedding functions or scripts work as expected on a small subset of your data.
Create embeddings in batches.
If you want to generate embeddings from a large dataset or a collection with many documents, create them in batches to avoid memory issues and optimize performance, as shown in the sketch after this list.
Evaluate performance.
Run test queries to check if your search results are relevant and accurately ranked.
To learn more about how to evaluate your results and fine-tune the performance of your indexes and queries, see How to Measure the Accuracy of Your Query Results and Benchmark for Atlas Vector Search.
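For example, a minimal batching sketch, assuming the model and collection from the earlier sketches; the batch size is illustrative:

```python
# A minimal batching sketch, assuming the model and collection from the
# earlier sketches. The batch size is illustrative; tune it to your
# memory budget.
from pymongo import UpdateOne

BATCH_SIZE = 128

docs = list(collection.find({"embedding": {"$exists": False}}, {"text": 1}))
for i in range(0, len(docs), BATCH_SIZE):
    batch = docs[i : i + BATCH_SIZE]
    # Encode the whole batch in one call to amortize model overhead.
    vectors = model.encode([d["text"] for d in batch])
    collection.bulk_write([
        UpdateOne({"_id": d["_id"]}, {"$set": {"embedding": v.tolist()}})
        for d, v in zip(batch, vectors)
    ])
```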
Consider the following strategies if you encounter issues with your embeddings:
Verify your environment.
Check that the necessary dependencies are installed and up-to-date. Conflicting library versions can cause unexpected behavior. Ensure that no conflicts exist by creating a new environment and installing only the required packages.
Note
If you're using Colab, ensure that your notebook session's IP address is included in your Atlas project's access list.
Monitor memory usage.
If you experience performance issues, check your RAM, CPU, and disk usage to identify any potential bottlenecks. For hosted environments like Colab or Jupyter Notebooks, ensure that your instance is provisioned with sufficient resources and upgrade the instance if necessary.
Ensure consistent dimensions.
Verify that the Atlas Vector Search index definition matches the dimensions of the embeddings stored in MongoDB and your query embeddings match the dimensions of the indexed embeddings. Otherwise, you might encounter errors when running vector search queries.
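For example, a minimal consistency check, assuming the 384-dimensional index and `get_embedding` function from the earlier sketches:

```python
# A minimal consistency check, assuming the 384-dimensional index and
# get_embedding function from the earlier sketches.
INDEX_DIMENSIONS = 384  # must match numDimensions in the index definition

stored = collection.find_one({"embedding": {"$exists": True}})
assert len(stored["embedding"]) == INDEX_DIMENSIONS, "stored embedding doesn't match the index"

query_vector = get_embedding("ocean tragedy")
assert len(query_vector) == INDEX_DIMENSIONS, "query embedding doesn't match the index"
```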
To troubleshoot specific problems, see Troubleshooting.
Next Steps
Once you've learned how to create and query embeddings with Atlas Vector Search, start building generative AI applications by implementing RAG.
You can also quantize your 32-bit float vector embeddings into fewer bits to further reduce resource consumption and improve query speed. To learn more, see Vector Quantization.
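For example, automatic quantization can be enabled in the vector index definition; a minimal sketch, assuming the 384-dimensional `embedding` field from the earlier sketches:

```python
# A minimal sketch of enabling automatic scalar quantization in a vector
# index definition (assumes the 384-dimensional "embedding" field from
# the earlier sketches).
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 384,
            "similarity": "cosine",
            "quantization": "scalar",  # or "binary" for greater compression
        }
    ]
}
```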