This quick start describes how to load sample documents that contain vector embeddings into an Atlas cluster or local Atlas deployment, create an Atlas Vector Search index on those embeddings, and then perform semantic search to return documents that are similar to your query.
Time required: 15 minutes
Objectives
In this quick start, you complete the following steps:
Create an index definition for the sample_mflix.embedded_movies collection that indexes the plot_embedding_voyage_3_large field as the vector type. The plot_embedding_voyage_3_large field contains embeddings created using Voyage AI's voyage-3-large embedding model. The index definition specifies 2048 vector dimensions and measures similarity using dotProduct.
Run an Atlas Vector Search query that searches the sample_mflix.embedded_movies collection. The query uses the $vectorSearch stage to search the plot_embedding_voyage_3_large field using vector embeddings for the string "time travel". It considers up to 150 nearest neighbors and returns 10 documents in the results.
To learn more, see Learning Summary.
Learning Summary
This quick start focused on retrieving documents from your Atlas cluster that contain text that is semantically related to a provided query. However, you can create a vector search index on embeddings that represent any type of data that you might write to an Atlas cluster, such as images or videos.
Sample Data
This quick start uses the sample_mflix.embedded_movies collection, which contains details about movies. In each document in the collection, the plot_embedding_voyage_3_large field contains a vector embedding that represents the string in the plot field. For more information on the schema of the documents in the collection, see Sample Mflix Dataset.
By storing your source data and its corresponding vector embeddings in the same document, you can leverage both fields for complex queries or hybrid search. You can even store vector embeddings generated from different embedding models in the same document to streamline your workflow as you test the performance of different vector embedding models for your specific use case.
Vector Embeddings
The vector embeddings in the sample_mflix.embedded_movies collection
and in the example query were created using the Voyage AI voyage-3-large
embedding model. Your choice of embedding model informs the vector dimensions
and vector similarity function you use in your vector search index. You can use
any embedding model you like, and it is worth
experimenting with different models as accuracy can vary from model to model
depending on your specific use case.
To learn how to create vector embeddings of your own data, see How to Create Vector Embeddings.
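The link between an embedding model and a similarity function can be illustrated with a small sketch. It assumes the embeddings are normalized to unit length, in which case the dot product and cosine similarity give the same score. The three-dimensional vectors and the helper names (dot, cosine, normalize) are hypothetical stand-ins, not values or functions from the sample data.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: the dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
query = normalize([0.2, 0.5, 0.1])
stored = normalize([0.1, 0.4, 0.3])

# For unit-length vectors the two measures agree up to floating-point error.
print(math.isclose(dot(query, stored), cosine(query, stored)))
```

For vectors that are not normalized, the two functions generally produce different scores, which is one reason the similarity function in the index should match what the embedding model supports.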
Vector Index Definition
An index is a data structure that holds a subset of the data from a collection's documents and improves database performance for specific queries. A vector search index points to the fields that contain your vector embeddings and specifies the dimensions of your vectors and the function used to measure similarity between query vectors and the vectors stored in the database.
Because the voyage-3-large embedding model used in this quick start converts data into vector embeddings with 2048 dimensions and supports the dotProduct function, this vector search index specifies the same number of vector dimensions and similarity function.
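As a rough sketch, the index definition described in the objectives (the plot_embedding_voyage_3_large field indexed as the vector type, 2048 dimensions, dotProduct similarity) could be built as a plain Python dictionary. The helper name build_vector_index_definition is hypothetical, and the block deliberately avoids importing a driver so that it runs on its own.

```python
def build_vector_index_definition(path, num_dimensions, similarity):
    """Return a vector search index definition as a plain dict.

    Hypothetical helper for illustration; it only assembles the document
    and does not talk to a deployment.
    """
    allowed = {"euclidean", "cosine", "dotProduct"}
    if similarity not in allowed:
        raise ValueError(f"similarity must be one of {sorted(allowed)}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",                 # index the field as a vector
                "path": path,                     # field holding the embeddings
                "numDimensions": num_dimensions,  # must match the model output
                "similarity": similarity,         # function used to score matches
            }
        ]
    }


definition = build_vector_index_definition(
    "plot_embedding_voyage_3_large", 2048, "dotProduct"
)
print(definition)
```

With a driver such as PyMongo (an assumption, not shown here), a definition like this is typically wrapped in a search index model of type vectorSearch and passed to the collection's create_search_index method; check the driver documentation for the exact call in your version.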
Vector Search Query
The query you ran in this quick start is an aggregation pipeline. Its $vectorSearch stage performs an Approximate Nearest Neighbor (ANN) search, and a subsequent $project stage refines the results.
To see all the options for a vector search query, including how to use Exact Nearest Neighbor (ENN) search or narrow the scope of your vector search with the filter option, see Run Vector Search Queries.
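The shape of such a pipeline can be sketched as a list of stage dictionaries. Several details here are assumptions for illustration: the index name vector_index, the helper name build_vector_search_pipeline, the projected title field, and the all-zero placeholder query vector, which stands in for a real voyage-3-large embedding of the query string.

```python
def build_vector_search_pipeline(
    query_vector,
    index_name="vector_index",             # assumed index name
    path="plot_embedding_voyage_3_large",
    num_candidates=150,
    limit=10,
):
    """Return a two-stage aggregation pipeline: $vectorSearch, then $project."""
    if limit > num_candidates:
        raise ValueError("limit cannot exceed num_candidates")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,                     # field that holds the embeddings
                "queryVector": query_vector,      # embedding of the query string
                "numCandidates": num_candidates,  # nearest neighbors to consider
                "limit": limit,                   # documents to return
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,  # assumed field in the sample collection
                "plot": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder vector with the right length; NOT a real embedding of "time travel".
placeholder_vector = [0.0] * 2048
pipeline = build_vector_search_pipeline(placeholder_vector)
print(len(pipeline), "stages")
```

A pipeline like this would normally be passed to the collection's aggregate method in your driver, with the placeholder replaced by an embedding generated by the same model that produced the stored vectors.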
Next Steps
To learn how to create embeddings from data and load them into Atlas, see Create Embeddings.
To learn how to implement retrieval-augmented generation (RAG), see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
To integrate Atlas Vector Search with popular AI frameworks and services, see Integrate MongoDB Atlas with AI Technologies.
To build production-ready AI chatbots using Atlas Vector Search, see the MongoDB Chatbot Framework.
To learn how to implement RAG without the need for API keys or credits, see Build a Local RAG Implementation with Atlas Vector Search.