Atlas Vector Search Quick Start

This quick start describes how to load sample documents that contain vector embeddings into an Atlas cluster or local Atlas deployment, create an Atlas Vector Search index on those embeddings, and then perform semantic search to return documents that are similar to your query.

Time required: 15 minutes

In this quick start, you complete the following steps:

  1. Create an index definition for the sample_mflix.embedded_movies collection that indexes the plot_embedding field as the vector type. The plot_embedding field contains embeddings created with OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using dotProduct (see the index definition sketch after this list).

  2. Run an Atlas Vector Search query that searches the sample_mflix.embedded_movies collection. The query uses the $vectorSearch stage to search the plot_embedding field using vector embeddings for the string "time travel". It considers up to 150 nearest neighbors and returns 10 documents in the results (see the pipeline sketch after this list).
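
For reference, a minimal sketch of that index definition, created from mongosh with createSearchIndex, might look like the following. The index name vector_index is an assumption for this sketch; you can also create the index through the Atlas UI or a driver.

```javascript
// Sketch: create the vector search index on sample_mflix.embedded_movies
// from mongosh. The index name "vector_index" is an assumption; any name
// works as long as your query references it.
use sample_mflix

db.embedded_movies.createSearchIndex(
  "vector_index",
  "vectorSearch",
  {
    fields: [
      {
        type: "vector",
        path: "plot_embedding",   // field that holds the embeddings
        numDimensions: 1536,      // matches text-embedding-ada-002 output
        similarity: "dotProduct"  // similarity function from step 1 above
      }
    ]
  }
)
```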
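
The query itself is an aggregation pipeline along the lines of the sketch below. The queryVector value is elided here because the real embedding for "time travel" is a 1536-element array, and the index name carries over the assumption from the sketch above.

```javascript
// Sketch of the quick start query. queryVector must be the 1536-dimensional
// embedding of the string "time travel"; the actual array is too long to
// reproduce here.
const queryVector = [ /* 1536 numbers produced by text-embedding-ada-002 */ ];

db.embedded_movies.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",   // assumed index name from the sketch above
      path: "plot_embedding",
      queryVector: queryVector,
      numCandidates: 150,      // up to 150 nearest neighbors considered
      limit: 10                // 10 documents returned
    }
  },
  {
    $project: {
      _id: 0,
      title: 1,
      plot: 1,
      score: { $meta: "vectorSearchScore" } // similarity score per match
    }
  }
])
```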

To learn more, see Learning Summary.

Learning Summary

This quick start focused on retrieving documents from your Atlas cluster that contain text that is semantically related to a provided query. However, you can create a vector search index on embeddings that represent any type of data that you might write to an Atlas cluster, such as images or videos.

This quick start uses the sample_mflix.embedded_movies collection, which contains details about movies. In each document in the collection, the plot_embedding field contains a vector embedding that represents the string in the plot field. For more information on the schema of the documents in the collection, see Sample Mflix Dataset.

By storing your source data and its corresponding vector embeddings in the same document, you can leverage both fields for complex queries or hybrid search. You can even store vector embeddings generated from different embedding models in the same document to streamline your workflow as you test the performance of different vector embedding models for your specific use case.
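
As an illustration, a document following that pattern might look like the sketch below. The plot_embedding_other_model field is a hypothetical name for a second model's embeddings, not a field in the sample dataset, and the values shown are placeholders.

```javascript
// Hypothetical document shape: source text and its embeddings side by side.
// plot_embedding_other_model is an invented field name for illustration only.
{
  title: "Example Movie",
  plot: "A scientist builds a machine that carries him into the distant future.",
  plot_embedding: [ /* 1536 numbers from text-embedding-ada-002 */ ],
  plot_embedding_other_model: [ /* embedding from a second model under test */ ]
}
```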

The vector embeddings in the sample_mflix.embedded_movies collection and in the example query were created using the OpenAI text-embedding-ada-002 embedding model. Your choice of embedding model informs the vector dimensions and vector similarity function you use in your vector search index. You can use any embedding model you like, and it is worth experimenting with different models as accuracy can vary from model to model depending on your specific use case.

To learn how to create vector embeddings of your own data, see How to Create Vector Embeddings.
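
For example, if you generate embeddings with OpenAI's Node.js SDK, producing a vector to store, or to use as a queryVector, might look like this sketch. It assumes the openai npm package (v4 or later) and an OPENAI_API_KEY environment variable.

```javascript
// Sketch, assuming the openai npm package (v4+) with OPENAI_API_KEY set in
// the environment. Produces the 1536-dimensional embedding for a string.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY automatically

const response = await openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: "time travel",
});

const queryVector = response.data[0].embedding; // array of 1536 numbers
console.log(queryVector.length); // 1536
```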

An index is a data structure that holds a subset of a collection's data and improves database performance for specific queries. A vector search index points to the fields that contain your vector embeddings and includes the dimensions of your vectors as well as the function used to measure similarity between query vectors and the vectors stored in the database.

Because the text-embedding-ada-002 embedding model used in this quick start converts data into vector embeddings with 1536 dimensions and supports comparing embeddings with the dotProduct function, this vector search index specifies the same number of vector dimensions and the same similarity function.

The query you ran in this quick start is an aggregation pipeline in which the $vectorSearch stage performs an Approximate Nearest Neighbor (ANN) search, followed by a $project stage that refines the results. To see all the options for a vector search query, including how to run an Exact Nearest Neighbor (ENN) search or narrow the scope of your vector search with the filter option, see Run Vector Search Queries.
