Docs Menu
Docs Home
/
Atlas
/ /

Get Started with the LlamaIndex Integration

You can integrate Atlas Vector Search with LlamaIndex to implement retrieval-augmented generation (RAG) in your LLM application. This tutorial demonstrates how to start using Atlas Vector Search with LlamaIndex to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:

  1. Set up the environment.

  2. Store custom data on Atlas.

  3. Create an Atlas Vector Search index on your data.

  4. Run the following vector search queries:

    • Semantic search.

    • Semantic search with metadata pre-filtering.

  5. Implement RAG by using Atlas Vector Search to answer questions on your data.

Work with a runnable version of this tutorial as a Python notebook.

LlamaIndex is an open-source framework designed to simplify how you connect custom data sources to LLMs. It provides several tools such as data connectors, indexes, and query engines to help you load and prepare vector embeddings for RAG applications.

By integrating Atlas Vector Search with LlamaIndex, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.

To complete this tutorial, you must have the following:

  • An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.

  • An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

  • An environment to run interactive Python notebooks such as Colab.

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook allows you to run Python code snippets individually, and you'll use it to run the code in this tutorial.

To set up your notebook environment:

1

Run the following command:

pip install --quiet --upgrade llama-index llama-index-vector-stores-mongodb llama-index-embeddings-openai pymongo

Then, run the following code to import the required packages:

import os, pymongo, pprint
from pymongo.operations import SearchIndexModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.settings import Settings
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, ExactMatchFilter, FilterOperator
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
2

Run the following code, replacing the placeholders with the following values:

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
3

Run the following code to configure settings that are specific to LlamaIndex. These settings specify the following:

  • OpenAI as the LLM used by your application to answer questions on your data.

  • text-embedding-ada-002 as the embedding model used by your application to generate vector embeddings from your data.

  • Chunk size and overlap to customize how LlamaIndex partitions your data for storage.

Settings.llm = OpenAI()
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Settings.chunk_size = 100
Settings.chunk_overlap = 10

Then, load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.

1

For this tutorial, you use a publicly accessible PDF document that contains that contains a recent MongoDB earnings report as the data source for your vector store. This document describes MongoDB's financial results for the fourth quarter and full year of fiscal 2025.

To load the sample data, run the following code snippet. It does the following:

  • Creates a new directory called data.

  • Retrieves the PDF from the specified URL and saves it as a file in the directory.

  • Uses the SimpleDirectoryReader data connector to extract raw text and metadata from the file. It also formats the data into documents.

# Load the sample data
!mkdir -p 'data/'
!wget 'https://investors.mongodb.com/node/13176/pdf' -O 'data/mongodb-earnings-report.pdf'
sample_data = SimpleDirectoryReader(input_files=["./data/mongodb-earnings-report.pdf"]).load_data()
# Print the first document
sample_data[0]
Document(id_='62b7cace-30c0-4687-9d87-e178547ae357', embedding=None,
metadata={'page_label': '1', 'file_name': 'mongodb-earnings-report.pdf',
'file_path': 'data/mongodb-earnings-report.pdf', 'file_type':
'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28',
'last_modified_date': '2025-05-28'},
excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size',
'creation_date', 'last_modified_date', 'last_accessed_date'],
excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size',
'creation_date', 'last_modified_date', 'last_accessed_date'],
relationships={}, metadata_template='{key}: {value}', metadata_separator='\n',
text_resource=MediaResource(embeddings=None, data=None, text='MongoDB, Inc.
Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results\nMarch 5,
2025\nFourth Quarter Fiscal 2025 Total Revenue of $548.4 million, up 20%
Year-over-Year\nFull Year Fiscal 2025 Total Revenue of $2.01 billion, up 19%
Year-over-Year\nContinued Strong Customer Growth with Over 54,500 Customers as
of January 31, 2025\nMongoDB Atlas Revenue up 24% Year-over-Year; 71% of Total
Q4 Revenue\nNEW YORK , March 5, 2025 /PRNewswire/ -- MongoDB, Inc. (NASDAQ:
MDB) today announced its financial results for the fourth quarter and
fiscal\nyear ended January 31, 2025.\n\xa0\n \xa0\n"MongoDB delivered a
strong end to fiscal 2025 with 24% Atlas revenue growth and significant margin
expansion. Atlas consumption in the quarter\nwas better than expected and we
continue to see good performance in new workload wins due to the flexibility,
scalability and performance of the\nMongoDB platform. In fiscal year 2026 we
expect to see stable consumption growth in Atlas, our main growth driver,"
said Dev Ittycheria, President\nand Chief Executive Officer of MongoDB
.\n"Looking ahead, we remain incredibly excited about our long-term growth
opportunity. MongoDB removes the constraints of legacy databases,\nenabling
businesses to innovate at AI speed with our flexible document model and
seamless scalability. Following the Voyage AI acquisition, we\ncombine
real-time data, sophisticated embedding and retrieval models and semantic
search directly in the database, simplifying the development of\ntrustworthy
AI-powered apps."\nFourth Quarter Fiscal 2025 Financial Highlights\nRevenue:
Total revenue was $548.4 million for the fourth quarter of fiscal 2025, an
increase of 20% year-over-year.\nSubscription revenue was $531.0 million, an
increase of 19% year-over-year, and services revenue was $17.4 million,
an\nincrease of 34% year-over-year.\nGross Profit: Gross profit was $399.4
million for the fourth quarter of fiscal 2025, representing a 73% gross
margin\ncompared to 75% in the year-ago period. Non-GAAP gross profit was
$411.7 million, representing a 75% non-GAAP gross\nmargin, compared to a
non-GAAP gross margin of 77% in the year-ago period.\nLoss from Operations:
Loss from operations was $18.6 million for the fourth quarter of fiscal 2025,
compared to a loss\nfrom operations of $71.0 million in the year-ago period.
Non-GAAP income from operations was $112.5 million, compared\nto non-GAAP
income from operations of $69.2 million in the year-ago period.\nNet Income
(Loss): Net income was $15.8 million, or $0.20 per share, based on 77.6
million weighted-average shares\noutstanding, for the fourth quarter of fiscal
2025. This compares to a net loss of $55.5 million, or $0.77 per share, in
the\nyear-ago period. Non-GAAP net income was $108.4 million, or $1.28 per
share, based on 84.6 million fully diluted\nweighted-average shares
outstanding. This compares to a non-GAAP net income of $71.1 million, or $0.86
per share, in\nthe year-ago period.\nCash Flow: As of January 31, 2025,
MongoDB had $2.3 billion in cash, cash equivalents, short-term investments
and\nrestricted cash. During the three months ended January 31, 2025, MongoDB
generated $50.5 million of cash from\noperations, compared to $54.6 million of
cash from operations in the year-ago period. MongoDB used $26.0 million of
cash\nin capital expenditures and used $1.6 million of cash in principal
payments of finance leases, leading to free cash flow of\n$22.9 million,
compared to free cash flow of $50.5 million in the year-ago period.\nFull Year
Fiscal 2025 Financial Highlights\nRevenue: Total revenue was $2.01 billion for
the full year fiscal 2025, an increase of 19% year-over-year.
Subscription\nrevenue was $1.94 billion, an increase of 19% year-over-year,
and services revenue was $62.6 million, an increase of
12%\nyear-over-year.\nGross Profit: Gross profit was $1.47 billion for the
full year fiscal 2025, representing a 73% gross margin compared to',
path=None, url=None, mimetype=None), image_resource=None, audio_resource=None,
video_resource=None, text_template='{metadata_str}\n\n{content}')
2

Run the following code to create a vector store named atlas_vector_store by using the MongoDBAtlasVectorSearch method, which specifies the following:

  • A connection to your Atlas cluster.

  • llamaindex_db.test as the Atlas database and collection used to store the documents.

  • vector_index as the index to use for querying the vector store.

Then, you save the vector store to a storage context, which is a LlamaIndex container object used to prepare your data for storage.

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient(ATLAS_CONNECTION_STRING)
# Instantiate the vector store
atlas_vector_store = MongoDBAtlasVectorSearch(
mongo_client,
db_name = "llamaindex_db",
collection_name = "test",
vector_index_name = "vector_index"
)
vector_store_context = StorageContext.from_defaults(vector_store=atlas_vector_store)
3

Once you've loaded your data and instantiated Atlas as a vector store, generate vector embeddings from your data and store them in Atlas. To do this, you must build a vector store index. This type of index is a LlamaIndex data structure that splits, embeds, and then stores your data in the vector store.

The following code uses the VectorStoreIndex.from_documents method to build the vector store index on your sample data. It turns your sample data into vector embeddings and stores these embeddings as documents in the llamaindex_db.test collection in your Atlas cluster, as specified by the vector store's storage context.

Note

This method uses the embedding model and chunk settings that you configured when you set up your environment.

vector_store_index = VectorStoreIndex.from_documents(
sample_data, storage_context=vector_store_context, show_progress=True
)

Tip

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the langchain_db.test collection in your cluster.

Note

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

To enable vector search queries on your vector store, create an Atlas Vector Search index on the llamaindex_db.test collection.

In your notebook, run the following code to create an index of the vectorSearch type that indexes the following fields:

  • embedding field as the vector type. The embedding field contains the embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using cosine.

  • metadata.page_label field as the filter type for pre-filtering data by the page number in the PDF.

# Specify the collection for which to create the index
collection = mongo_client["llamaindex_db"]["test"]
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
definition={
"fields": [
{
"type": "vector",
"path": "embedding",
"numDimensions": 1536,
"similarity": "cosine"
},
{
"type": "filter",
"path": "metadata.page_label"
}
]
},
name="vector_index",
type="vectorSearch"
)
collection.create_search_index(model=search_index_model)

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

Once Atlas builds your index, return to your notebook and run vector search queries on your data. The following examples demonstrate different queries that you can run on your vectorized data.

This example performs a basic semantic search for the string MongoDB Atlas security and returns a list of documents ranked by relevance score. It also specifies the following:

  • Atlas Vector Search as a retriever to perform semantic search.

  • The similarity_top_k parameter to return only the three most relevant documents.

retriever = vector_store_index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("MongoDB acquisition")
for node in nodes:
print(node)
Node ID: 479446ef-8a32-410d-a5e0-8650bd10d78d
Text: MongoDB completed the redemption of 2026 Convertible Notes,
eliminating all debt from the balance sheet. Additionally, in
conjunction with the acquisition of Voyage, MongoDB is announcing a
stock buyback program of $200 million, to offset the dilutive impact
of the acquisition consideration.
Score: 0.914
Node ID: 453137d9-8902-4fae-8d81-5f5d9b0836eb
Text: "Looking ahead, we remain incredibly excited about our long-term
growth opportunity. MongoDB removes the constraints of legacy
databases, enabling businesses to innovate at AI speed with our
flexible document model and seamless scalability. Following the Voyage
AI acquisition, we combine real-time data, sophisticated embedding and
retrieval mod...
Score: 0.914
Node ID: f3c35db6-43e5-4da7-a297-d9b009b9d300
Text: Lombard Odier, a Swiss private bank, partnered with MongoDB to
migrate and modernize its legacy banking technology systems on MongoDB
with generative AI. The initiative enabled the bank to migrate code
50-60 times quicker and move applications from a legacy relational
database to MongoDB 20 times faster than previous migrations.
Score: 0.912

You can pre-filter your data by using an MQL match expression that compares the indexed field with another value in your collection. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.

Note

You specified the metadata.page_label field as a filter when you created the index for this tutorial.

This example performs a semantic search for the string MongoDB Atlas security and returns a list of documents ranked by relevance score. It also specifies the following:

  • Atlas Vector Search as a retriever to perform semantic search.

  • The similarity_top_k parameter to return only the three most relevant documents.

  • A filter on the metadata.page_label field so that Atlas Vector Search searches for documents appearing on page two only.

# Specify metadata filters
metadata_filters = MetadataFilters(
filters=[ExactMatchFilter(key="metadata.page_label", value="2")]
)
retriever = vector_store_index.as_retriever(similarity_top_k=3, filters=metadata_filters)
nodes = retriever.retrieve("MongoDB acquisition")
for node in nodes:
print(node)
Node ID: 479446ef-8a32-410d-a5e0-8650bd10d78d
Text: MongoDB completed the redemption of 2026 Convertible Notes,
eliminating all debt from the balance sheet. Additionally, in
conjunction with the acquisition of Voyage, MongoDB is announcing a
stock buyback program of $200 million, to offset the dilutive impact
of the acquisition consideration.
Score: 0.914
Node ID: f3c35db6-43e5-4da7-a297-d9b009b9d300
Text: Lombard Odier, a Swiss private bank, partnered with MongoDB to
migrate and modernize its legacy banking technology systems on MongoDB
with generative AI. The initiative enabled the bank to migrate code
50-60 times quicker and move applications from a legacy relational
database to MongoDB 20 times faster than previous migrations.
Score: 0.912
Node ID: 82a2a0c0-80b9-4a9e-a848-529b4ff8f301
Text: Fourth Quarter Fiscal 2025 and Recent Business Highlights
MongoDB acquired Voyage AI, a pioneer in state-of-the-art embedding
and reranking models that power next-generation AI applications.
Integrating Voyage AI's technology with MongoDB will enable
organizations to easily build trustworthy, AI-powered applications by
offering highly accurate...
Score: 0.911

This section demonstrates how to implement RAG in your application with Atlas Vector Search and LlamaIndex. Now that you've learned how to run vector search queries to retrieve semantically similar documents, run the following code to use Atlas Vector Search to retrieve documents and a LlamaIndex query engine to then answer questions based on those documents.

This example does the following:

  • Instantiates Atlas Vector Search as a vector index retriever, a specific type of retriever for vector stores. It includes the similarity_top_k parameter so that Atlas Vector Search retrieves only the 5 most relevant documents.

  • Instantiates the RetrieverQueryEngine query engine to answer questions on your data. When prompted, the query engine performs the following actions:

    • Uses Atlas Vector Search as a retriever to query for semantically similar documents based on the prompt.

    • Calls the LLM that you specified when you set up your environment to generate a context-aware response based on the retrieved documents.

  • Prompts the LLM with a sample query about Atlas security recommendations.

  • Returns the LLM's response and the documents used as context. The generated response might vary.

# Instantiate Atlas Vector Search as a retriever
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, similarity_top_k=5)
# Pass the retriever into the query engine
query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)
# Prompt the LLM
response = query_engine.query("What was MongoDB's latest acquisition?")
print(response)
print("\nSource documents: ")
pprint.pprint(response.source_nodes)
MongoDB's latest acquisition was Voyage AI, a pioneer in embedding and reranking models for next-generation AI applications.
Source documents:
[NodeWithScore(node=TextNode(id_='82a2a0c0-80b9-4a9e-a848-529b4ff8f301', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='8cfe6680-8dec-486e-92c5-89ac1733b6c8', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='b6c412af868c29d67a6b030f266cd0e680f4a578a34c209c1818ff9a366c9d44'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='479446ef-8a32-410d-a5e0-8650bd10d78d', node_type='1', metadata={}, hash='b805543bf0ef0efc25492098daa9bd9c037043fb7228fb0c3270de235e668341')}, metadata_template='{key}: {value}', metadata_separator='\n', text="Fourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation\nAI applications. Integrating Voyage AI's technology with MongoDB will enable organizations to easily build trustworthy,\nAI-powered applications by offering highly accurate and relevant information retrieval deeply integrated with operational\ndata.", mimetype='text/plain', start_char_idx=1678, end_char_idx=2101, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9279670119285583),
NodeWithScore(node=TextNode(id_='453137d9-8902-4fae-8d81-5f5d9b0836eb', embedding=None, metadata={'page_label': '1', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='62b7cace-30c0-4687-9d87-e178547ae357', node_type='4', metadata={'page_label': '1', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='cb1dbd172c17e53682296ccc966ebdbb5605acb4fbf3872286e3a202c1d3650d'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='b6ae7c13-5bec-47f5-887f-835fc7bae374', node_type='1', metadata={'page_label': '1', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='a4835102686cdf03d1106946237d50031d00a0861eea892e38b928dd5e44e295'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='3d4034d3-bac5-4985-8926-9213f8a87318', node_type='1', metadata={}, hash='f103b351f2bda28ec3d2f1bb4f40d93ac1698ea5f7630a5297688a4caa419389')}, metadata_template='{key}: {value}', metadata_separator='\n', text='"Looking ahead, we remain incredibly excited about our long-term growth opportunity. MongoDB removes the constraints of legacy databases,\nenabling businesses to innovate at AI speed with our flexible document model and seamless scalability. Following the Voyage AI acquisition, we\ncombine real-time data, sophisticated embedding and retrieval models and semantic search directly in the database, simplifying the development of\ntrustworthy AI-powered apps."', mimetype='text/plain', start_char_idx=1062, end_char_idx=1519, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.921961784362793),
NodeWithScore(node=TextNode(id_='85dd431c-2d4c-4336-ab39-e87a97b30c59', embedding=None, metadata={'page_label': '4', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='311532cc-f526-4fc3-adb6-49e76afdd580', node_type='4', metadata={'page_label': '4', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='37f0ad7fcb7f204226ea7c6c475360e2db55bb77447f1742a164efb9c1da5dc0'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='6175bcb6-9e2a-4196-85f7-0585bcbbdd3b', node_type='1', metadata={}, hash='0e92e55a50f8b6dbfe7bcaedb0ccc42345a185048efcd440e3ee1935875e7cbf')}, metadata_template='{key}: {value}', metadata_separator='\n', text="Headquartered in New York, MongoDB's mission is to empower innovators to create, transform, and disrupt industries with software and data.\nMongoDB's unified, intelligent data platform was built to power the next generation of applications, and MongoDB is the most widely available, globally\ndistributed database on the market.", mimetype='text/plain', start_char_idx=0, end_char_idx=327, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9217028021812439),
NodeWithScore(node=TextNode(id_='f3c35db6-43e5-4da7-a297-d9b009b9d300', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='3008736c-29f0-4b41-ac0f-efdb469319b9', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='cd3647350e6d7fcd89e2303fe1995b8f91b633c5f33e14b3b4c18a16738ea86f'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='c9bef874-77ee-40bc-a1fe-ca42d1477cb3', node_type='1', metadata={}, hash='c7d7af8a1b43b587a9c47b27f57e7cb8bc35bd90390a078db21e3f5253ee7cc1')}, metadata_template='{key}: {value}', metadata_separator='\n', text='Lombard Odier, a Swiss private bank, partnered with MongoDB to migrate and modernize its legacy banking technology\nsystems on MongoDB with generative AI. The initiative enabled the bank to migrate code 50-60 times quicker and move\napplications from a legacy relational database to MongoDB 20 times faster than previous migrations.', mimetype='text/plain', start_char_idx=2618, end_char_idx=2951, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9197831153869629),
NodeWithScore(node=TextNode(id_='479446ef-8a32-410d-a5e0-8650bd10d78d', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='82a2a0c0-80b9-4a9e-a848-529b4ff8f301', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='688872b911c388c239669970f562d4014aaec4753903e75f4bdfcf1eb1daf5ab'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='3008736c-29f0-4b41-ac0f-efdb469319b9', node_type='1', metadata={}, hash='a854a9bf103e429ce78b45603df9e2341e5d0692aa95e544e6c82616be29b28e')}, metadata_template='{key}: {value}', metadata_separator='\n', text='MongoDB completed the redemption of 2026 Convertible Notes, eliminating all debt from the balance sheet. Additionally, in\nconjunction with the acquisition of Voyage, MongoDB is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.', mimetype='text/plain', start_char_idx=2102, end_char_idx=2396, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9183852672576904)]

This example does the following:

  • Defines a metadata filter on the metadata.page_label field so that Atlas Vector Search searches for documents appearing on page two only.

  • Instantiates Atlas Vector Search as a vector index retriever, a specific type of retriever for vector stores. It includes the metadata filters that you defined and the similarity_top_k parameter so that Atlas Vector Search retrieves only the 5 most relevant documents from page two.

  • Instantiates the RetrieverQueryEngine query engine to answer questions on your data. When prompted, the query engine performs the following actions:

    • Uses Atlas Vector Search as a retriever to query for semantically similar documents based on the prompt.

    • Calls the LLM that you specified when you set up your environment to generate a context-aware response based on the retrieved documents.

  • Prompts the LLM with a sample query about Atlas security recommendations.

  • Returns the LLM's response and the documents used as context. The generated response might vary.

# Specify metadata filters
metadata_filters = MetadataFilters(
filters=[ExactMatchFilter(key="metadata.page_label", value="2")]
)
# Instantiate Atlas Vector Search as a retriever
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, filters=metadata_filters, similarity_top_k=5)
# Pass the retriever into the query engine
query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)
# Prompt the LLM
response = query_engine.query("What was MongoDB's latest acquisition?")
print(response)
print("\nSource documents: ")
pprint.pprint(response.source_nodes)
MongoDB's latest acquisition was Voyage AI, a pioneer in embedding and reranking models that power next-generation AI applications.
Source documents:
[NodeWithScore(node=TextNode(id_='82a2a0c0-80b9-4a9e-a848-529b4ff8f301', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='8cfe6680-8dec-486e-92c5-89ac1733b6c8', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='b6c412af868c29d67a6b030f266cd0e680f4a578a34c209c1818ff9a366c9d44'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='479446ef-8a32-410d-a5e0-8650bd10d78d', node_type='1', metadata={}, hash='b805543bf0ef0efc25492098daa9bd9c037043fb7228fb0c3270de235e668341')}, metadata_template='{key}: {value}', metadata_separator='\n', text="Fourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation\nAI applications. Integrating Voyage AI's technology with MongoDB will enable organizations to easily build trustworthy,\nAI-powered applications by offering highly accurate and relevant information retrieval deeply integrated with operational\ndata.", mimetype='text/plain', start_char_idx=1678, end_char_idx=2101, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9280173778533936),
NodeWithScore(node=TextNode(id_='f3c35db6-43e5-4da7-a297-d9b009b9d300', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='3008736c-29f0-4b41-ac0f-efdb469319b9', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='cd3647350e6d7fcd89e2303fe1995b8f91b633c5f33e14b3b4c18a16738ea86f'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='c9bef874-77ee-40bc-a1fe-ca42d1477cb3', node_type='1', metadata={}, hash='c7d7af8a1b43b587a9c47b27f57e7cb8bc35bd90390a078db21e3f5253ee7cc1')}, metadata_template='{key}: {value}', metadata_separator='\n', text='Lombard Odier, a Swiss private bank, partnered with MongoDB to migrate and modernize its legacy banking technology\nsystems on MongoDB with generative AI. The initiative enabled the bank to migrate code 50-60 times quicker and move\napplications from a legacy relational database to MongoDB 20 times faster than previous migrations.', mimetype='text/plain', start_char_idx=2618, end_char_idx=2951, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9198455214500427),
NodeWithScore(node=TextNode(id_='479446ef-8a32-410d-a5e0-8650bd10d78d', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='82a2a0c0-80b9-4a9e-a848-529b4ff8f301', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='688872b911c388c239669970f562d4014aaec4753903e75f4bdfcf1eb1daf5ab'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='3008736c-29f0-4b41-ac0f-efdb469319b9', node_type='1', metadata={}, hash='a854a9bf103e429ce78b45603df9e2341e5d0692aa95e544e6c82616be29b28e')}, metadata_template='{key}: {value}', metadata_separator='\n', text='MongoDB completed the redemption of 2026 Convertible Notes, eliminating all debt from the balance sheet. Additionally, in\nconjunction with the acquisition of Voyage, MongoDB is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.', mimetype='text/plain', start_char_idx=2102, end_char_idx=2396, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.918432891368866),
NodeWithScore(node=TextNode(id_='3008736c-29f0-4b41-ac0f-efdb469319b9', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='479446ef-8a32-410d-a5e0-8650bd10d78d', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='833c2af73d617c1fef7d04111e010bfe06eeeb36c71225c0fb72987cd164526b'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='f3c35db6-43e5-4da7-a297-d9b009b9d300', node_type='1', metadata={}, hash='c39c6258ff9fe34b650dd2782ae20e1ed57ed20465176cbf455ee9857e57dba0')}, metadata_template='{key}: {value}', metadata_separator='\n', text='For the third consecutive year, MongoDB was named a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud\nDatabase Management Systems. Gartner evaluated 20 vendors based on Ability to Execute and Completeness of Vision.', mimetype='text/plain', start_char_idx=2397, end_char_idx=2617, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.917201817035675),
NodeWithScore(node=TextNode(id_='d50a3746-84ac-4928-a252-4eda3515f9fc', embedding=None, metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2171a7d3-482c-4f83-beee-8c37e0ebc747', node_type='4', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='ef623ef7400aa6e120f821b455b2ddce99b94c57365e7552b676abaa3eb23640'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='25e4f1c9-41ba-4344-b775-842a0a15c207', node_type='1', metadata={'page_label': '2', 'file_name': 'mongodb-earnings-report.pdf', 'file_path': 'data/mongodb-earnings-report.pdf', 'file_type': 'application/pdf', 'file_size': 150863, 'creation_date': '2025-05-28', 'last_modified_date': '2025-05-28'}, hash='28af4302a69924722e2ccd2015b8d64fa83790b4f0d4759898ede48e40668fa1'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='13da6584-75b4-4eb8-a071-8297087ce12c', node_type='1', metadata={}, hash='e316923acbe01dede55287258f9649bb9865ef2357f2316e190b97aef84f22ec')}, metadata_template='{key}: {value}', metadata_separator='\n', text="as amended, including statements concerning MongoDB's financial guidance\nfor the first fiscal quarter and full year fiscal 2026 and underlying assumptions, our expectations regarding Atlas consumption growth and the benefits\nof the Voyage AI acquisition.", mimetype='text/plain', start_char_idx=5174, end_char_idx=5428, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=0.9084539413452148)]

To explore LlamaIndex's full library of tools for RAG applications, which includes data connectors, indexes, and query engines, see LlamaHub.

To extend the application in this tutorial to have back-and-forth conversations, see Chat Engine.

MongoDB also provides the following developer resources:

Back

LangChain4j

On this page