
How to Implement Agentic RAG Using Claude 3.5 Sonnet, LlamaIndex, and MongoDB

Richmond Alake • 17 min read • Published Jul 02, 2024 • Updated Jul 02, 2024
Reference architecture for an agentic system
In June 2024, Anthropic released Claude 3.5 Sonnet, a multimodal model that outperformed its predecessors and competitors in graduate-level reasoning, undergraduate-level knowledge, and coding proficiency at the time of its release.
The reasoning capabilities of large language models (LLMs) are reaching levels that make them key drivers for recommendation systems and agentic systems with access to a collection of tools for task completion. While LLM-powered chatbots using retrieval-augmented generation (RAG) remain the prominent form factor for LLM applications, agentic systems add a new dimension to AI applications.
Agentic systems leverage the emergent tool-use, reasoning, and planning abilities of LLMs to decompose tasks and select tools for task completion. Within LLM applications, this enables LLMs to decide when and where to source knowledge from. This is made possible by creating data retrievers as tools within the agentic system. This tutorial will show you how to build such a system and more.
In this tutorial, you will learn how to:
  • Build an agentic RAG system with Claude 3.5 Sonnet
  • Use MongoDB within an agentic RAG system as the memory provider
  • Leverage LlamaIndex integration with Anthropic, MongoDB, and model providers to develop AI systems
  • Develop AI agents with LlamaIndex
  • Use an in-depth embedding process with LlamaIndex
View the complete code for this tutorial.
Don’t forget to watch and star the GenAI showcase repo to get updates when new notebooks are added.

What is agentic RAG?

Let’s define an AI agent: An agent is an artificial computational entity aware of its environment. It is equipped with faculties that enable perception through input, action through tool use, and cognitive abilities through foundation models backed by long-term and short-term memory.
The landscape of LLM applications and AI applications as a whole is undergoing another shift. While not seismic, this shift introduces new application development paradigms that AI stack engineers, AI builders, and software developers should be aware of.
As mentioned earlier, RAG-enabled chatbots are currently the dominant form of LLM applications in production. However, the scope of LLM applications has widened to cover code execution within your infrastructure. This expansion is due to foundation models created by companies like OpenAI, Anthropic, and Cohere becoming more powerful, exhibiting emergent abilities such as tool use, advanced planning and reasoning, and problem decomposition.
Agentic RAG is a paradigm that leverages LLMs' routing, tool use, reasoning, and planning capabilities alongside information retrieval based on comparing query and stored data semantics. This system paradigm enables the development of dynamic LLM applications that can access various tools to execute queries, decompose tasks, and solve complex problems.
Retrievers are a fundamental component in RAG pipelines, serving as the interface between user queries and knowledge bases. For those experienced in LLM applications, retrievers are a well-established concept. If retrievers are a new concept, they can be understood as modules responsible for executing semantic searches (or other information retrieval methods) against a corpus of information such as a data store. 
These modules employ various retrieval methods, such as dense vector similarity, sparse lexical matching, or hybrid approaches, to efficiently fetch relevant data from a structured knowledge source. The retriever's primary function is to identify and extract contextually appropriate information based on the semantic similarity between the input query and the stored data, which gives the LLM domain-specific context for generating grounded, informed responses. Retrievers can be used in a simple RAG pipeline or in agentic systems like the one built in this tutorial.
Incorporating a retriever as a tool that an AI agent has access to enables the dynamic utilization of external knowledge sources, enhancing the agent's ability to provide accurate and contextually relevant responses. This integration allows the agent to adapt to changing or updated information in real time by accessing the latest data from the knowledge base or using its parametric knowledge to answer simple queries.
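To make the retriever idea concrete, here is a minimal sketch of dense vector retrieval using cosine similarity. It uses only the Python standard library. The three-dimensional vectors and the `retrieve` helper are hypothetical stand-ins for real embeddings and for the vector search that MongoDB Atlas performs later in this tutorial.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, corpus, top_k=2):
    """Return the top_k (text, score) pairs most similar to the query."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical, hand-written "embeddings" for illustration only.
corpus = [
    ("Loft in Brooklyn", [0.9, 0.1, 0.0]),
    ("Beach house in Lisbon", [0.0, 0.2, 0.9]),
    ("Studio in Manhattan", [0.8, 0.3, 0.1]),
]

results = retrieve([1.0, 0.2, 0.0], corpus, top_k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

In the agentic setting, a function like `retrieve` is what gets wrapped as a tool, so the LLM can decide per query whether to call it or answer from its parametric knowledge.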

Implementing agentic RAG with Claude 3.5 Sonnet

This section covers the implementation of the agentic RAG system using Anthropic as the model provider, OpenAI as the embedding model provider, LlamaIndex as the LLM data framework, and MongoDB as the memory provider for the agentic system. 
Below is a breakdown of the implementation steps of the agentic RAG system:
  1. Environment and library setup: Install libraries such as llama-index, PyMongo, etc. This step also includes setting environment variables for API keys (Anthropic, Hugging Face, OpenAI).
  2. LLM and embedding model configuration: Initialize the Anthropic Claude 3.5 Sonnet LLM and the OpenAI embedding model.
  3. Data loading and processing: Load the Airbnb dataset, convert it to a suitable format for LlamaIndex (nodes and documents), and handle metadata appropriately.
  4. Embedding generation: Generate embeddings for each node using the chosen embedding model (OpenAI, in this case, due to its wide adoption).
  5. MongoDB setup: Establish a connection to MongoDB Atlas and create a database and collection for storing the Airbnb data.
  6. Vector database integration: Utilize MongoDB Atlas Vector Search to create a vector store and index the embeddings.
  7. Retriever tool creation: Build a QueryEngineTool that leverages the vector store to retrieve relevant information based on user queries.
  8. AI agent creation: Instantiate a FunctionCallingAgentWorker and convert it into an agent that can interact with the user and use the retriever tool.
  9. User interaction: Interact with the AI agent by posing questions or requests and observing the responses generated based on the knowledge base and the Claude 3.5 Sonnet LLM capabilities.
The key parts of this tutorial are the operations covered in Steps 7 and 8. These steps concern all procedures for creating the AI agent and tools. 

Step 1: Environment and library setup

To begin setting up our AI agent, we'll need to install several key libraries. The code snippet below installs the necessary packages.
Let's break down the packages and libraries we're installing:
  1. llama-index: This is the core library we'll use to build our AI agent. It provides the fundamental tools and functionalities for integrating large language models with external data sources.
  2. llama-index-vector-stores-mongodb: This package enables us to use MongoDB as our vector database, which will be crucial for efficiently storing and retrieving vector embeddings. This package integrates MongoDB with the LlamaIndex Python library.
  3. llama-index-llms-anthropic: This module allows us to integrate Anthropic's language models — specifically, Claude 3.5 Sonnet — into our LlamaIndex pipeline.
  4. llama-index-embeddings-openai: We'll use this to leverage OpenAI's embedding models to create vector representations of our text data.
  5. pymongo, pandas, and datasets: These additional libraries will help us with MongoDB database connection and operations, efficient data manipulation, and access to datasets hosted on Hugging Face.
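As a quick sanity check after installation, the sketch below reports which of the packages listed above cannot be found in the current environment. The `missing_modules` helper is hypothetical, and the install command in the comment is an unpinned example rather than a tested set of versions.

```python
import importlib.util

# Example install step (shell or notebook), versions unpinned here:
#   pip install llama-index llama-index-vector-stores-mongodb \
#       llama-index-llms-anthropic llama-index-embeddings-openai \
#       pymongo pandas datasets

# Top-level import names for the packages listed above.
REQUIRED_MODULES = ["llama_index", "pymongo", "pandas", "datasets"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```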
Following the installation of the required libraries, the next step is to set up our environment variables.
Let's break down what each of these environment variables represents:
  1. ANTHROPIC_API_KEY: This is your unique identifier for accessing Anthropic's AI models, including Claude 3.5 Sonnet. Access or create your Anthropic API key.
  2. HF_TOKEN: This token is associated with your Hugging Face account. It's required to access datasets and models hosted on the Hugging Face platform.
  3. OPENAI_API_KEY: This key allows access to OpenAI's services, particularly their embedding models, which we'll use to generate vector representations of text data.
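One way to confirm the three variables are in place before going further is sketched below. The `missing_keys` helper is a hypothetical name, and how you set the values (shell export, notebook secrets, or `getpass`) is left to your environment.

```python
import os

REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "HF_TOKEN", "OPENAI_API_KEY"]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# In a notebook you might set the values interactively, for example with
# getpass.getpass(), rather than hard-coding them in the source.
unset = missing_keys(os.environ)
if unset:
    print("Set these environment variables before continuing:", unset)
```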

Step 2: LLM and embedding model configuration

With our environment variables in place, we can now configure the LLM and embedding models. This step is crucial for setting up the core components of our agentic RAG system, which together form the agent's brain.
Here's what each part of the code snippet above does:
  1. Import the necessary classes from LlamaIndex to work with our chosen models.
  2. Initialize the Anthropic LLM, specifically using the "claude-3-5-sonnet-20240620" model. This model will serve as the brain of the AI agent, handling complex reasoning, generation tasks, tool use, and function calling.
  3. For our embedding model, we're using OpenAI's "text-embedding-3-small." This model is configured with the following parameters:
    • dimensions=256: This specifies the size of our embedding vectors.
    • embed_batch_size=10: This sets how many items are processed in a single batch, optimizing for efficiency.
    • Securely retrieve the OpenAI API key from our environment variables.
  4. Finally, we set these models as our LlamaIndex Settings default. This ensures that all downstream processes of the agentic system will use these models unless explicitly overridden.
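The configuration described above can be sketched as follows. This assumes the `llama-index-llms-anthropic` and `llama-index-embeddings-openai` packages from Step 1 are installed and that the API keys are set. The `configure_models` wrapper is a hypothetical convenience, with imports placed inside it so the definition loads even where the packages are absent.

```python
import os


def configure_models():
    """Create the LLM and embedding model and register them as defaults."""
    # Imported lazily so this module loads before the packages are installed.
    from llama_index.core import Settings
    from llama_index.embeddings.openai import OpenAIEmbedding
    from llama_index.llms.anthropic import Anthropic

    # The Anthropic client reads ANTHROPIC_API_KEY from the environment.
    llm = Anthropic(model="claude-3-5-sonnet-20240620")

    embed_model = OpenAIEmbedding(
        model="text-embedding-3-small",
        dimensions=256,       # size of each embedding vector
        embed_batch_size=10,  # items sent per embedding request
        api_key=os.environ["OPENAI_API_KEY"],
    )

    # Downstream LlamaIndex components use these unless overridden.
    Settings.llm = llm
    Settings.embed_model = embed_model
    return llm, embed_model
```

Calling `llm, embed_model = configure_models()` once at the start of a session makes both objects available to the later steps.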

Step 3: Data loading and processing

Now that the models are configured, the next step is loading and preparing the dataset. This tutorial uses a subset of the Airbnb embedding dataset from Hugging Face. Here's a breakdown of the data-loading process:
Let's highlight the key parts of the code snippet above:
  1. Import the necessary libraries: load_dataset from Hugging Face's datasets library and pandas for data manipulation.
  2. Use the load_dataset() method to fetch the "MongoDB/airbnb_embeddings" dataset. The split="train" parameter specifies that we want the training split, and streaming=True enables iteratively loading the dataset into the development environment without loading its entire content.
  3. To manage the dataset size for this tutorial, we use dataset.take(4000) to limit our sample to 4000 entries. This allows us to work with a manageable subset while still having enough data to demonstrate the system's capabilities.
  4. Convert the dataset to a pandas DataFrame using pd.DataFrame(dataset). This transformation gives us access to pandas' powerful data manipulation tools.
  5. Finally, we display the first five rows of the DataFrame with dataset_df.head(5) to get a quick overview of our data.
It's important to note that you'll need a Hugging Face token (HF_TOKEN) in your development environment to access this dataset. If you haven't already set this up, you can obtain a token from your Hugging Face account settings.
This Airbnb embedding dataset is particularly suitable for our agentic RAG system as it already includes pre-computed text and image embeddings. These embeddings represent various features of Airbnb listings, allowing our system to perform semantic searches efficiently using visual or text-based information. Using this dataset, we're simulating a real-world scenario where an AI agent might need to quickly retrieve and analyze information about various properties and perform a recommendation task. 
The code snippet above removes the pre-existing text embeddings and prepares to create new ones. This tutorial takes this approach to showcase the selection of data attributes for vector embedding creation using LlamaIndex functionalities. Also, in more practical scenarios and production applications, you would want more control over the embedding process and to ensure consistency with the chosen embedding model. 
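The loading steps and the removal of the pre-computed text embeddings can be sketched together as shown here. The column name `text_embeddings` is an assumption about the dataset schema, and `load_listings` is a hypothetical wrapper. The function needs the `datasets` and `pandas` packages plus network access when it is called.

```python
def load_listings(limit=4000, drop_columns=("text_embeddings",)):
    """Stream the first `limit` rows into a DataFrame and drop old embeddings."""
    # Lazy imports: requires the `datasets` and `pandas` packages.
    import pandas as pd
    from datasets import load_dataset

    # streaming=True iterates over the data instead of downloading it all.
    dataset = load_dataset("MongoDB/airbnb_embeddings", split="train", streaming=True)
    dataset = dataset.take(limit)

    dataset_df = pd.DataFrame(dataset)
    # errors="ignore" keeps this safe if the assumed column name is absent.
    return dataset_df.drop(columns=list(drop_columns), errors="ignore")
```

A typical call would be `dataset_df = load_listings()` followed by `dataset_df.head(5)` to inspect the first rows.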

Step 4: Embedding generation

This step generates new vector embeddings using the configured OpenAI embedding model. This involves processing the relevant text fields from our Airbnb dataset (such as descriptions, titles, or reviews) to create vector representations that capture the semantic meaning of each listing.
This step transforms our dataset into a format optimized for our LlamaIndex-based RAG system. This process involves converting our pandas DataFrame into a list of LlamaIndex Document objects. 
The code snippet below executes the process of creating LlamaIndex documents, selecting data attributes for embedding, and configuring the embedding process.
Here's a more technical description of the operations in the code snippet:
  1. Convert the DataFrame to JSON, then back to a list of Python dictionaries. This step ensures all data is in a format that can be easily manipulated.
  2. Iterate through each document, converting complex nested objects (like lists and dictionaries) to JSON strings. This is crucial because the metadata values in LlamaIndex Documents must be simple types (str, int, float, or None).
  3. For each listing, we create a LlamaIndex Document object. The text field is set to the listing's description, which will be the primary content for our embeddings and LLM processing.
  4. Set up metadata exclusion lists for both the LLM and embedding models. This allows us to control which fields are used in different contexts, optimizing for relevance and efficiency of the embedding process.
  5. Specify the use of custom templates for metadata and text formatting. This ensures that our data is presented in a consistent, easily parsable format for the LLM and embedding model.
  6. Finally, we demonstrate how the formatted data looks from the perspective of the LLM and the embedding model.
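A minimal sketch of the document-building steps above follows. The helper names, the `hidden_keys` parameter, and the two template strings are illustrative assumptions. The `Document` fields used are those of LlamaIndex releases contemporary with this tutorial, so check them against your installed version.

```python
import json


def flatten_metadata(record):
    """Serialize nested lists and dicts to JSON strings; keep simple values."""
    return {
        key: json.dumps(value) if isinstance(value, (list, dict)) else value
        for key, value in record.items()
    }


def to_llama_documents(records, text_key="description", hidden_keys=()):
    """Build LlamaIndex Documents from listing records."""
    from llama_index.core import Document  # lazy import

    documents = []
    for record in records:
        documents.append(
            Document(
                text=record.get(text_key) or "",
                metadata=flatten_metadata(record),
                # Keys hidden from the LLM prompt and from the embedded text.
                excluded_llm_metadata_keys=list(hidden_keys),
                excluded_embed_metadata_keys=list(hidden_keys),
                metadata_template="{key}=>{value}",
                text_template="Metadata: {metadata_str}\n-----\nContent: {content}",
            )
        )
    return documents
```

The records can come from the DataFrame via `json.loads(dataset_df.to_json(orient="records"))`. Printing `document.get_content(metadata_mode=MetadataMode.LLM)` and the same call with `MetadataMode.EMBED` (both from `llama_index.core.schema`) shows what the LLM and the embedding model each receive.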
The next set of operations generates embeddings for our prepared documents, transforming our text data into vector representations suitable for semantic search.
Here's a detailed explanation of the code snippet above:
  1. Import necessary components from LlamaIndex and tqdm for progress tracking.
  2. Initialize a SentenceSplitter with a chunk_size of 5000 and chunk_overlap of 200. This splitter breaks down our documents into manageable chunks, ensuring that each piece is small enough for efficient processing while maintaining context through overlap.
  3. Apply the splitter to the llama_documents, creating a list of nodes. Each node represents a chunk of text from our original documents.
  4. Set up a progress bar using tqdm to visualize the embedding process, which is especially useful for large datasets that might take a while to embed all generated nodes.
  5. Iterate through each node, generating an embedding using our previously configured embed_model (OpenAI's "text-embedding-3-small"). The get_content method with MetadataMode.EMBED ensures that only the content selected earlier for embedding (the text plus any metadata not excluded from embedding) is embedded.
  6. Assign the generated embedding to the node, linking each text chunk and metadata with its vector representation.
It's worth noting that we've commented out a SemanticSplitterNodeParser. This alternative splitter could be used for more nuanced document splitting based on semantic relationships between sentences, which might benefit certain documents or use cases.
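The splitting and embedding loop described above might look like the sketch below. `split_documents` and `embed_nodes` are hypothetical helper names. The embedding model and metadata mode are passed in as arguments so the loop itself has no hard dependency on a particular provider.

```python
def split_documents(documents, chunk_size=5000, chunk_overlap=200):
    """Split documents into nodes with LlamaIndex's SentenceSplitter."""
    from llama_index.core.node_parser import SentenceSplitter  # lazy import

    parser = SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return parser.get_nodes_from_documents(documents)


def embed_nodes(nodes, embed_model, metadata_mode):
    """Attach an embedding to every node and return the nodes."""
    for node in nodes:
        # Only the content visible in the given metadata mode is embedded.
        text = node.get_content(metadata_mode=metadata_mode)
        node.embedding = embed_model.get_text_embedding(text)
    return nodes
```

With the objects from the earlier steps, this would be called as `nodes = embed_nodes(split_documents(llama_documents), embed_model, MetadataMode.EMBED)`, where `MetadataMode` comes from `llama_index.core.schema`. Wrapping `nodes` in `tqdm(...)` adds the progress bar mentioned in the list.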

Step 5: MongoDB setup

MongoDB acts as an operational and vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries, and retrieves vector embeddings. Overall, MongoDB acts as the memory provider within an agentic system.
Creating a database and collection within MongoDB is made simple with MongoDB Atlas.
  1. First, register for a MongoDB Atlas account. Existing users can sign into MongoDB Atlas.
  2. Follow the instructions. Select Atlas UI as the procedure to deploy your first cluster.
  3. Create the database: airbnb.
  4. Within the database airbnb, create the collection listings_reviews.
  5. Create a vector search index named vector_index for the listings_reviews collection. This index enables the RAG application to retrieve records as additional context to supplement user queries via vector search. Below is the JSON definition of the data collection vector search index.
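For reference, a vector search index definition consistent with this tutorial's settings could look like the sketch below, written as a Python dictionary and printed as JSON for pasting into the Atlas UI. The `embedding` path and the `cosine` similarity are assumptions. The `numDimensions` value must match the 256 dimensions configured for the embedding model in Step 2.

```python
import json

# Assumed values: "embedding" is the field LlamaIndex writes vectors to by
# default, and cosine is one of the similarity options Atlas supports.
# numDimensions must match the embedding model's dimensions (256 here).
vector_index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 256,
            "similarity": "cosine",
        }
    ]
}

print(json.dumps(vector_index_definition, indent=2))
```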
Your vector search index created on MongoDB Atlas should look like below:
Follow MongoDB’s steps to get the connection string from the Atlas UI. Securely store the URI within your development environment after setting up the database and obtaining the Atlas cluster connection URI. This setup is essential for enabling efficient semantic search capabilities in our RAG system. 
Here's a detailed explanation of each part:
  1. Start by setting the MONGO_URI environment variable. This variable should be securely stored and not hard-coded in a production environment.
  2. Import pymongo, the Python driver for MongoDB.
  3. Define a get_mongo_client function that creates a MongoDB client using the provided URI.
  4. Retrieve the MONGO_URI from the environment variables and check if it's set.
  5. Use the get_mongo_client function to establish a connection to MongoDB.
  6. Define the database name (airbnb) and collection name (listings_reviews).
  7. Finally, we use the MongoDB client to get references to our specific database and collection.
It's important to note a few best practices here:
  • Your code should keep the MONGO_URI secure and not expose it. Consider using a secrets management system in a production environment.
  • Using get_database() and get_collection() methods instead of dictionary-style access (db['collection']) is preferred as it's less error-prone.
The last piece of code below for this step ensures we start with a fresh, empty collection. This is particularly useful in development and testing environments where we want to avoid mixing old and new data.
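The connection and reset steps for this section can be sketched as follows, assuming `pymongo` is installed and `MONGO_URI` holds your Atlas connection string. The helper names other than `get_mongo_client` are hypothetical, and the `ping` command is an added connectivity check rather than something the steps above require.

```python
import os


def get_mongo_client(mongo_uri):
    """Create a MongoDB client and confirm the server is reachable."""
    import pymongo  # lazy import

    client = pymongo.MongoClient(mongo_uri)
    client.admin.command("ping")  # raises if the connection fails
    return client


def get_listings_collection(env=os.environ):
    """Return the airbnb.listings_reviews collection using MONGO_URI."""
    mongo_uri = env.get("MONGO_URI")
    if not mongo_uri:
        raise ValueError("MONGO_URI is not set in the environment")
    client = get_mongo_client(mongo_uri)
    return client.get_database("airbnb").get_collection("listings_reviews")


def reset_collection(collection):
    """Delete every document so ingestion starts from an empty collection."""
    return collection.delete_many({}).deleted_count
```

Because `delete_many({})` removes every document in the collection, it is best reserved for development and test databases.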

Step 6: Vector database integration

This step leverages LlamaIndex's MongoDB integration to create an instance of the MongoDB vector database.
Here's a more technical explanation of the process:
  1. Import MongoDBAtlasVectorSearch from LlamaIndex's MongoDB integration. This class provides an interface between LlamaIndex and MongoDB Atlas Vector Search.
  2. Create an instance of MongoDBAtlasVectorSearch and assign it to the variable vector_store. This object will serve as our primary interface for vector search operations.
  3. Pass several parameters to initialize the vector store:
    • mongo_client: Our previously established MongoDB client connection.
    • db_name: The name of our database (in this case, "airbnb").
    • collection_name: The name of our collection ("listings_reviews").
    • index_name: The name of our vector search index ("vector_index").
It's important to note that for this to work correctly:
  • Your MongoDB Atlas cluster must be set up with a vector search index created.
  • The "vector_index" mentioned here should correspond to the name of the vector index you've created in your MongoDB Atlas cluster. This index should be configured to index the field where your embeddings are stored.
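A sketch of the vector store initialization, using the parameters listed above, is shown here. `build_vector_store` is a hypothetical wrapper. The client is passed positionally because the exact keyword names can differ between releases of `llama-index-vector-stores-mongodb`.

```python
def build_vector_store(mongo_client):
    """Wrap the Atlas collection in a LlamaIndex vector store."""
    # Lazy import: requires llama-index-vector-stores-mongodb.
    from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

    return MongoDBAtlasVectorSearch(
        mongo_client,
        db_name="airbnb",
        collection_name="listings_reviews",
        # Keyword as described above; newer package releases may expect a
        # different name for this argument, so check your installed version.
        index_name="vector_index",
    )
```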
The ingestion process of the nodes — which contain the metadata, chunk, and embeddings — to a MongoDB database and collection is done in a single line due to the intuitive integration of LlamaIndex with MongoDB. This solution saves several lines of implementation code, significantly streamlining the data ingestion process.
Although it’s just one line of code, here's what this operation accomplishes:
  1. Data ingestion: This command takes the nodes (which contain our Airbnb listing data and their corresponding embeddings) and adds them to our MongoDB Atlas vector store.
  2. Vector indexing: MongoDB Atlas automatically indexes the embedding vectors as the nodes are added, making them ready for efficient similarity searches.
  3. Metadata storage: Along with the embeddings, any associated metadata from our nodes (like listing details) is also stored, allowing for rich, contextual retrieval in downstream processes.
Some important points to keep in mind:
  • This operation may take some time, depending on the number of nodes and the size of your dataset. For large datasets, you might want to consider batching this operation.
  • Ensure your MongoDB instance has enough storage capacity for your data and index.
  • The performance of this operation can impact the overall setup time of your RAG system, but it's a one-time cost that enables fast retrieval later.
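The single-line ingestion is a call such as `vector_store.add(nodes)`. For larger datasets, the batching suggested above could be sketched as follows. `add_in_batches` and its default `batch_size` are hypothetical and should be tuned to your data.

```python
def add_in_batches(vector_store, nodes, batch_size=500):
    """Add nodes to the vector store in fixed-size batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    inserted_ids = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        # add() is the same call used for single-shot ingestion.
        inserted_ids.extend(vector_store.add(batch) or [])
    return inserted_ids
```

Smaller batches keep each write request bounded and make it easier to resume after a failure, at the cost of more round trips.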

Step 7: Retriever tool creation

This part of the tutorial creates the key tool that the agent uses to retrieve relevant data from the MongoDB database. The creation of the retriever tool involves two steps, leveraging LlamaIndex's advanced indexing and querying capabilities:
  1. Creating a vector store index
  2. Creating a query engine from the index to retrieve documents from the database
Finally, this query engine is encapsulated within a QueryEngineTool. This higher-level interface transforms the query engine into a tool for the AI agent, along with metadata that provides additional information about its functionality and when it should be used.
Here's what each part of the code snippet above does:
  1. Import necessary classes from LlamaIndex to create the index and query engine tool.
  2. Create a VectorStoreIndex from the vector_store. This index provides a high-level interface for querying our vector store.
  3. Convert the index into a query engine, specifying the following:
    • similarity_top_k=5: This means the queries will return the top five most similar results.
    • llm=llm: Pass our previously configured language model (Claude 3.5 Sonnet) to the query engine.
  4. We create a QueryEngineTool, which wraps our query engine with metadata. Our agent can use this tool to interact with the knowledge base.
The description we've given to the tool is particularly important:
  • "Provides information about Airbnb listings and reviews": This clearly states what kind of information the tool can provide.
  • "Use a detailed plain text question as input to the tool": This guides the agent on how to formulate queries to this tool.
This setup forms the core of our RAG system's ability to retrieve relevant information. When a user asks a question about Airbnb listings, our agent can use this tool to:
  1. Convert the question into a vector representation.
  2. Find the most similar documents in our vector store.
  3. Retrieve these documents and their associated metadata.
  4. Use this information to formulate an informed response.
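The retriever tool described in this step can be sketched as below. The tool name `knowledge_base` and the `build_query_engine_tool` wrapper are assumptions. The description string combines the two sentences quoted above.

```python
def build_query_engine_tool(vector_store, llm):
    """Expose the vector store to the agent as a query engine tool."""
    # Lazy imports: requires llama-index.
    from llama_index.core import VectorStoreIndex
    from llama_index.core.tools import QueryEngineTool, ToolMetadata

    index = VectorStoreIndex.from_vector_store(vector_store)
    # Return the five most similar nodes and let the LLM synthesize an answer.
    query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)

    return QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="knowledge_base",  # hypothetical tool name
            description=(
                "Provides information about Airbnb listings and reviews. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    )
```

The description matters because it is the text the agent's LLM reads when deciding whether this tool fits the current request.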

Step 8: AI agent creation

This is the final step of the agentic system creation process. The operations in this step create a function-calling agent that uses the defined tools to execute tasks, leveraging the emergent reasoning and planning abilities of its LLM to decompose complex tasks and assign their completion to specific tools.
Below are the operations in the code snippet above:
  1. Import the FunctionCallingAgentWorker from LlamaIndex, designed to create an agent capable of using tools (like our query engine) to accomplish tasks.
  2. Create an instance of FunctionCallingAgentWorker using the from_tools method. Pass in:
    • query_engine_tool as a list of available tools
    • The configured language model (llm)
    • verbose=True to enable detailed logging of the agent's actions
  3. Convert the agent worker into an agent using the as_agent() method.
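A sketch of the agent creation follows. It assumes a LlamaIndex release contemporary with this tutorial, in which `FunctionCallingAgentWorker` is importable from `llama_index.core.agent`. Later releases may have moved or replaced this class. `build_agent` is a hypothetical wrapper.

```python
def build_agent(tools, llm):
    """Create a function-calling agent that can use the given tools."""
    # Lazy import: requires llama-index.
    from llama_index.core.agent import FunctionCallingAgentWorker

    agent_worker = FunctionCallingAgentWorker.from_tools(
        tools,         # for example, [query_engine_tool]
        llm=llm,
        verbose=True,  # log each tool call and intermediate step
    )
    return agent_worker.as_agent()
```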

Step 9: User interaction

The final step is to process a user query and generate a response. Invoking the created agent's chat method initiates an interaction with the agent, simulating a real-world scenario where a user is seeking information about Airbnb listings in New York. 
The agent leverages its sophisticated reasoning capabilities, powered by Claude 3.5 Sonnet, to interpret the query, retrieve relevant information from the MongoDB vector store, and synthesize a comprehensive response.
Example output from the Agentic RAG built with Claude 3.5
Here's what the code snippet does:
  1. Use the agent's chat method, passing in the query, "Tell me the best listing for a place in New York."
  2. The agent processes this query using the following steps:
    • It analyzes the query to understand the user's request.
    • It determines that it needs to use the knowledge base tool to find information about New York listings.
    • It uses the query_engine_tool to search the vector store for relevant listings.
    • It synthesizes the retrieved information to respond with the "best" listing.
  3. The response from the agent is stored in the response variable.
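The interaction itself is short. A minimal sketch, assuming `agent` is the object created in Step 8, is shown below. `ask` is a hypothetical helper that returns the reply as plain text.

```python
def ask(agent, query):
    """Send one query to the agent and return the reply as text."""
    response = agent.chat(query)
    return str(response)
```

For example, `print(ask(agent, "Tell me the best listing for a place in New York."))` sends the query used in this step. Because the agent was created with `verbose=True`, the tool calls it makes are logged along the way.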


Conclusion

This tutorial has guided you through the comprehensive process of building an agentic RAG system using Claude 3.5 Sonnet, LlamaIndex, and MongoDB. From setting up the environment to creating an AI agent capable of complex reasoning and tool use, we've explored the latest form factor of LLM applications.
Agentic RAG represents a step forward in LLM application development, moving beyond simple question-answering to enable dynamic, multi-step problem-solving and function calling. By combining the emergent abilities of large language models, such as reasoning and planning, with the flexibility of tool use and the efficiency of vector search, intuitive and efficient AI applications can be built. This approach enhances the accuracy and relevance of AI responses and lays the foundation for more autonomous and adaptable AI systems.
To continue your journey in exploring agentic systems, we recommend diving into related tutorials such as "Build AI Agents With Memory," which will further expand your understanding and capabilities in this rapidly evolving field.


FAQs

1. What is agentic RAG, and how does it differ from traditional RAG systems?
Agentic RAG is an advanced paradigm that combines retrieval-augmented generation (RAG) with AI agent capabilities. Unlike traditional RAG systems, agentic RAG leverages LLMs' routing, tool use, reasoning, and planning abilities alongside information retrieval. This enables more dynamic and complex problem-solving, allowing the system to decompose tasks, make tool selections, and execute queries more effectively.
2. How can MongoDB be used as a memory provider in an agentic RAG system?
MongoDB serves as an operational and vector database in an agentic RAG system. It efficiently stores, queries, and retrieves vector embeddings, acting as the system's memory. By utilizing MongoDB Atlas Vector Search, the system can perform fast semantic searches on stored data, enabling the AI agent to access relevant information quickly and enhance its decision-making process.
3. What key components are needed to build an agentic RAG system with Claude 3.5 Sonnet?
To build an agentic RAG system with Claude 3.5 Sonnet, you need:
  • Claude 3.5 Sonnet as the core language model.
  • LlamaIndex for integrating LLMs with external data sources.
  • MongoDB for vector storage and retrieval.
  • An embedding model (e.g., OpenAI's text-embedding-3-small).
  • A retriever tool to fetch relevant information.
  • A function-calling agent worker to handle complex queries.