
Quickstart Guide to RAG Application Using LangChain and LlamaIndex

Kushagra Kesav • 10 min read • Published Jun 06, 2024 • Updated Jun 06, 2024


While large language models such as GPT-4 are very good at generating content and reasoning logically, they are limited when it comes to retrieving precise facts or contextually relevant information. One popular way to address this is a retrieval-augmented generation (RAG) system, which pairs the language model with a vector database such as MongoDB Atlas Vector Search so that relevant data can be retrieved and supplied to the model at query time.
As the demand for efficient information retrieval continues to grow, understanding the syntax and capabilities of the various frameworks becomes important. In this article, we will cover the basics of vector search in simple terms. We'll look at LangChain, LlamaIndex, and PyMongo, showing you step by step how to use their methods for semantic search. Along the way, we compare the syntax of each framework when it is used with MongoDB Atlas Vector Search.

Key takeaways

  • Overview of vector search and RAG
  • Extract saved data from MongoDB, convert to embeddings, store back, and run semantic search for contextual information
  • Build an end-to-end RAG system using MongoDB alongside AI frameworks like LlamaIndex and LangChain, then contrast their syntax

What is vector search?

A vector database is a type of data storage solution that manages and searches large amounts of high-dimensional numerical data (also known as vectorised data). This data is represented as vectors, which are created by an embedding model that takes input (such as images, audio, video, or text) and converts it into vectors. These vectors are stored in a database and can be queried using similarity-based search methods, allowing for fast and accurate retrieval of similar data objects.
When multiple vector representations are mapped into a high-dimensional space, the distance between these vectors in that space reflects the similarity between them. This is because the vectors capture the context and meaning of the original data, allowing for a refined understanding of their relationships. Vector databases are designed to quickly calculate the distance between vectors, enabling efficient retrieval of information based on a query vector or semantic similarity search. In contrast, traditional databases rely on keyword matching to retrieve information, which is a fundamentally different approach.
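To make the idea of distance as similarity concrete, here is a minimal pure-Python sketch that ranks a few toy vectors by cosine similarity to a query vector. The three-dimensional vectors and their labels are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database would use an index rather than scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=1):
    """Return the k (label, score) pairs most similar to the query vector."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy "embeddings", invented for this example.
toy_vectors = {
    "haunted house": [0.9, 0.1, 0.0],
    "space opera": [0.1, 0.9, 0.2],
    "romantic comedy": [0.0, 0.2, 0.9],
}

# The query vector points mostly in the same direction as "haunted house".
print(nearest([0.8, 0.2, 0.1], toy_vectors, k=1))
```

Cosine similarity is only one possible metric; Euclidean distance and dot product are common alternatives.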

What is RAG?

Retrieval-augmented generation (RAG) is a system design pattern that combines information retrieval with generative AI to deliver accurate and relevant responses to user queries. It gathers data that is semantically related to the query and adds it to the query as extra context, and the enriched query is then passed to the LLM as input.
[Figure: RAG architecture]
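The retrieve-then-augment flow can be sketched in a few lines of plain Python. This is only an illustration of the pattern: the retriever below ranks documents by word overlap, whereas a real RAG system would rank by vector similarity, and the two sample documents are invented for the example.

```python
def retrieve(question, documents, k=1):
    """Toy retriever: rank documents by how many question words they share.

    A real RAG system would rank by vector similarity instead.
    """
    words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_docs):
    """Augment the user question with the retrieved context."""
    context = "\n".join(context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical documents, invented for this example.
docs = [
    "The Shining is a horror movie set in an isolated hotel.",
    "Toy Story is an animated movie about toys.",
]
question = "Which horror movie should I watch?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)  # this enriched prompt is what would be sent to the LLM
```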

Getting started with LangChain framework

Step 1: Installing required libraries for LangChain framework

This section guides you through the installation process of the essential libraries needed to implement the RAG application with LangChain. Here is the list of required libraries:
  • langchain: The Python toolkit for LangChain
  • langchain-mongodb: A Python package to use MongoDB as a vector store, semantic cache, chat history store, etc., in LangChain
  • pymongo: The Python toolkit for MongoDB
  • openai: A Python library for the OpenAI API
  • nest-asyncio: A utility library for running an embedded asyncio event loop

Step 2: Data cleaning and loading

Here we are going to use the embedded_movies sample collection from the sample_mflix database, and we will do some cleaning before using it. Please run the following command to remove the documents that do not contain the plot field, so that we do not run into errors later:
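The cleanup can be done in several ways. As a minimal sketch, assuming you run it from Python with PyMongo rather than from the shell, the helper below (a hypothetical function, not part of any library) deletes every document that has no plot field. It accepts any PyMongo collection object.

```python
def remove_docs_without_plot(collection):
    """Delete every document lacking a "plot" field; return how many were removed."""
    result = collection.delete_many({"plot": {"$exists": False}})
    return result.deleted_count


# Usage against a real cluster (assumes pymongo is installed and MONGODB_URI is set):
#   import os
#   from pymongo import MongoClient
#   client = MongoClient(os.environ["MONGODB_URI"])
#   removed = remove_docs_without_plot(client["sample_mflix"]["embedded_movies"])
#   print(f"Removed {removed} documents")
```

This permanently deletes documents, so you may prefer to run it on a copy of the sample data.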
Now, we will create a Jupyter notebook and write the following code, which extracts the data from the specified collection in our MongoDB Atlas database.
The code reads some environment variables, so we'll create a .env file and put these variables in it.
In the above code, we use LangChain's MongodbLoader class to retrieve documents from MongoDB. The dotenv library loads environment variables from a .env file, and the nest_asyncio library allows nested asyncio event loops, which is needed inside Jupyter.
We first load the environment variables from the .env file into our Jupyter environment using load_dotenv().
We then initialize the loader with parameters such as the MongoDB connection string, database name, and collection name to retrieve data from each document.
Finally, we call the loader's load() method to fetch the documents from MongoDB based on the specified parameters, and print the result.
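The loading step described above might look like the following sketch. It is not the author's exact code: MONGODB_DB and MONGODB_COLL are assumed variable names for the .env file (only MONGODB_URI and OPENAI_API_KEY appear in this article), and the choice of the title and plot fields is an assumption based on the fields used later on. Third-party imports are kept inside the function so the settings helper can be used on its own.

```python
import os


def loader_settings(env=os.environ):
    """Collect the loader arguments from environment variables."""
    return {
        "connection_string": env["MONGODB_URI"],
        "db_name": env["MONGODB_DB"],            # assumed variable name
        "collection_name": env["MONGODB_COLL"],  # assumed variable name
        "filter_criteria": {},                   # empty filter: load every document
        "field_names": ["title", "plot"],        # assumed fields of interest
    }


def load_documents():
    """Load documents from MongoDB with LangChain's MongodbLoader."""
    # Imported here so that this sketch can be read and tested without the packages.
    import nest_asyncio
    from dotenv import load_dotenv
    from langchain_community.document_loaders.mongodb import MongodbLoader

    nest_asyncio.apply()  # allow nested event loops inside Jupyter
    load_dotenv()         # read the .env file into os.environ
    loader = MongodbLoader(**loader_settings())
    docs = loader.load()
    print(len(docs))
    return docs
```

In a notebook cell you would then call docs = load_documents() and inspect docs[0].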

Step 3: Create embeddings with OpenAI

Now, we will add another code cell in the Jupyter notebook and run the following code to create the embeddings with OpenAI.
This code segment performs a couple of tasks related to setting up a search system using MongoDB Vector Search, LangChain, and OpenAI embeddings.
First, we initialize a MongoDB client using client = MongoClient(os.environ['MONGODB_URI']) and use the client instance to access a specific collection within the MongoDB database. This collection will hold the embedding data along with the text.
Here is the sample document:
Further, we set up vector search by initializing a vector search object using LangChain's MongoDBAtlasVectorSearch class.
Then, we pass the docs variable, which contains the documents fetched from MongoDB earlier, to set up the vector search. The line embedding=OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY']) creates the OpenAI embedding model, authenticated with our API key, that is used to generate the embeddings. We also pass the collection instance to the collection parameter; this is where the vector embeddings will be stored.
We also specify the name of the vector index (here, vector_index) within the collection which will be used for the semantic search.
Overall, this code part handles the connections to a MongoDB instance and sets up a vector search system using LangChain, with vector data stored in MongoDB and embeddings generated by OpenAI.
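Putting the pieces of this step together, a hedged sketch could look like this. It assumes the langchain-mongodb and langchain-openai packages are installed (the latter is not in the list above and is an assumption), and it uses the sample_mflix database and langchain_coll collection named later in this step. The function name build_vector_search is hypothetical.

```python
import os


def build_vector_search(docs, index_name="vector_index"):
    """Embed docs with OpenAI and store text plus vectors in langchain_coll."""
    # Imported here so that this sketch can be read without the packages installed.
    from pymongo import MongoClient
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_openai import OpenAIEmbeddings  # assumed package: langchain-openai

    client = MongoClient(os.environ["MONGODB_URI"])
    collection = client["sample_mflix"]["langchain_coll"]

    # from_documents embeds each document and inserts it into the collection.
    return MongoDBAtlasVectorSearch.from_documents(
        documents=docs,
        embedding=OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"]),
        collection=collection,
        index_name=index_name,
    )
```

Calling vector_search = build_vector_search(docs) makes one or more requests to the OpenAI API, so it needs a valid key and incurs usage costs.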
Refer to OpenAI's FAQs to learn how you can get your OPENAI_API_KEY.
Now, let's move to our MongoDB Atlas UI and create a vector search index on our 'langchain_coll' collection.
To do that, open the Atlas UI and select the Atlas Search tab on the navigation pane. Then click the Create Search Index button to create an Atlas Vector Search index.
[Figure: Overview of Atlas Search capabilities]
On the page to create a Vector Search index, select the Atlas Vector Search option that enables the creation of a vector search index by defining the index using JSON.
[Figure: Creating a vector search index via MongoDB Atlas interface]
To complete the creation of the index, select the database and collection for which the index should be created. In this case, it is the sample_mflix database and the langchain_coll collection. The JSON entered into the JSON editor should look like the following:
Note: Make sure the index is named vector_index, the same name you set in your .env file and passed to the code.
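For reference, a typical Atlas Vector Search index definition can be generated as below and pasted into the JSON editor. The values are assumptions rather than the author's exact definition: the path embedding is the default field in which LangChain stores vectors, and 1536 is the dimension of OpenAI's text-embedding-ada-002 model. Adjust both if your setup differs.

```python
import json

# Assumptions: vectors are stored in the "embedding" field and have 1536 dimensions.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1536,
            "similarity": "cosine",
        }
    ]
}

# Print the definition as JSON, ready to paste into the Atlas JSON editor.
print(json.dumps(index_definition, indent=2))
```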

Step 4: Perform vector search on user queries

Up to this point, we have successfully done the following:
  • Loaded data from our MongoDB database
  • Provided each document with embeddings using the OpenAI embedding model
  • Configured a MongoDB database for storing vector embeddings
  • Established a connection to this database from our development environment
  • Created a vector search index for optimized querying of vector embeddings
Now, head back to VS Code and write the following code, which will put the vector search to work:
This step sets up a similarity search on the ingested documents to find the best horror movie recommendation and then uses an AI language model to generate a concise summary of it. We pass a prompt containing the question to the similarity_search_with_score method of the vector_search object, which searches for the document that most closely matches the query and returns the top result along with its similarity score.
Next, the ChatOpenAI language model is initialized with an API key. The prompt template is then created using ChatPromptTemplate, which sets the context for the language model as a movie recommendation engine. This template includes a system message defining the AI's role and a user message template with a placeholder for the retrieved movie content.
An LLMChain object is then created, combining the language model and the prompt template. The content of the retrieved documents is extracted and concatenated into a single string, which serves as input for the language model. Finally, we print the output text to the console.
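A sketch of this search-and-summarize step follows. The prompt wording, the query text, and the function names are assumptions made for illustration, and it again assumes the langchain-openai package. LLMChain is used because the text above describes it, but it is marked as deprecated in more recent LangChain releases, so check the version you have installed.

```python
import os


def join_page_content(results):
    """Concatenate page_content from (document, score) pairs into one string."""
    return "\n\n".join(doc.page_content for doc, _score in results)


def recommend(vector_search, query="Recommend me the best horror movie", k=1):
    """Retrieve the closest documents and ask the LLM to summarize them."""
    # Imported here so that this sketch can be read without the packages installed.
    from langchain.chains import LLMChain
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI  # assumed package: langchain-openai

    results = vector_search.similarity_search_with_score(query=query, k=k)

    llm = ChatOpenAI(openai_api_key=os.environ["OPENAI_API_KEY"])
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a movie recommendation engine that posts a concise "
                   "summary of relevant movies."),
        ("user", "List of movies: {input}"),
    ])
    chain = LLMChain(llm=llm, prompt=prompt)

    output = chain.invoke({"input": join_page_content(results)})
    print(output["text"])
    return output["text"]
```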
You can execute the whole code and see the semantic search in action, where it will return the best horror movie to watch. That's what we call magic.
[Figure: Semantic response using the LangChain framework]

Getting started with LlamaIndex framework

Step 1: Installing required libraries for LlamaIndex framework

This section guides you through the installation process of the essential libraries needed to implement the RAG application with LlamaIndex. Here is the list of required libraries:
  • openai: A Python library for the OpenAI API
  • llama_index: The Python toolkit for LlamaIndex, used here with its MongoDB reader and vector store integrations
  • pymongo: Python toolkit for MongoDB

Step 2: Data cleaning and loading

Since we've already cleaned the data in the previous process, there's no need to repeat this step. However, if you're starting here, please refer to Step 2 of the LangChain section above. The process for data cleaning remains the same.
Moving forward, we will use the SimpleMongoReader class offered by LlamaIndex to load the data:
This part helps us connect to a MongoDB database and retrieves documents from a specified collection. It uses a SimpleMongoReader class provided by the llama_index package to handle the interaction with MongoDB. Apart from basic parameters, it additionally specifies which fields from the documents should be indexed (in this case, "title" and "plot"). We can also specify a query dictionary that can be used to filter the data being indexed. Since it's empty {}, it indicates no specific filtering is applied.
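As a hedged sketch of this loading step, assuming the llama-index-readers-mongodb integration package is installed and that the data lives in sample_mflix.embedded_movies (the function names here are hypothetical):

```python
def make_reader(uri):
    """Create a SimpleMongoReader for the given connection string."""
    # Imported here so that this sketch can be read without the package installed.
    from llama_index.readers.mongodb import SimpleMongoReader

    return SimpleMongoReader(uri=uri)


def load_movie_documents(reader, db_name="sample_mflix", collection_name="embedded_movies"):
    """Load the title and plot fields of every document in the collection."""
    return reader.load_data(
        db_name,
        collection_name,
        field_names=["title", "plot"],  # fields to index
        query_dict={},                  # empty filter: no documents are excluded
    )


# Usage (assumes MONGODB_URI is set):
#   import os
#   documents = load_movie_documents(make_reader(os.environ["MONGODB_URI"]))
```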

Step 3: Create embeddings with OpenAI

In this step, we import libraries such as pymongo, openai, and a few more from LlamaIndex to work with data stored in a MongoDB database and perform vector indexing and searching operations.
We then establish a MongoDB connection using client = pymongo.MongoClient(os.environ["MONGODB_URI"]). Here, os.environ is a dictionary-like mapping of environment variables, and indexing it with "MONGODB_URI" retrieves the connection string.
Following that, we create an Atlas vector store using an object of type MongoDBAtlasVectorSearch. It takes the database name, collection name, and index name as parameters, and stores the embeddings in the specified MongoDB collection.
Note: Don't forget to create a vector search index as you did earlier: open the Atlas UI, select the Atlas Search tab on the navigation pane, and choose the sample_mflix database and the llamaindex_coll collection.
The JSON entered into the JSON editor should look similar to the following:
Subsequently, we create a vector index using storage_context = StorageContext.from_defaults(vector_store=store), which generates a storage context object using default settings and associates it with the specified vector store. Then, it creates a vector index on the documents loaded in our database. The show_progress parameter determines whether to display a progress bar during the indexing process or not.
Overall, this step handles setting up a connection to a MongoDB database using PyMongo and creating a vector store using a custom implementation for MongoDB Atlas. Further, it generates a vector index from a set of documents and embeds them into vector representations for efficient searching and retrieval.
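The whole step might be sketched as follows. It assumes the llama-index-vector-stores-mongodb integration package, and the index name vector_index is an assumption carried over from the LangChain section. The keyword for the index name has changed between releases of the integration, so check the parameter name in your installed version.

```python
import os


def build_index(documents, index_name="vector_index"):
    """Embed documents and store them in llamaindex_coll through a vector index."""
    # Imported here so that this sketch can be read without the packages installed.
    import pymongo
    from llama_index.core import StorageContext, VectorStoreIndex
    from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

    client = pymongo.MongoClient(os.environ["MONGODB_URI"])
    store = MongoDBAtlasVectorSearch(
        client,
        db_name="sample_mflix",
        collection_name="llamaindex_coll",
        index_name=index_name,  # assumed keyword; verify against your version
    )

    # Associate the vector store with a default storage context.
    storage_context = StorageContext.from_defaults(vector_store=store)

    # Embeds each document (OpenAI by default) and writes it to the collection.
    return VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, show_progress=True
    )
```

By default, LlamaIndex reads the OpenAI key from the OPENAI_API_KEY environment variable when creating the embeddings.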

Step 4: Perform vector search on user queries

Until now, we have successfully loaded the data in our collections along with the generated embedding for it using OpenAI.
Now, this step builds on the previous steps to run a vector search on the stored documents based on the embedded user query. It takes the prompt and passes it as a query. The .as_query_engine(similarity_top_k=1) call configures the index to operate as a query engine that returns only the single most similar result.
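In code, the query step is short. The sketch below wraps it in a hypothetical helper named ask; the default prompt text is an assumption, and index is the vector index created in Step 3.

```python
def ask(index, prompt="Recommend me the best horror movie"):
    """Query the index and return the generated answer as text."""
    # similarity_top_k=1 retrieves only the single most similar document.
    query_engine = index.as_query_engine(similarity_top_k=1)
    response = query_engine.query(prompt)
    return str(response)


# Usage: print(ask(index))
```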
And you will see the following output, after executing the whole code snippet, where it returns the best horror movie to watch:
[Figure: Semantic response using the LlamaIndex framework]
If you are looking to use a pure aggregation pipeline ($vectorSearch) for semantic search without any LLM framework, please refer to our tutorial. You will find step-by-step instructions on how to build a RAG system using PyMongo, OpenAI, and MongoDB.
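For orientation, a $vectorSearch pipeline built with plain Python dictionaries might look like this sketch. The index name, the embedding path, the projected fields, and the numCandidates value are assumptions; the query vector must come from the same embedding model that produced the stored vectors.

```python
def vector_search_pipeline(query_vector, index="vector_index", path="embedding",
                           limit=1, num_candidates=100):
    """Build an aggregation pipeline that runs an Atlas $vectorSearch query."""
    return [
        {
            "$vectorSearch": {
                "index": index,                   # name of the vector search index
                "path": path,                     # field that stores the embeddings
                "queryVector": query_vector,      # embedding of the user query
                "numCandidates": num_candidates,  # candidates considered before ranking
                "limit": limit,                   # number of results returned
            }
        },
        {
            # Assumed fields; the score comes from the vector search stage.
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Usage with PyMongo: results = collection.aggregate(vector_search_pipeline(query_embedding))
```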

Summing it all up

In this tutorial, we walked through the process of creating a RAG application with MongoDB using two different frameworks. We covered how to connect your MongoDB database to LangChain and LlamaIndex separately, load the data, create embeddings, store them back in the MongoDB collection, and then execute a semantic search using MongoDB Atlas Vector Search. View the GitHub repo for the implementation code.
If you have any questions or feedback, reach out through the MongoDB Community forums and let us know what you build using MongoDB Atlas Vector Search.
