Explore Developer Center's New Chatbot! MongoDB AI Chatbot can be accessed at the top of your navigation to answer all your MongoDB questions.

Introducing MongoDB 8.0, the fastest MongoDB ever!
MongoDB Developer
Atlas
plus
Sign in to follow topics
MongoDB Developer Centerchevron-right
Developer Topicschevron-right
Productschevron-right
Atlaschevron-right

Quickstart Guide to RAG Application Using LangChain and LlamaIndex

Kushagra Kesav10 min read • Published Jun 06, 2024 • Updated Sep 18, 2024
AIAtlasVector SearchPython
Facebook Icontwitter iconlinkedin icon
Rate this quickstart
star-empty
star-empty
star-empty
star-empty
star-empty

Introduction

While large language models such as GPT-4 are very good at generating content and logical reasoning, they face limitations when it comes to accessing and retrieving precise facts, or contextually relevant information. One popular approach to address this involves implementing a retrieval-augmented generation (RAG) system. This system integrates the language model with a vector database such as MongoDB Atlas Vector Search to form a comprehensive AI framework capable of orchestrating interactions between these components.
As the demand for efficient information retrieval continues to increase, understanding the syntax and capabilities of various frameworks becomes important. In this article, we will see the basics of vector search in simple terms. We'll look at LangChain, LlamaIndex, and PyMongo, showing you step-by-step how to use their methods for semantic search. By delving into these frameworks, we aim to understand their respective syntax, and showcasing how they stack up with MongoDB vector search.

Key takeaways

  • Overview of vector search and RAG
  • Extract saved data from MongoDB, convert to embeddings, store back, and run semantic search for contextual information
  • Build an end-to-end RAG system using MongoDB alongside AI frameworks like LlamaIndex and LangChain, then contrast their syntax

What is vector search?

A vector database is a type of data storage solution that manages and searches large amounts of high-dimensional numerical data (also known as vectorised data). This data is represented as vectors, which are created using an embedding model that takes input --- such as images, audio, video, and text --- and converts them into vectors. These vectors are stored in a database and can be queried using semantic search methods for fast and accurate retrieval of similar data objects.
When multiple vector representations are mapped into a high-dimensional space, the distance between these vectors in that space reflects the similarity between them. This is because the vectors capture the context and meaning of the original data, allowing for a refined understanding of their relationships. Vector databases are designed to quickly calculate the distance between vectors, enabling efficient retrieval of information based on a query vector or semantic search. In contrast, traditional databases rely on keyword matching to retrieve information, which is a fundamentally different approach.

What is RAG?

Retrieval-augmented generation (RAG) is a approach that uses information retrieval and generative AI to deliver accurate and relevant responses to user queries by gathering semantically related data to enrich user queries with extra context, processed as input for LLMs.
RAG architecture

Getting started with LangChain framework

Step 1: Installing required libraries for LangChain framework

This section guides you through the installation process of the essential libraries needed to implement the RAG application with LangChain. Here is the list of required libraries:
  • langchain: The Python toolkit for LangChain
  • langchain-mongodb: A Python package to use MongoDB as a vector store, semantic cache, chat history store, etc., in LangChain
  • pymongo: The Python toolkit for MongoDB
  • openAI: A Python library for the OpenAI API
  • nest-asyncio: A utility library for running an embedded asyncio event loop

Step 2: Data cleaning and loading

Here we are going to use the embedding_movies sample collection from sample_mflix, and we will do some cleaning before using it. Please run the following command to remove the documents that do not contain the plot field, so we will not run into errors:
1db.embedded_movies.deleteMany({ plot: { $exists: false } } )
2
3{ acknowledged: true, deletedCount: 80 }
Now, we will create a Jupyter notebook and write down the following code, which will extract the data from the specified collection.
1import os
2from dotenv import load_dotenv
3from langchain_community.document_loaders.mongodb import MongodbLoader
4import nest_asyncio
5
6nest_asyncio.apply()
7load_dotenv()
8loader = MongodbLoader(
9 connection_string=os.environ['MONGODB_URI'],
10 db_name=os.environ['MONGODB_DB'],
11 collection_name=os.environ['MONGODB_COLL'],
12 filter_criteria={},
13 field_names=["title", "plot"]
14)
15docs = loader.load()
16
17print(len(docs))
18docs[0]
Here we see some environment variables used in the code. So, we'll make a .env file and put these variables in it.
1# MongoDB URI
2# Replace <username> and <password> with your MongoDB credentials
3MONGODB_URI='mongodb+srv://<username>:<password>@kushagracluster.abcedf.mongodb.net/'
4
5# Name of the MongoDB database to use
6MONGODB_DB='sample_mflix'
7
8# Name of the MongoDB collection to use within the specified database
9MONGODB_COLL='embedded_movies'
10
11# API key for OpenAI.
12OPENAI_API_KEY="<OPENAI_API_KEY>"
13
14# Name of the index to use for vector storage
15MONGODB_VECTOR_INDEX='vector_index'
16
17# Name of the collection in MongoDB where embedding data is stored
18MONGODB_VECTOR_COLL_LANGCHAIN='langchain_coll'
In the above code, we are using the MongoLoader class to retrieve documents from MongoDB. The dotenv library is used to load environment variables from a .env file, and the nest_asyncio library enables asyncio event loop nesting.
Additionally, we also initialize the MongoLoader class with parameters such as the MongoDB connection string, database name, collection name, etc. to get data from each document.
The load() method of the MongoLoader instance is invoked to fetch documents from MongoDB based on the specified parameters, and further, we print it.

Step 3: Create embeddings with OpenAI

Now, we will add another code cell in the Jupyter notebook and run the following code to create the embeddings with OpenAI.
1from pymongo import MongoClient
2from langchain.vectorstores import MongoDBAtlasVectorSearch
3from langchain_community.embeddings import OpenAIEmbeddings
4from langchain.llms import OpenAI
5from langchain_community.chat_models import ChatOpenAI
6from langchain.prompts import ChatPromptTemplate
7from langchain.chains import LLMChain
8
9client = MongoClient(os.environ['MONGODB_URI'], appname="devrel.content.langchain_llamaIndex.python")
10collection = client.get_database(os.environ['MONGODB_DB']).get_collection(os.environ['MONGODB_VECTOR_COLL_LANGCHAIN'])
11
12vector_search = MongoDBAtlasVectorSearch.from_documents(
13 documents=docs,
14 embedding=OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY']),
15 collection=collection,
16 index_name=os.environ['MONGODB_VECTOR_INDEX'])
This code segment performs a couple of tasks related to setting up a search system using MongoDB Vector Search, LangChain, and OpenAI embeddings.
First, we initialize a MongoDB client using client = MongoClient(os.environ['MONGODB_URI']) and use the client instance to access a specific collection within the MongoDB database. This collection will hold the embedding data along with the text.
Here is the sample document:
1{
2 "_id": {
3 "$oid": "66375d1ff77b69f7b5f0c7ec"
4 },
5 "text": "The Black Pirate Seeking revenge, an athletic young man joins the pirate band responsible for his father's death....",
6 "embedding": [
7 -0.003676240383024406,
8 -0.02721614428071944,
9 ...
10 0.02543453162153851,
11]
12],
13 "database": "sample_mflix",
14 "collection": "embedded_movies"
15}
Further, we set up vector search by initializing a vector search object using LangChain's MongoDBAtlasVectorSearch class.
Then, we pass the 'docs' variable, containing documents fetched from MongoDB earlier, to be used for setting up the vector search. This specific line of code, embedding=OpenAIEmbeddings(openai_api_key=os.environ['OPENAI_API_KEY']), uses the OpenAI API key and creates an embedding. It then passes the collection instance to the 'collection' parameter where vector embeddings will be stored.
We also specify the name of the vector index (here, vector_index) within the collection which will be used for the semantic search.
Overall, this code part handles the connections to a MongoDB instance and sets up a vector search system using LangChain, with vector data stored in MongoDB and embeddings generated by OpenAI.
Refer to OpenAI's FAQs to learn how you can get your OPENAI_API_KEY.
Now, let's move to our MongoDB Atlas UI and create a vector search index on our 'langchain_coll' collection.
To do that, visit cloud.mongodb.com and select the Atlas Search tab option on the navigation pane to create an Atlas Vector Search index. Click the Create Search Index button to create an Atlas Vector Search index.
Overview of Atlas Search capabilities
On the page to create a Vector Search index, select the Atlas Vector Search option that enables the creation of a vector search index by defining the index using JSON.
Creating a vector search index via MongoDB Atlas interface
To complete the creation of the index, select the database and collection for which the index should be created. In this case, it is the sample_mflix database and the langchain_coll collection. The JSON entered into the JSON editor should look like following:
Note: Please make sure your index name is vector_index, as you have set it in your .env file.
1{
2 "fields": [
3 {
4 "numDimensions": 1536,
5 "path": "embedding",
6 "similarity": "cosine",
7 "type": "vector"
8 }
9 ]
10}
Here, make sure you are creating a vector index with the same name you have passed in your .env. In this case, it is vector_index.

Step 4: Perform vector search on user queries

Up to this point, we have successfully done the following:
  • Loaded data from our MongoDB database
  • Provided each document with embeddings using the OpenAI embedding model
  • Configured a MongoDB database for storing vector embeddings
  • Established a connection to this database from our development environment
  • Created a vector search index for optimized querying of vector embeddings
Now, head back to VS Code and write the following code which will put the vector search to work:
1# perform a similarity search on the ingested documents
2prompt='What is the best horror movie to watch?'
3docs_with_score = vector_search.similarity_search_with_score(query=prompt,k=1)
4
5llm = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'])
6
7prompt_template = ChatPromptTemplate.from_messages([
8 ("system", "You are a movie recommendation engine which posts a concise and short summary on relevant movies."),
9 ("user", "List of movies: {input}")
10])
11
12# Create an LLMChain
13chain = LLMChain(
14 llm=llm,
15 prompt=prompt_template
16)
17
18# Prepare the input for the chat model
19input_docs = "\n".join([doc.page_content for doc, _ in docs_with_score])
20
21# Invoke the chain with the input documents
22response = chain.invoke({"input": input_docs})
23print(response['text'])
This step involves the process to set up a similarity search on ingested documents to find the best horror movie recommendations and then use an AI language model to generate a concise summary of these recommendations. So now, we will pass the prompt containing the question. Using the similarity_search_with_score method from the vector_search object, the code searches for the document that most closely matches this query, returning the top result.
Next, the language model ChatOpenAI is initialized with an API key. The prompt template is then created using ChatPromptTemplate, which sets the context for the language model as a movie recommendation engine. This template includes a system message defining the AI's role and a user message template with a placeholder for the output.
Further, an LLMChain object is then created, combining the language model and the prompt template. The content of the retrieved documents is extracted and concatenated into a single string, which serves as input for the language model. And finally, we print the output text to the console.
You can execute the whole code and see the semantic search in action, where it will return the best horror movie to watch. That's what we call magic.
Semantic Response using Langchain Framework

Getting started with LlamaIndex framework

Step 1: Installing required libraries for LlamaIndex framework

This section guides you through the installation process of the essential libraries needed to implement the RAG application with LlamaIndex. Here is the list of required libraries:
  • openAI: A python library for OpenAI API
  • llama_index: Python package to use MongoDB as a vector store, semantic cache, chat history store, etc., in llama_index
  • pymongo: Python toolkit for MongoDB

Step 2: Data cleaning and loading

Since we've already cleaned the data in the previous process, there's no need to repeat this step. However, if you're starting with this step, please refer to STEP 2 of the LangChain method above. The process for data cleansing remains the same.
Moving forward, we will use the SimpleMongoReader function offered by the Llamaindex to load the data:
1import os
2from llama_index.readers.mongodb import SimpleMongoReader
3
4reader = SimpleMongoReader(uri=os.environ["MONGODB_URI"])
5documents = reader.load_data(
6 os.environ["MONGODB_DB"],
7 os.environ["MONGODB_COLL"],
8 field_names=["title","plot"], # these is a list of the top-level fields in your objects that will be indexed
9 query_dict={} # this is a mongo query dict that will filter your data if you don't want to index everything
10)
11print(documents[0])
This part helps us connect to a MongoDB database and retrieves documents from a specified collection. It uses a SimpleMongoReader class provided by the llama_index package to handle the interaction with MongoDB. Apart from basic parameters, it additionally specifies which fields from the documents should be indexed (in this case, "title" and "plot"). We can also specify a query dictionary that can be used to filter the data being indexed. Since it's empty {}, it indicates no specific filtering is applied.

Step 3: Create embeddings with OpenAI

1import pymongo
2import openai
3from llama_index.core import VectorStoreIndex,StorageContext
4from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
5
6# Create a new client and connect to the server
7client = pymongo.MongoClient(os.environ["MONGODB_URI"], appname="devrel.content.langchain_llamaIndex.python")
8
9store = MongoDBAtlasVectorSearch(
10 client,
11 db_name=os.environ['MONGODB_DB'],
12 collection_name=os.environ['MONGODB_VECTOR_COLL_LLAMAINDEX'], # this is where your embeddings will be stored
13 index_name=os.environ['MONGODB_VECTOR_INDEX']
14)
15
16storage_context = StorageContext.from_defaults(vector_store=store)
17index = VectorStoreIndex.from_documents(
18 documents, storage_context=storage_context,
19 show_progress=True, # this will show you a progress bar as the embeddings are created
20)
In this step, we import libraries such as pymongo, openAI, and a few more from LlamaIndex to work with data stored in a MongoDB database and perform vector indexing and searching operations.
We then establish a MongoDB connection using client = pymongo.MongoClient(os.environ["MONGODB_URI"]), passing the environment variable "MONGODB_URI". Here, the os.environ function retrieves the value of the specified environment variable.
Following that, we create an Atlas vector store using an object of type MongoDBAtlasVectorSearch. It takes the database name, collection name, and index name as parameters, and stores the embeddings in the specified MongoDB collection.
Note: Don't forget to create a vector search index as you did earlier by visiting cloud.mongodb.com, selecting the Atlas Search tab option on the navigation pane, and selecting the sample_mflix database and the llamaindex_coll collection.
The JSON entered into the JSON editor should look similar to the following:
1{
2 "fields": [
3 {
4 "numDimensions": 1536,
5 "path": "embedding",
6 "similarity": "cosine",
7 "type": "vector"
8 }
9 ]
10}
Subsequently, we create a vector index using storage_context = StorageContext.from_defaults(vector_store=store), which generates a storage context object using default settings and associates it with the specified vector store. Then, it creates a vector index on the documents loaded in our database. The show_progress parameter determines whether to display a progress bar during the indexing process or not.
Overall, this step handles setting up a connection to a MongoDB database using PyMongo and creating a vector store using a custom implementation for MongoDB Atlas. Further, it generates a vector index from a set of documents and embeds them into vector representations for efficient searching and retrieval.

Step 4: Perform vector search on user queries

Until now, we have successfully loaded the data in our collections along with the generated embedding for it using OpenAI.
Now, this step combines all the activities from the previous step to provide the functionality of conducting vector search on stored documents based on embedded user queries. This step takes the prompt and passes it as a query. The .as_query_engine(similarity_top_k=1) part determines that it's configuring the index to operate as a query engine and specifying that it will return only the most similar result.
1prompt="What is the best horror movie to watch?"
2response = index.as_query_engine(similarity_top_k=1).query(prompt)
3
4print(response)
And you will see the following output, after executing the whole code snippet, where it returns the best horror movie to watch:
Semantic Response using LlamaIndex Framework
If you are looking to use a pure aggregation pipeline ($vectorsearch) for semantic search without any LLM frameworks, please refer to our tutorial. You will find step-by-step instructions on how to build a RAG system using PyMongo, OpenAI, and MongoDB.

Summing it all up

In this tutorial, we walked through the process of creating a RAG application with MongoDB using two different frameworks. I showed you how to connect your MongoDB database to LangChain and LlamaIndex separately, load the data, create embeddings, store them back to the MongoDB collection, and then execute a semantic search using MongoDB Atlas vector search capabilities. View the GitHub repo for the implementation code.
If you have any questions or feedback, reach out through the MongoDB Community forums and let us know what you build using MongoDB Atlas Vector Search.
Top Comments in Forums
There are no comments on this article yet.
Start the Conversation

Facebook Icontwitter iconlinkedin icon
Rate this quickstart
star-empty
star-empty
star-empty
star-empty
star-empty
Related
Tutorial

Optimize With MongoDB Atlas: Performance Advisor, Query Analyzer, and More


Jan 30, 2024 | 6 min read
Tutorial

Delivering a Near Real-Time Single View into Customers with a Federated Database


Jun 28, 2023 | 8 min read
Tutorial

Instant GraphQL APIs for MongoDB with Grafbase


Oct 12, 2023 | 7 min read
Tutorial

Get Hyped: Synonyms in Atlas Search


Feb 27, 2023 | 9 min read
Table of Contents