Building RAG Pipelines With Haystack and MongoDB Atlas

Pavel Duchovny • 4 min read • Published Sep 18, 2024 • Updated Sep 18, 2024
AI • Vector Search • Python • Atlas
Integrating Haystack with MongoDB Atlas allows you to build powerful retrieval-augmented generation (RAG) pipelines. This introductory article guides you through setting up a Haystack-based RAG pipeline that uses MongoDB Atlas for vector search. The code uses a grocery product dataset: for a user's cooking request, the pipeline retrieves the relevant products and passes them to the LLM, which generates a detailed guide.
All code presented in this tutorial is available in the GitHub repository.

Step 1: Install dependencies

First, install the necessary dependencies:
pip install haystack-ai mongodb-atlas-haystack tiktoken datasets

Step 2: Set up MongoDB Atlas connection and OpenAI API key

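If you have not created an Atlas cluster yet, follow our guide. You will also need an OpenAI API key, which you can create by following the guide on the OpenAI website. Haystack's OpenAI components read the key from the OPENAI_API_KEY environment variable, and the MongoDB Atlas document store reads the connection string from MONGO_CONNECTION_STRING.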
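import os
import getpass
import re

# Prompt for the connection string without echoing it to the terminal
conn_str = getpass.getpass("Enter your MongoDB connection string:")
# Set (or append) appName; '&' is excluded so that later parameters are kept
conn_str = (re.sub(r'appName=[^&\s]*', 'appName=devrel.content.python', conn_str)
            if 'appName=' in conn_str
            else conn_str + ('&' if '?' in conn_str else '?') + 'appName=devrel.content.python')
os.environ['MONGO_CONNECTION_STRING'] = conn_str
# Avoid printing the connection string itself, since it contains credentials
print("MONGO_CONNECTION_STRING is set")

# The OpenAI embedders and generator used below read OPENAI_API_KEY
if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = getpass.getpass("Enter your OpenAI API key:")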
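
The appName handling above can be factored into a small helper so that it is easy to test in isolation. The following is a minimal sketch, not part of the original tutorial: the function name with_app_name and the sample connection strings are hypothetical, and the pattern stops at '&' so that parameters following appName are preserved.

```python
import re

DEFAULT_APP_NAME = "devrel.content.python"  # value used in this tutorial


def with_app_name(conn_str: str, app_name: str = DEFAULT_APP_NAME) -> str:
    """Return conn_str with its appName query parameter set to app_name."""
    if "appName=" in conn_str:
        # Replace only the appName value; stop at '&' to keep later parameters.
        return re.sub(r"appName=[^&\s]*", lambda _: f"appName={app_name}", conn_str)
    # Append with '&' if a query string already exists, otherwise start one.
    separator = "&" if "?" in conn_str else "?"
    return f"{conn_str}{separator}appName={app_name}"


# Hypothetical connection strings, for illustration only.
print(with_app_name("mongodb+srv://user:pass@cluster.example.net/"))
print(with_app_name("mongodb+srv://user:pass@cluster.example.net/?retryWrites=true"))
print(with_app_name("mongodb+srv://user:pass@cluster.example.net/?appName=old&w=majority"))
```

Keeping the logic in a pure function means it can be checked without prompting for real credentials.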

Step 3: Create a Vector Search index on collection

Create a vector index on your database and collection in MongoDB Atlas. For more information and guidance, visit our Atlas Vector Search index docs. In this tutorial, the database is “ai_shop” and the collection is “test_collection”. Name the index vector_index and use the following definition:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
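
The index definition has to agree with the embedding model: numDimensions must equal the length of the vectors stored in the embedding field. As a rough sketch, the definition can be generated and sanity-checked in Python before pasting it into Atlas. The helper name build_vector_index_definition is hypothetical, and 1536 is assumed to be the output size of the OpenAI embedding model used later in this tutorial.

```python
import json

# Similarity functions accepted by Atlas Vector Search index definitions.
ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def build_vector_index_definition(path="embedding", num_dimensions=1536, similarity="cosine"):
    """Build a vector index definition, rejecting obviously invalid settings."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {sorted(ALLOWED_SIMILARITIES)}")
    if not isinstance(num_dimensions, int) or num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = build_vector_index_definition()
print(json.dumps(definition, indent=2))
```

If you switch to an embedding model with a different vector size, pass the new size as num_dimensions and recreate the index.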

Step 4: Set up vector store and load documents

Load documents into MongoDB Atlas using the Haystack framework:
from haystack import Pipeline, Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from bson import json_util

# Example dataset
dataset = {
    "train": [
        {"title": "Spinach Lasagna Sheets", "price": "$3.50", "description": "Infused with spinach, these sheets add a pop of color and extra nutrients.", "category": "Pasta", "emoji": "📗"},
        {"title": "Gluten-Free Lasagna Sheets", "price": "$4.00", "description": "Perfect for those with gluten intolerance, made with a blend of rice and corn flour.", "category": "Pasta", "emoji": "🍚🌽"},
        # Add more documents here...
    ]
}

# Convert each product into a Haystack Document: the title is embedded,
# and the full product record is kept as metadata
insert_data = []
for product in dataset['train']:
    doc_product = json_util.loads(json_util.dumps(product))
    haystack_doc = Document(content=doc_product['title'], meta=doc_product)
    insert_data.append(haystack_doc)

# The store reads the connection string from MONGO_CONNECTION_STRING
document_store = MongoDBAtlasDocumentStore(
    database_name="ai_shop",
    collection_name="test_collection",
    vector_search_index="vector_index",
)

# Indexing pipeline: embed the documents, then write them to Atlas,
# skipping any document that is already stored
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_embedder = OpenAIDocumentEmbedder()
indexing_pipe = Pipeline()
indexing_pipe.add_component(instance=doc_embedder, name="doc_embedder")
indexing_pipe.add_component(instance=doc_writer, name="doc_writer")
indexing_pipe.connect("doc_embedder.documents", "doc_writer.documents")
indexing_pipe.run({"doc_embedder": {"documents": insert_data}})
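
DuplicatePolicy.SKIP means that a document whose id already exists in the store is left untouched, which makes the indexing step safe to re-run. The toy sketch below imitates that behavior with a plain dictionary. It is not Haystack's implementation: the TinyStore class is hypothetical, and the id is derived here from the content alone, whereas Haystack hashes more of the document's fields.

```python
import hashlib


class TinyStore:
    """In-memory stand-in for a document store with a skip-on-duplicate policy."""

    def __init__(self):
        self.docs = {}

    def write(self, contents):
        """Store each new content string; return how many were actually written."""
        written = 0
        for content in contents:
            doc_id = hashlib.sha256(content.encode("utf-8")).hexdigest()
            if doc_id in self.docs:
                continue  # SKIP: keep the existing document as it is
            self.docs[doc_id] = content
            written += 1
        return written


store = TinyStore()
titles = ["Spinach Lasagna Sheets", "Gluten-Free Lasagna Sheets"]
first_run = store.write(titles)
second_run = store.write(titles)  # same ids, so nothing new is written
print(first_run, second_run, len(store.docs))
```

The second call writes nothing because both ids are already present, which mirrors what happens when the indexing pipeline is run twice on the same products.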

Step 5: Build a RAG pipeline

Create a pipeline that will retrieve, augment, and generate a response to user questions:
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.embedders import OpenAITextEmbedder
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

# Prompt template: the retrieved documents and the user query are injected here
prompt_template = """
    You are a chef assistant allowed to use the following context documents and only those.\nDocuments:
    {% for doc in documents %}
        {{ doc.content }}
    {% endfor %}
    \nQuery: {{query}}
    \nAnswer:
"""

# Init a pipeline
rag_pipeline = Pipeline()

# Add the query embedder and the retriever, and connect them
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
rag_pipeline.add_component(instance=MongoDBAtlasEmbeddingRetriever(document_store=document_store, top_k=50), name="retriever")
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# Add the prompt builder and the LLM, then connect retriever -> prompt -> LLM
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
rag_pipeline.add_component(instance=OpenAIGenerator(model="gpt-4o"), name="llm")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")
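
To make the data flow concrete, here is a toy version of the retrieve and augment stages that uses only the standard library. Everything in it is an illustrative assumption: the three-dimensional vectors stand in for real 1536-dimensional embeddings, the "Dish Soap" entry is made up to act as an irrelevant product, and cosine, retrieve, and build_prompt are hypothetical helpers rather than Haystack or Atlas APIs. In the real pipeline, Atlas Vector Search performs the ranking and PromptBuilder renders the template.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, docs, top_k=2):
    """Return the top_k documents ranked by similarity to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d["embedding"]), reverse=True)
    return ranked[:top_k]


def build_prompt(query, docs):
    """Assemble a prompt from the retrieved documents and the user query."""
    lines = [
        "You are a chef assistant allowed to use the following context documents and only those.",
        "Documents:",
    ]
    lines += [d["content"] for d in docs]
    lines += [f"Query: {query}", "Answer:"]
    return "\n".join(lines)


# Made-up embeddings, for illustration only.
docs = [
    {"content": "Spinach Lasagna Sheets", "embedding": [0.9, 0.1, 0.0]},
    {"content": "Gluten-Free Lasagna Sheets", "embedding": [0.8, 0.2, 0.1]},
    {"content": "Dish Soap", "embedding": [0.0, 0.1, 0.9]},
]
query_embedding = [1.0, 0.0, 0.0]  # pretend embedding of the user query

top_docs = retrieve(query_embedding, docs, top_k=2)
prompt = build_prompt("How can I cook a lasagne?", top_docs)
print(prompt)
```

Only the two lasagna products reach the prompt; the unrelated item is ranked last and dropped by top_k. The generate stage would then send this prompt to the LLM.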

Step 6: Test the pipeline

Test the pipeline with a sample query:
query = "How can I cook a lasagne?"
result = rag_pipeline.run(
    {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }
)
print(result['llm']['replies'][0])
Example output (the generated text varies between runs):
To cook a lasagne, you can follow this classic recipe:
### Ingredients:
#### For the meat sauce:
- 2 tablespoons olive oil
- 1 onion, finely chopped
- 2 cloves garlic, minced
- 500g ground beef
- 800g canned tomatoes, crushed
- 2 tablespoons tomato paste
- 1 teaspoon dried basil
- 1 teaspoon dried oregano
- Salt and pepper to taste
#### For the béchamel sauce:
- 4 tablespoons butter
- 4 tablespoons all-purpose flour
- 500ml milk
- A pinch of nutmeg
- Salt and pepper to taste
#### For assembly:
- 250g lasagne sheets
- 200g mozzarella cheese, shredded
- 1 cup grated Parmesan cheese
- Fresh basil leaves for garnish (optional)
### Instructions:
1. **Preheat the oven** to 375°F (190°C).
2. **Prepare the meat sauce:**
   - Heat the olive oil in a large skillet over medium heat.
   - Add the chopped onion and cook until soft and translucent, about 5 minutes.
   - Stir in the minced garlic and cook for another minute.
   - Add the ground beef and cook until browned, breaking it up with a spoon as it cooks.
   - Stir in the crushed tomatoes, tomato paste, dried basil, and dried oregano.
   - Season with salt and pepper, then reduce the heat to low.
   - Let the sauce simmer for 30 minutes, stirring occasionally.
3. **Prepare the béchamel sauce:**
   - In a medium saucepan, melt the butter over medium heat.
   - Add the flour and whisk continuously for about 2 minutes to create a roux.
   - Gradually add the milk while whisking to prevent lumps from forming.
   - Cook the mixture, whisking constantly, until it thickens, about 5-7 minutes.
   - Season with a pinch of nutmeg, salt, and pepper.
4. **Assemble the lasagne:**
   - Spread a thin layer of the meat sauce on the bottom of a 9x13 inch baking dish.
   - Place a layer of lasagne sheets over the sauce.
   - Spread another layer of meat sauce over the lasagne sheets, followed by a layer of béchamel sauce.
   - Sprinkle some shredded mozzarella cheese over the béchamel sauce.
   - Repeat the layers until all the ingredients are used, finishing with a layer of béchamel sauce and a generous topping of mozzarella and Parmesan cheese.
5. **Bake the lasagne:**
   - Cover the baking dish with aluminum foil.
   - Bake in the preheated oven for 30 minutes.
   - Remove the foil and bake for an additional 15 minutes, or until the top is golden brown and bubbling.
6. **Rest and serve:**
   - Remove the lasagne from the oven and let it rest for 10-15 minutes before slicing.
   - Garnish with fresh basil leaves if desired, and serve.
Enjoy your delicious homemade lasagne!
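
The pipeline returns a nested dictionary keyed by component name, so the generated text sits under result['llm']['replies']. A small guard avoids a KeyError or IndexError if the generator returns nothing. This is only a sketch: first_reply is a hypothetical helper, and the sample dictionary is hand-written to mirror the shape used in the test code above, not real pipeline output.

```python
def first_reply(result, component="llm", default=""):
    """Return the first reply of the given component, or default if there is none."""
    replies = result.get(component, {}).get("replies") or []
    return replies[0] if replies else default


# Hand-written stand-in for a pipeline result, for illustration only.
sample_result = {"llm": {"replies": ["placeholder reply"], "meta": [{}]}}

print(first_reply(sample_result))
print(first_reply({"llm": {"replies": []}}, default="(no reply generated)"))
```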

Conclusion

In this article, you learned how to integrate Haystack with MongoDB Atlas to build a RAG pipeline. This powerful combination allows you to leverage vector search and retrieval-augmented generation to create sophisticated and responsive applications.
To explore more topics on RAG, have a look at the following tutorials:
If you have questions or want to connect with other developers, join us in the MongoDB Developer Community. Thanks for reading.
