
Taking RAG to Production with the MongoDB Documentation AI Chatbot

Ben Perlmutter • 11 min read • Published Aug 29, 2024 • Updated Aug 29, 2024
React • Node.js • Atlas
At MongoDB, we have a tagline: "Love your developers." One way that we show love to our developers is by providing them with excellent technical documentation for our products. Given the rise of generative AI technologies like ChatGPT, we wanted to use generative AI to help developers learn about our products using natural language. This led us to create an AI chatbot that lets users talk directly to our documentation. With the documentation AI chatbot, users can ask questions and then get answers and related content more efficiently and intuitively than previously possible.
You can try out the chatbot at mongodb.com/docs.
This post provides a technical overview of how we built the documentation AI chatbot. It covers:
  • The chatbot’s retrieval augmented generation (RAG) architecture.
  • The challenges in building a RAG chatbot for the MongoDB documentation.
  • How we built the chatbot to overcome these challenges.
  • How we used MongoDB Atlas in the application.
  • Next steps for building your own production RAG application using MongoDB Atlas.

The chatbot's RAG architecture

We built our chatbot using the retrieval augmented generation (RAG) architecture. RAG augments the knowledge of large language models (LLMs) by retrieving relevant information for users' queries and using that information in the LLM-generated response. We used MongoDB's public documentation as the information source for our chatbot's generated answers.
To retrieve relevant information based on user queries, we used MongoDB Atlas Vector Search. We used the Azure OpenAI ChatGPT API to generate answers in response to user questions based on the information returned from Atlas Vector Search. We used the Azure OpenAI embeddings API to convert MongoDB documentation and user queries into vector embeddings, which help us find the most relevant content for queries using Atlas Vector Search.
Here's a high-level diagram of the chatbot's RAG architecture:
For a more detailed explanation of RAG, visit our overview of using MongoDB for RAG.

Building a "naive RAG" MVP

Over the past few months, a lot of tools and reference architectures have come out for building RAG applications. We decided it would make the most sense to start simple, and then iterate on our design once we had a functional minimum viable product (MVP).
Our first iteration was what Jerry Liu, creator of RAG framework LlamaIndex, calls "naive RAG". This is the simplest form of RAG. Our naive RAG implementation had the following flow:
  • Data ingestion: Ingesting source data into MongoDB Atlas, breaking documents into smaller chunks, storing each chunk with its vector embedding, and indexing the vector embeddings using MongoDB Atlas Vector Search.
  • Chat: Generating an answer by creating an embedding for the user's question, finding matching chunks with MongoDB Atlas Vector Search, and then summarizing an answer using these chunks.
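The chat step of this flow can be sketched roughly as follows. This is a minimal illustration, not our production code: the `RagDeps` interface, its three function names, and the prompt wording are all assumptions made for the example. In a real application, they would wrap an embeddings API, MongoDB Atlas Vector Search, and a chat completion API.

```typescript
// Hypothetical dependency signatures (assumptions for this sketch). Real
// implementations would call an embeddings API, Atlas Vector Search, and an LLM.
interface RagDeps {
  embed: (text: string) => Promise<number[]>;
  findChunks: (queryVector: number[], limit: number) => Promise<string[]>;
  generate: (prompt: string) => Promise<string>;
}

// Naive RAG chat step: embed the question, retrieve matching chunks,
// then ask the LLM to answer using those chunks as context.
async function naiveRagAnswer(
  question: string,
  deps: RagDeps,
  limit = 3
): Promise<string> {
  const queryVector = await deps.embed(question);
  const chunks = await deps.findChunks(queryVector, limit);
  const prompt = [
    "Answer the question using only the context below.",
    "Context:",
    ...chunks,
    `Question: ${question}`,
  ].join("\n\n");
  return deps.generate(prompt);
}
```

Note that nothing in this flow looks at earlier messages or rewrites the user's question, which is what makes it "naive".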
We got a reasonably functional naive RAG prototype up and running with a small team in under two months. To assess the quality of generated responses and links, we had MongoDB employees volunteer to test out the chatbot in a red teaming exercise.
To learn more about the approach we took to red teaming, refer to the documentation from Microsoft.

Challenges in building a RAG application for MongoDB documentation

The red teaming exercise revealed that the naive RAG chatbot provided satisfactory answers roughly 60% of the time.
For the 40% of answers that were unsatisfactory, we noticed a few common themes:
  • The chatbot was not aware of previous messages in the conversation.
For example, the conversation might go like:
  • The chatbot sometimes gave niche or overly specific solutions when a more general answer would have been useful. MongoDB has many products with overlapping functionality (database drivers in multiple languages, MongoDB on Atlas and self-hosted, etc.) and without a clear priority, it could seemingly choose one at random.
For example, the conversation might go like:
  • The chatbot’s further reading links were not consistently relevant.
For example, the conversation might go like:
To get the chatbot to a place where we felt comfortable putting it out into the world, we needed to address these limitations.

Refactoring the chatbot to be production ready

This section covers how we rebuilt the documentation AI chatbot to address the limitations of naive RAG described above, resulting in a not-so-naive chatbot that responds better to user questions.
Using the approach described in this section, we got the chatbot to over 80% satisfactory responses in a subsequent red teaming exercise.

Data ingestion

We set up a CLI for data ingestion, pulling content from MongoDB's documentation and the Developer Center. A nightly cron job ensures the chatbot's information remains current.
Our ingestion pipeline involves two primary stages:

1. Pull raw content

We created a pages CLI command that pulls raw content from data sources into Markdown for the chatbot to use. This stage handles varied content formats, including abstract syntax trees, HTML, and Markdown. We stored this raw data in a pages collection in MongoDB.
Example pages command:

2. Chunk and Embed Content

An embed CLI command takes the data from the pages collection, transforms it into a form that the chatbot can use, and generates vector embeddings for the content. We stored the transformed content in the embedded_content collection, indexed using MongoDB Atlas Vector Search.
Example embed command:
To transform our pages documents into embedded_content documents, we used the following strategy:
  • Break each page into one or more chunks using the LangChain RecursiveCharacterTextSplitter. We used the RecursiveCharacterTextSplitter to split the text into logical chunks, such as by keeping page sections (as denoted by headers) and code examples together.
  • Allow max chunk size of 650 tokens. This led to an average chunk size of 450 tokens, which aligns with emerging best practices.
  • Remove all chunks that are less than 15 tokens in length. These would sometimes show up in vector search results because they'd closely match the user query even though they provided little value for informing the answer generated by the ChatGPT API.
  • Add metadata to the beginning of each chunk before creating the embedding. This gives the chunk greater semantic meaning to create the embedding with. See the following section for more information about how adding metadata greatly improved the quality of our vector search results.
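To make the size limits concrete, here is a simplified sketch of the chunking rules. It does not use the LangChain RecursiveCharacterTextSplitter. Instead, it swaps in a much simpler header-and-paragraph splitter, and it approximates tokens by counting whitespace-separated words. Both substitutions, along with the `chunkPage` and `countTokens` names, are assumptions made for illustration.

```typescript
// Rough token estimate: whitespace-separated words. This is an assumption for
// illustration; a real pipeline would use the embedding model's tokenizer.
const countTokens = (text: string): number =>
  text.split(/\s+/).filter(Boolean).length;

// Split Markdown into chunks: start a new chunk at each header, split a section
// at paragraph breaks when it would exceed maxTokens, and drop tiny chunks.
// A single paragraph longer than maxTokens is kept whole in this sketch.
function chunkPage(
  markdown: string,
  maxTokens = 650,
  minTokens = 15
): string[] {
  const sections = markdown.split(/\n(?=#{1,6} )/);
  const chunks: string[] = [];
  for (const section of sections) {
    let current = "";
    for (const paragraph of section.split(/\n{2,}/)) {
      const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
      if (current && countTokens(candidate) > maxTokens) {
        chunks.push(current);
        current = paragraph;
      } else {
        current = candidate;
      }
    }
    if (current) chunks.push(current);
  }
  // Remove chunks too short to add value to a generated answer.
  return chunks
    .map((chunk) => chunk.trim())
    .filter((chunk) => countTokens(chunk) >= minTokens);
}
```

Filtering after splitting means a short section, such as a header followed by a single sentence, never reaches the embedding step.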
Add chunk metadata
The most important improvement that we made to the chunking and embedding was to prepend chunks with metadata. For example, say you have this chunk of text about using MongoDB Atlas Vector Search:
This chunk itself has relevant information about performing a semantic search on Atlas data, but it lacks context data that makes it more likely to be found in the search results.
Before creating the vector embedding for the content, we add metadata to the top of the chunk to change it to:
Adding this metadata to the chunk greatly improved the quality of our search results, especially when combined with adding metadata to the user's query on the server before using it in vector search, as discussed in the “Chat Server” section.
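As a rough sketch of the idea, the function below prepends metadata to a chunk as a front-matter-style header before the text is embedded. The metadata field names (`pageTitle`, `sectionTitle`, `tags`), the header format, and the `withMetadata` name are hypothetical choices for this example, not necessarily the ones our ingestion pipeline uses.

```typescript
// Hypothetical metadata fields (assumptions for this sketch).
interface ChunkMetadata {
  pageTitle?: string;
  sectionTitle?: string;
  tags?: string[];
}

// Prepend metadata as a front-matter-style header so that it is embedded
// together with the chunk text. Chunks without metadata are returned unchanged.
function withMetadata(chunkText: string, metadata: ChunkMetadata): string {
  const lines: string[] = [];
  if (metadata.pageTitle) lines.push(`pageTitle: ${metadata.pageTitle}`);
  if (metadata.sectionTitle) lines.push(`sectionTitle: ${metadata.sectionTitle}`);
  if (metadata.tags?.length) lines.push(`tags: ${metadata.tags.join(", ")}`);
  if (lines.length === 0) return chunkText;
  return ["---", ...lines, "---", "", chunkText].join("\n");
}
```

The string returned by `withMetadata`, rather than the bare chunk, is what would be passed to the embeddings API.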

Example document from embedded_content collection

Here’s an example document from the embedded_content collection. The embedding field is indexed with MongoDB Atlas Vector Search.
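For orientation, a document in this collection might be shaped roughly like the sketch below. Only the `embedding` field and the collection name come from this article. Every other field name and every value is a hypothetical placeholder, and the vector is shortened to three numbers for readability.

```typescript
// Hypothetical document shape (an assumption for illustration).
interface EmbeddedContent {
  url: string; // page that the chunk came from
  text: string; // chunk text, including any prepended metadata
  tokenCount: number; // size of the chunk in tokens
  embedding: number[]; // vector indexed with Atlas Vector Search
  updated: Date; // when the chunk was last ingested
}

// Placeholder values only; this is not real data from the chatbot.
const exampleDocument: EmbeddedContent = {
  url: "https://example.com/docs/some-page",
  text: "---\npageTitle: Some Page\n---\n\nChunk text goes here.",
  tokenCount: 12,
  embedding: [0.012, -0.034, 0.056], // real vectors have many more dimensions
  updated: new Date("2024-01-01T00:00:00Z"),
};
```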

Data ingestion flow diagram

Ingest data flow diagram

Chat server

We built an Express.js server to coordinate RAG between the user, MongoDB documentation, and ChatGPT API. We used MongoDB Atlas Vector Search to perform a vector search on the ingested content in the embedded_content collection. We persist conversation information, including user and chatbot messages, to a conversations collection in the same MongoDB database.
The Express.js server is a fairly straightforward RESTful API with three routes:
  • POST /conversations: Create a new conversation.
  • POST /conversations/:conversationId/messages: Add a user message to a conversation and get back a RAG response to the user message. This route has the optional parameter stream to stream back a response or send it as a JSON object.
  • POST /conversations/:conversationId/messages/:messageId/rating: Rate a message.
Most of the complexity of the server was in the POST /conversations/:conversationId/messages route, as this handles the whole RAG flow.
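The behavior behind the three routes can be modeled without any web framework, as in the sketch below. It keeps conversations in memory instead of in MongoDB, and the `ConversationService` class, its method names, and the synchronous `respond` callback that stands in for the RAG flow are all assumptions made for illustration.

```typescript
type Role = "user" | "assistant";
interface Message { id: string; role: Role; content: string; rating?: boolean }
interface Conversation { id: string; messages: Message[] }

// In-memory stand-in for the chat server (hypothetical; the real server
// persists to a MongoDB collection and generates replies asynchronously).
class ConversationService {
  private conversations = new Map<string, Conversation>();
  private counter = 0;
  private respond: (conversation: Conversation) => string;

  constructor(respond: (conversation: Conversation) => string) {
    this.respond = respond;
  }

  private newId(prefix: string): string {
    this.counter += 1;
    return `${prefix}-${this.counter}`;
  }

  private get(conversationId: string): Conversation {
    const conversation = this.conversations.get(conversationId);
    if (!conversation) throw new Error(`Conversation ${conversationId} not found`);
    return conversation;
  }

  // POST /conversations
  createConversation(): Conversation {
    const conversation: Conversation = { id: this.newId("conv"), messages: [] };
    this.conversations.set(conversation.id, conversation);
    return conversation;
  }

  // POST /conversations/:conversationId/messages
  addMessage(conversationId: string, content: string): Message {
    const conversation = this.get(conversationId);
    conversation.messages.push({ id: this.newId("msg"), role: "user", content });
    // The callback sees the whole conversation, including the new user message.
    const reply: Message = {
      id: this.newId("msg"),
      role: "assistant",
      content: this.respond(conversation),
    };
    conversation.messages.push(reply);
    return reply;
  }

  // POST /conversations/:conversationId/messages/:messageId/rating
  rateMessage(conversationId: string, messageId: string, rating: boolean): void {
    const message = this.get(conversationId).messages.find((m) => m.id === messageId);
    if (!message) throw new Error(`Message ${messageId} not found`);
    message.rating = rating;
  }
}
```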
We were able to make dramatic improvements over our initial naive RAG implementation by adding what we call a query preprocessor.

The query preprocessor

A query preprocessor mutates the original user query to something that is more conversationally relevant and gets better vector search results.
For example, say the user inputs the following query to the chatbot:
On its own, this query has little inherent semantic meaning and doesn't present a clear question for the ChatGPT API to answer.
However, using a query preprocessor, we transform this query into:
The application server then sends this transformed query to MongoDB Atlas Vector Search. It yields much better search results than the original query. The search query has more semantic meaning itself and also aligns with the metadata that we prepend during content ingestion to create a higher degree of semantic similarity for vector search.
Adding the programmingLanguage and mongoDbProducts information to the query focuses the vector search to create a response grounded in a specific subset of the total surface area of the MongoDB product suite. For example, here we would not want the chatbot to return results for using the PHP driver to perform $filter aggregations, but vector search would be more likely to return that if we didn't specify that we're looking for examples that use the shell.
Also, telling the ChatGPT API to answer the question "What is the syntax for filtering data in MongoDB?" provides a clearer answer than telling it to answer the original "$filter".
To create a preprocessor that transforms the query like this, we used the library TypeChat. TypeChat takes a string input and transforms it into a JSON object using the ChatGPT API. TypeChat uses TypeScript types to describe the shape of the output data.
The TypeScript type that we use in our application is as follows:
In our app, TypeChat uses the MongoDbUserQueryPreprocessorResponse schema and description to create an object structured on this schema.
Then, using a simple JavaScript function, we transform the MongoDbUserQueryPreprocessorResponse object into the query text that we embed and then send to MongoDB Atlas Vector Search.
The schema also has a rejectQuery field that flags inappropriate queries. When rejectQuery is true, the server returns a static response to the user, asking them to try a different query.
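The sketch below shows one way the preprocessor output could be turned into a search query. The names `MongoDbUserQueryPreprocessorResponse`, `programmingLanguage`, `mongoDbProducts`, and `rejectQuery` come from this article. The `query` field, the field types, the front-matter output format, the rejection message, and the `toSearchQuery` function are assumptions made for illustration. The step where TypeChat produces the object is omitted.

```typescript
// Field types and the `query` field are assumptions for this sketch.
interface MongoDbUserQueryPreprocessorResponse {
  query?: string; // rewritten, more conversationally relevant question
  programmingLanguage?: string;
  mongoDbProducts?: string[];
  rejectQuery?: boolean;
}

// Hypothetical static response for rejected queries.
const REJECTION_MESSAGE =
  "Sorry, I can't help with that. Please try a different query.";

type PreprocessResult =
  | { rejected: true; response: string }
  | { rejected: false; searchQuery: string };

// Build the text to embed for vector search, or short-circuit with a static
// response when the preprocessor flagged the query.
function toSearchQuery(
  preprocessed: MongoDbUserQueryPreprocessorResponse,
  originalQuery: string
): PreprocessResult {
  if (preprocessed.rejectQuery) {
    return { rejected: true, response: REJECTION_MESSAGE };
  }
  const header: string[] = [];
  if (preprocessed.programmingLanguage) {
    header.push(`programmingLanguage: ${preprocessed.programmingLanguage}`);
  }
  if (preprocessed.mongoDbProducts?.length) {
    header.push(`mongoDbProducts: ${preprocessed.mongoDbProducts.join(", ")}`);
  }
  // Fall back to the user's own wording if no rewritten query was produced.
  const question = preprocessed.query ?? originalQuery;
  const searchQuery = header.length
    ? ["---", ...header, "---", "", question].join("\n")
    : question;
  return { rejected: false, searchQuery };
}
```

Putting the metadata in the same header style that is prepended to chunks at ingestion time is what lets the query and the content line up in vector space.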

Chat server flow diagram

Chat data flow diagram

React component UI

Our front end is a React component built with the LeafyGreen Design System. The component manages the interaction with the chat server's RESTful API.
Currently, the component is only on the MongoDB docs homepage, but we built it in a way that it could be extended to be used on other MongoDB properties.
You can actually download the UI from npm with the mongodb-chatbot-ui package.
Here you can see what the chatbot looks like in action:
Chat UI

MongoDB for RAG applications

Building the chatbot on MongoDB Atlas was a great accelerant for our developer productivity and helped us simplify our infrastructure.
Setting up MongoDB Atlas Vector Search on our cluster took just a few clicks in the UI and adding the following Atlas Vector Search index to the embedding field of the embedded_content collection:
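An Atlas Vector Search index definition for this setup might look like the following, written here as a TypeScript object. The index name `vector_index`, the 1536 dimensions (which must match whichever embedding model is used), and the cosine similarity function are assumptions for this sketch. Only the `embedding` path comes from this article.

```typescript
// Hypothetical index definition. The `definition` object is the part that
// corresponds to the JSON entered in the Atlas UI.
const vectorSearchIndex = {
  name: "vector_index", // assumed index name
  type: "vectorSearch",
  definition: {
    fields: [
      {
        type: "vector",
        path: "embedding", // field that stores each chunk's embedding
        numDimensions: 1536, // assumption: must equal the embedding model's output size
        similarity: "cosine", // assumption: "euclidean" and "dotProduct" are the alternatives
      },
    ],
  },
};
```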
Running a query against the MongoDB Atlas Vector Search index is a simple aggregation operation that uses the $vectorSearch stage with the Node.js driver:
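A query along these lines can be sketched as a function that builds the aggregation pipeline. The index name `vector_index`, the projected `text` and `url` fields, the `numCandidates` multiplier, and the `buildVectorSearchPipeline` name are assumptions for this example. The driver call itself is shown only in a comment so that the snippet stays self-contained.

```typescript
// Build a $vectorSearch aggregation pipeline for a query embedding.
function buildVectorSearchPipeline(
  queryVector: number[],
  limit = 5
): Record<string, unknown>[] {
  return [
    {
      $vectorSearch: {
        index: "vector_index", // assumed index name
        path: "embedding", // field indexed with Atlas Vector Search
        queryVector,
        numCandidates: limit * 20, // assumption: oversample candidates for better recall
        limit,
      },
    },
    {
      // Hypothetical fields to return, plus the similarity score.
      $project: {
        _id: 0,
        text: 1,
        url: 1,
        score: { $meta: "vectorSearchScore" },
      },
    },
  ];
}

// With the Node.js driver, the pipeline would be run along the lines of:
//   const results = await collection
//     .aggregate(buildVectorSearchPipeline(queryVector))
//     .toArray();
```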
Using MongoDB to store the conversations data simplified the development experience, as we did not have to think about using a data store for the embeddings that is separate from the rest of the application data.
Using MongoDB Atlas for vector search and as our application data store streamlined our application development process so that we were able to focus on the core RAG application logic, and not have to think very much about managing additional infrastructure or learning new domain-specific query languages.

What we learned building a production RAG application

The MongoDB documentation AI chatbot has now been live for over a month and works pretty well (try it out!). It's still under active development, and we're going to roll it out to other locations in the MongoDB product suite over the coming months.
Here are our key learnings from taking the chatbot to production:
  • Naive RAG is not enough. However, starting with a naive RAG prototype is a great way for you to figure out how you need to extend RAG to meet the needs of your use case.
  • Red teaming is incredibly useful for identifying issues. Red team early in the RAG application development process, and red team often.
  • Add metadata to the content before creating embeddings to improve search quality.
  • Preprocess user queries with an LLM (like the ChatGPT API and TypeChat) before sending them to vector search and having the LLM respond to the user. The preprocessor should:
    • Make the query more conversationally and semantically relevant.
    • Include metadata to use in vector search.
    • Catch any scenarios, like inappropriate queries, that you want to handle outside the normal RAG flow.
  • MongoDB Atlas is a great database for building production RAG apps.

Build your own production-ready RAG application with MongoDB

Want to build your own RAG application? We've made our source code publicly available as a reference architecture. Check it out on GitHub.
We're also working on releasing an open-source framework to simplify the creation of RAG applications using MongoDB. Stay tuned for more updates on this RAG framework.
Questions? Comments? Join us in the MongoDB Developer Community forum.
