Building Continuously Updating RAG Applications

Use MongoDB Atlas Stream Processing and Vector Search to continuously update, store, and search embeddings through a unified interface.

Use cases: Gen AI

Industries: Finance, Healthcare, Retail

Products: MongoDB Atlas, MongoDB Atlas Vector Search, MongoDB Atlas Stream Processing

Partners: Confluent, AWS

Solution Overview

Providing models with up-to-date data is essential so that they can use relevant context when offering recommendations beyond a one-size-fits-all AI approach. Retrieval-augmented generation (RAG) systems enable organizations to ground large language models (LLMs) and other foundational models in the truth of their proprietary data. However, maintaining the underlying data is complex. To ensure that models provide accurate answers, it is essential to continuously update the vector embeddings that form the core of RAG systems to represent the latest information available.

Furthermore, the choice of embedding model impacts the quality of AI outputs because different models are optimized for different purposes and data types. For example, an embedding model trained on a particular language will create more contextually appropriate embeddings for that language than a general-purpose model trained across many languages.

By leveraging MongoDB Atlas' native Stream Processing and Vector Search capabilities, this solution allows developers to continuously update, store, and search embeddings within a single interface.

This solution is relevant to many industries and use cases, including:

Financial services: Financial documents, legal policies, and contracts often use multiple languages and differ based on country regulations. Empowering loan officers with an AI-powered interface using relevant and fresh data for expediting loan creation can optimize banking workflows.
Healthcare and Insurance: RAG systems that help update patient records or underwrite insurance policies need access to up-to-date information.
Retail: Up-to-date contextual data is crucial for RAG systems to select the right embedding model, enabling personalized experiences for customers regardless of the language they use.

Reference Architectures

This solution uses the following components:

MongoDB Atlas Cluster: Enables the flexible storage of various data types including text, associated metadata, and corresponding vector embeddings in documents. Vector index in Atlas directly supports efficient semantic search queries within the database, which you can use with the MongoDB Aggregation Framework.
Confluent Kafka Cluster: Receives document updates and new documents from producers and makes them available for further processing by Atlas Stream Processing.
Atlas Stream Processing: Subscribes to the event streams generated by MongoDB, filters relevant information, transforms events, and emits them to the corresponding Kafka topic. It also subscribes to the Kafka cluster to process updates and propagate changes back to the database.
Metadata Service:
- Embedding Generator: Python script that subscribes to the Kafka input topics. For each message received, it generates an embedding using a specialized machine learning model.
- Tags Extractor: Python script that analyzes incoming data to identify relevant structured metadata to enrich the document for indexing, search, or analysis.

click to enlarge

Figure 1. Scalable vector updates reference architecture with MongoDB

Data Model Approach

In the demo solution, the data model is a collection of documents that encapsulate all relevant information about a song. MongoDB's document data model stores diverse data types alongside their embeddings, allowing for easy and fast data retrieval.

The sample data has two datasets available for import: archive_lyrics_small1 and archive_lyrics_small2. The documents in these datasets have the following structure:

{
   "title": "Hurricane",
   "artist": "Bob Dylan",
   "year": 1976,
   "lyrics": "...",
   "language": "en",
   "genre": "rock",
   "duration": 61,
   "lyrics_embeddings_en": [...],
   "tags": ["man", "story", "night"]   // only in archive_lyrics_small1
}

In this solution, Atlas Stream Processing uses the following data fields for the output topic:

lyrics_embeddings_en/lyrics_embeddings_es: Language-specific lyrics embedding vector
tags: Only in the archive_lyrics_small1 dataset, lists frequently-occurring words in the lyrics

Build the Solution

The GitHub repository contains detailed instructions for replicating this solution, allowing you to update your embeddings asynchronously and at scale with MongoDB Atlas.

The README guides you through the following steps:

Set up the Environment

Clone the repository, set up a virtual environment, and install necessary dependencies.

Load the Dataset

Important

If you don't already have an Atlas account, join now and create a cluster.

Use the provided script to load the data with mongoimport.

Configure a Kafka Cluster in Confluent

Follow the instructions in the Confluent documentation to create a Kafka Cluster.

Copy your bootstrap URL from the Cluster Settings tab on Confluent and use the Kafka REST API to create an API key to connect to your cluster.

Create the topics SpanishInputTopic, EnglishInputTopic, and OutputTopic in the Topics tab on Confluent.

Configure the Stream Processing Connection Registry

Use the Confluent bootstrap URL in the connection registry to configure a new connection between the Atlas Stream Processing Instance and the Kafka Cluster.

Connect the Atlas Stream Processing Instance to the Atlas cluster.

Configure Atlas Stream Processing

Copy your connection string for connecting to the Stream Processing Instance.

Use the MongoDB Shell (mongosh) to configure the pipelines and connections in the Stream Processing Instance.

Launch the Processor Scripts

Execute the metadata service to subscribe to the input topics, create the tags and embeddings for the corresponding language according to the information received in the event, and write the event to the output topic.

Create an Atlas Vector Search Index

Create and configure an Atlas Vector Search index for lyrics_embeddings_es. You must structure the search index as follows:

{
   "fields": [
      {
         "type": "vector",
         "path": "lyrics_embeddings_es",
         "numDimensions": 768,
         "similarity": "cosine"
      }
   ]
}

Create and configure an Atlas Vector Search index for lyrics_embeddings_en. You must structure the search index as follows:

{
   "fields": [
      {
         "type": "vector",
         "path": "lyrics_embeddings_en",
         "numDimensions": 384,
         "similarity": "cosine"
      }
   ]
}

Search and Analyze Large-Scale Documents using Vector Search

Use the provided query_client.py script to run semantic queries using Atlas Vector Search in a chat interface.

Key Learnings

Maintain embedding relevance: Regularly update data embeddings to ensure your semantic searches remain accurate.
Optimize language-model pairing: Ensure that your LLM closely aligns with the language of your data to enhance the relevance and precision of your search results.
Embrace flexible embeddings: MongoDB's flexible data model allows you to store embeddings directly alongside your data, regardless of their length or the model used to generate them.
Choose the right similarity function: The effectiveness of your semantic searches depends on the chosen similarity function. Tailor your selection to your specific use case.
Generate asynchronous embeddings: Create embeddings asynchronously to maintain your application performance and scale generation functions horizontally.

Authors

David Sanchez, MongoDB

Learn More

Back

Optimize RAG Applications with Fireworks AI

Payments Modernization Accelerator