Retrieval-Augmented Generation (RAG) with Atlas Vector Search

Retrieval-augmented generation (RAG) is an architecture used to augment large language models (LLMs) with additional data so that they can generate more accurate responses. You can implement RAG in your generative AI applications by combining an LLM with a retrieval system powered by Atlas Vector Search.

When working with LLMs, you might encounter the following limitations:

  • Stale data: LLMs are trained on a static dataset up to a certain point in time. This means that they have a limited knowledge base and might use outdated data.

  • No access to local data: LLMs don't have access to local or personalized data. Therefore, they can lack knowledge about specific domains.

  • Hallucinations: When training data is incomplete or outdated, LLMs can generate inaccurate information.

You can address these limitations by taking the following steps to implement RAG:

  1. Ingestion: Store your custom data as vector embeddings in a vector database, such as MongoDB Atlas. This allows you to create a knowledge base of up-to-date and personalized data.

  2. Retrieval: Retrieve semantically similar documents from the database based on the user's question by using a search solution, such as Atlas Vector Search. These documents augment the LLM with additional, relevant data.

  3. Generation: Prompt the LLM. The LLM uses the retrieved documents as context to generate a more accurate and relevant response, reducing hallucinations.

Because RAG enables tasks such as question answering and text generation, it's an effective architecture for building AI chatbots that provide personalized, domain-specific responses. To create production-ready chatbots, you must configure a server to route requests and build a user interface on top of your RAG implementation.

To implement RAG with Atlas Vector Search, you ingest data into Atlas, retrieve documents with Atlas Vector Search, and generate responses using an LLM. This section describes the components of a basic, or naive, RAG implementation with Atlas Vector Search. For step-by-step instructions, see Get Started.

RAG flowchart with Atlas Vector Search

Data ingestion for RAG involves processing your custom data and storing it in a vector database to prepare it for retrieval. To create a basic ingestion pipeline with Atlas as the vector database, do the following:

  1. Load the data.

    Use tools like document loaders to load data from different data formats and locations.

  2. Split the data into chunks.

    Process, or chunk, your data. Chunking involves splitting your data into smaller parts to improve performance.

  3. Convert the data to vector embeddings.

    Convert your data into vector embeddings by using an embedding model. To learn more, see How to Create Vector Embeddings.

  4. Store the data and embeddings in Atlas.

    Store these embeddings in Atlas. You store embeddings as a field alongside other data in your collection.
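The chunking step can be sketched without any library. This is a minimal illustration, assuming fixed-size character chunks with a small overlap; `chunkText` is a hypothetical helper, not part of LangChain or any MongoDB driver, and real splitters typically also try to break on separators such as paragraphs or sentences.

```javascript
// Hypothetical helper: split text into fixed-size character chunks.
// Consecutive chunks share `chunkOverlap` characters so that content
// cut at a boundary still appears intact in one of the two chunks.
function chunkText(text, chunkSize = 400, chunkOverlap = 20) {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk would then be passed to the embedding model and stored with its embedding as one document.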

Building a retrieval system involves searching your vector database and returning the most relevant documents, which you then use to augment the LLM. To retrieve relevant documents with Atlas Vector Search, you convert the user's question into vector embeddings and run a vector search query against your data in Atlas to find documents with the most similar embeddings.

To perform basic retrieval with Atlas Vector Search, do the following:

  1. Define an Atlas Vector Search index on the collection that contains your vector embeddings.

  2. Choose one of the following methods to retrieve documents based on the user's question:

    • Use an Atlas Vector Search integration with a popular framework or service. These integrations include built-in libraries and tools that enable you to easily build retrieval systems with Atlas Vector Search.

    • Build your own retrieval system. You can define your own functions and pipelines to run Atlas Vector Search queries specific to your use case.
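To make "most similar embeddings" concrete, the following sketch ranks a few in-memory documents by cosine similarity to a query vector. It only illustrates the idea: Atlas Vector Search performs the search on the server by using the index. The `cosineSimilarity` and `topK` helpers and the two-dimensional vectors are hypothetical.

```javascript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k documents whose embeddings are closest to the query vector.
function topK(queryVector, docs, k) {
  return docs
    .map((doc) => ({
      text: doc.text,
      score: cosineSimilarity(queryVector, doc.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Calling `topK(queryVector, docs, 5)` plays the same role as a vector search query with a limit of 5, but it scans every document in memory, which does not scale to large collections.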

To generate responses, combine your retrieval system with an LLM. After you perform a vector search to retrieve relevant documents, you provide the user's question along with the relevant documents as context to the LLM so that it can generate a more accurate response.

Choose one of the following methods to connect to an LLM:

  • Use an Atlas Vector Search integration with a popular framework or service. These integrations include built-in libraries and tools to help you connect to LLMs with minimal set-up.

  • Call the LLM's API. Most AI providers offer APIs to their generative models that you can use to generate responses.

  • Load an open-source LLM. If you don't have API keys or credits, you can use an open-source LLM by loading it locally from your application.
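Whichever method you choose, the generation step comes down to placing the question and the retrieved text into a single prompt. A minimal sketch, assuming the retrieved documents are already plain strings; `buildPrompt` and the numbered-context layout are illustrative choices, not a required format:

```javascript
// Hypothetical helper: combine the user's question with retrieved
// documents so the LLM can answer from the supplied context.
function buildPrompt(question, documents) {
  // Number each document so the separate passages stay distinguishable.
  const context = documents.map((doc, i) => `[${i + 1}] ${doc}`);
  return [
    "Answer the following question based on the given context.",
    `Question: ${question}`,
    "Context:",
    ...context,
  ].join("\n");
}

console.log(buildPrompt("What is RAG?", ["First passage.", "Second passage."]));
```

The resulting string is what you would send to the LLM through a framework integration, a provider API, or a locally loaded model.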

The following example demonstrates a basic RAG implementation with Atlas Vector Search by using the MongoDB LangChain integration and Hugging Face to easily load and access embedding and generative models.


To complete this example, you must have the following:

  • An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later. To learn more, see Create a Cluster.

  • A Hugging Face Access Token with read access.

  • An environment to run interactive Python notebooks such as Colab.

    Note

    If you're using Colab, ensure that your notebook session's IP address is included in your Atlas project's access list.

1
  1. Initialize your Go project.

    Run the following commands in your terminal to create a new directory named rag-mongodb and initialize your project:

    mkdir rag-mongodb
    cd rag-mongodb
    go mod init rag-mongodb
  2. Install and import dependencies.

    Run the following commands:

    go get github.com/joho/godotenv
    go get go.mongodb.org/mongo-driver/mongo
    go get github.com/tmc/langchaingo/llms
    go get github.com/tmc/langchaingo/documentloaders
    go get github.com/tmc/langchaingo/embeddings/huggingface
    go get github.com/tmc/langchaingo/llms/huggingface
    go get github.com/tmc/langchaingo/prompts
  3. Create a .env file.

    In your project, create a .env file to store your Atlas connection string and Hugging Face access token.

    .env
    HUGGINGFACEHUB_API_TOKEN = "<access-token>"
    ATLAS_CONNECTION_STRING = "<connection-string>"

    Replace the <access-token> and <connection-string> placeholder values with your Hugging Face access token and the SRV connection string for your Atlas cluster. Your connection string should use the following format:

    mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
2

In this section, you create a function that:

  • Loads the mxbai-embed-large-v1 embedding model from Hugging Face's model hub.

  • Creates vector embeddings from the input data.

  1. Run the following command to create a directory that stores common functions, including one that you'll reuse to create embeddings.

    mkdir common && cd common
  2. Create a file called get-embeddings.go in the common directory, and paste the following code into it:

    get-embeddings.go
    package common
    import (
    "context"
    "log"
    "github.com/tmc/langchaingo/embeddings/huggingface"
    )
    func GetEmbeddings(documents []string) [][]float32 {
    hf, err := huggingface.NewHuggingface(
    huggingface.WithModel("mixedbread-ai/mxbai-embed-large-v1"),
    huggingface.WithTask("feature-extraction"))
    if err != nil {
    log.Fatalf("failed to connect to Hugging Face: %v", err)
    }
    embs, err := hf.EmbedDocuments(context.Background(), documents)
    if err != nil {
    log.Fatalf("failed to generate embeddings: %v", err)
    }
    return embs
    }
3

In this section, you ingest sample data into Atlas that LLMs don't have access to. The following code uses the Go library for LangChain and Go driver to do the following:

  • Download an HTML file that contains a MongoDB earnings report.

  • Split the data into chunks, specifying the chunk size (number of characters) and chunk overlap (number of overlapping characters between consecutive chunks).

  • Create vector embeddings from the chunked data by using the GetEmbeddings function that you defined.

  • Store these embeddings alongside the chunked data in the rag_db.test collection in your Atlas cluster.

  1. Navigate to the root of the rag-mongodb project directory.

  2. Create a file called ingest-data.go in your project, and paste the following code into it:

    ingest-data.go
    package main
    import (
    "context"
    "fmt"
    "io"
    "log"
    "net/http"
    "os"
    "rag-mongodb/common" // Module that contains the embedding function
    "github.com/joho/godotenv"
    "github.com/tmc/langchaingo/documentloaders"
    "github.com/tmc/langchaingo/textsplitter"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
    )
    type DocumentToInsert struct {
    PageContent string `bson:"pageContent"`
    Embedding []float32 `bson:"embedding"`
    }
    func downloadReport(filename string) {
    _, err := os.Stat(filename)
    if err == nil {
    return
    }
    url := "https://investors.mongodb.com/node/12236"
    fmt.Println("Downloading ", url, " to ", filename)
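    resp, err := http.Get(url)
    if err != nil {
    log.Fatalf("failed to download the report: %v", err)
    }
    defer func() { _ = resp.Body.Close() }()
    // Stop early if the server did not return the report
    if resp.StatusCode != http.StatusOK {
    log.Fatalf("failed to download the report: unexpected status %s", resp.Status)
    }
    f, err := os.Create(filename)
    if err != nil {
    log.Fatalf("failed to create the report file: %v", err)
    }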
    defer func() { _ = f.Close() }()
    _, err = io.Copy(f, resp.Body)
    if err != nil {
    log.Fatalf("failed to copy the report: %v", err)
    }
    }
    func main() {
    ctx := context.Background()
    filename := "investor-report.html"
    downloadReport(filename)
    f, err := os.Open(filename)
    if err != nil {
    log.Fatalf("failed to open the report: %v", err)
    }
    defer func() { _ = f.Close() }()
    html := documentloaders.NewHTML(f)
    split := textsplitter.NewRecursiveCharacter()
    split.ChunkSize = 400
    split.ChunkOverlap = 20
    docs, err := html.LoadAndSplit(ctx, split)
    if err != nil {
    log.Fatalf("failed to chunk the HTML into documents: %v", err)
    }
    fmt.Printf("Successfully chunked the HTML into %v documents.\n", len(docs))
    if err := godotenv.Load(); err != nil {
    log.Fatal("no .env file found")
    }
    // Connect to your Atlas cluster
    uri := os.Getenv("ATLAS_CONNECTION_STRING")
    if uri == "" {
    log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
    }
    clientOptions := options.Client().ApplyURI(uri)
    client, err := mongo.Connect(ctx, clientOptions)
    if err != nil {
    log.Fatalf("failed to connect to the server: %v", err)
    }
    defer func() { _ = client.Disconnect(ctx) }()
    // Set the namespace
    coll := client.Database("rag_db").Collection("test")
    fmt.Println("Generating embeddings.")
    var pageContents []string
    for i := range docs {
    pageContents = append(pageContents, docs[i].PageContent)
    }
    embeddings := common.GetEmbeddings(pageContents)
    docsToInsert := make([]interface{}, len(embeddings))
    for i := range embeddings {
    docsToInsert[i] = DocumentToInsert{
    PageContent: pageContents[i],
    Embedding: embeddings[i],
    }
    }
    result, err := coll.InsertMany(ctx, docsToInsert)
    if err != nil {
    log.Fatalf("failed to insert documents: %v", err)
    }
    fmt.Printf("Successfully inserted %v documents into Atlas\n", len(result.InsertedIDs))
    }
  3. Run the following command to execute the code:

    go run ingest-data.go
    Successfully chunked the HTML into 163 documents.
    Generating embeddings.
    Successfully inserted 163 documents into Atlas
4

In this section, you set up Atlas Vector Search to retrieve documents from your vector database. Complete the following steps:

  1. Create an Atlas Vector Search index on your vector embeddings.

    Create a new file named rag-vector-index.go and paste the following code. This code connects to your Atlas cluster and creates an index of the vectorSearch type on the rag_db.test collection.

    rag-vector-index.go
    package main
    import (
    "context"
    "log"
    "os"
    "time"
    "go.mongodb.org/mongo-driver/bson"
    "github.com/joho/godotenv"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
    )
    func main() {
    ctx := context.Background()
    if err := godotenv.Load(); err != nil {
    log.Fatal("no .env file found")
    }
    // Connect to your Atlas cluster
    uri := os.Getenv("ATLAS_CONNECTION_STRING")
    if uri == "" {
    log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
    }
    clientOptions := options.Client().ApplyURI(uri)
    client, err := mongo.Connect(ctx, clientOptions)
    if err != nil {
    log.Fatalf("failed to connect to the server: %v", err)
    }
    defer func() { _ = client.Disconnect(ctx) }()
    // Specify the database and collection
    coll := client.Database("rag_db").Collection("test")
    indexName := "vector_index"
    opts := options.SearchIndexes().SetName(indexName).SetType("vectorSearch")
    type vectorDefinitionField struct {
    Type string `bson:"type"`
    Path string `bson:"path"`
    NumDimensions int `bson:"numDimensions"`
    Similarity string `bson:"similarity"`
    }
    type vectorDefinition struct {
    Fields []vectorDefinitionField `bson:"fields"`
    }
    indexModel := mongo.SearchIndexModel{
    Definition: vectorDefinition{
    Fields: []vectorDefinitionField{{
    Type: "vector",
    Path: "embedding",
    NumDimensions: 1024,
    Similarity: "cosine"}},
    },
    Options: opts,
    }
    log.Println("Creating the index.")
    searchIndexName, err := coll.SearchIndexes().CreateOne(ctx, indexModel)
    if err != nil {
    log.Fatalf("failed to create the search index: %v", err)
    }
    // Await the creation of the index.
    log.Println("Polling to confirm successful index creation.")
    log.Println("NOTE: This may take up to a minute.")
    searchIndexes := coll.SearchIndexes()
    var doc bson.Raw
    for doc == nil {
    cursor, err := searchIndexes.List(ctx, options.SearchIndexes().SetName(searchIndexName))
    if err != nil {
    log.Fatalf("failed to list search indexes: %v", err)
    }
    if !cursor.Next(ctx) {
    break
    }
    name := cursor.Current.Lookup("name").StringValue()
    queryable := cursor.Current.Lookup("queryable").Boolean()
    if name == searchIndexName && queryable {
    doc = cursor.Current
    } else {
    time.Sleep(5 * time.Second)
    }
    }
    log.Println("Name of Index Created: " + searchIndexName)
    }
  2. Run the following command to create the index:

    go run rag-vector-index.go
  3. Define a function to retrieve relevant data.

    In this step, you create a retrieval function called GetQueryResults that runs a query to retrieve relevant documents. It uses the GetEmbeddings function to create embeddings from the search query. Then, it runs the query to return semantically similar documents.

    To learn more, refer to Run Vector Search Queries.

    In the common directory, create a new file called get-query-results.go, and paste the following code into it:

    get-query-results.go
    package common
    import (
    "context"
    "log"
    "os"
    "github.com/joho/godotenv"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
    )
    type TextWithScore struct {
    PageContent string `bson:"pageContent"`
    Score float64 `bson:"score"`
    }
    func GetQueryResults(query string) []TextWithScore {
    ctx := context.Background()
    if err := godotenv.Load(); err != nil {
    log.Fatal("no .env file found")
    }
    // Connect to your Atlas cluster
    uri := os.Getenv("ATLAS_CONNECTION_STRING")
    if uri == "" {
    log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
    }
    clientOptions := options.Client().ApplyURI(uri)
    client, err := mongo.Connect(ctx, clientOptions)
    if err != nil {
    log.Fatalf("failed to connect to the server: %v", err)
    }
    defer func() { _ = client.Disconnect(ctx) }()
    // Specify the database and collection
    coll := client.Database("rag_db").Collection("test")
    queryEmbedding := GetEmbeddings([]string{query})
    vectorSearchStage := bson.D{
    {"$vectorSearch", bson.D{
    {"index", "vector_index"},
    {"path", "embedding"},
    {"queryVector", queryEmbedding[0]},
    {"exact", true},
    {"limit", 5},
    }}}
    projectStage := bson.D{
    {"$project", bson.D{
    {"_id", 0},
    {"pageContent", 1},
    {"score", bson.D{{"$meta", "vectorSearchScore"}}},
    }}}
    cursor, err := coll.Aggregate(ctx, mongo.Pipeline{vectorSearchStage, projectStage})
    if err != nil {
    log.Fatalf("failed to execute the aggregation pipeline: %v", err)
    }
    var results []TextWithScore
    if err = cursor.All(ctx, &results); err != nil {
    log.Fatalf("failed to unmarshal retrieved documents: %v", err)
    }
    return results
    }
  4. Test retrieving the data.

    1. In the rag-mongodb project directory, create a new file called retrieve-documents-test.go. In this step, you check that the function you just defined returns relevant results.

    2. Paste this code into your file:

      retrieve-documents-test.go
      package main
      import (
      "fmt"
      "rag-mongodb/common" // Module that contains the GetQueryResults function
      )
      func main() {
      query := "AI Technology"
      documents := common.GetQueryResults(query)
      for _, doc := range documents {
      fmt.Printf("Text: %s \nScore: %v \n\n", doc.PageContent, doc.Score)
      }
      }
    3. Run the following command to execute the code:

      go run retrieve-documents-test.go
      Text: for the variety and scale of data required by AI-powered applications. We are confident MongoDB will be a substantial beneficiary of this next wave of application development.&#34;
      Score: 0.835033655166626
      Text: &#34;As we look ahead, we continue to be incredibly excited by our large market opportunity, the potential to increase share, and become a standard within more of our customers. We also see a tremendous opportunity to win more legacy workloads, as AI has now become a catalyst to modernize these applications. MongoDB&#39;s document-based architecture is particularly well-suited for the variety and
      Score: 0.8280757665634155
      Text: to the use of new and evolving technologies, such as artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that market; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to maintain the security of our software
      Score: 0.8165900111198425
      Text: MongoDB continues to expand its AI ecosystem with the announcement of the MongoDB AI Applications Program (MAAP), which provides customers with reference architectures, pre-built partner integrations, and professional services to help them quickly build AI-powered applications. Accenture will establish a center of excellence focused on MongoDB projects, and is the first global systems
      Score: 0.8023912906646729
      Text: Bendigo and Adelaide Bank partnered with MongoDB to modernize their core banking technology. With the help of MongoDB Relational Migrator and generative AI-powered modernization tools, Bendigo and Adelaide Bank decomposed an outdated consumer-servicing application into microservices and migrated off its underlying legacy relational database technology significantly faster and more easily than
      Score: 0.7959681749343872
5

In this section, you generate responses by prompting an LLM to use the retrieved documents as context. This example uses the function you just defined to retrieve matching documents from the database, and additionally:

  • Accesses the Mistral 7B Instruct model from Hugging Face's model hub.

  • Instructs the LLM to include the user's question and retrieved documents in the prompt.

  • Prompts the LLM about MongoDB's latest AI announcements.

  1. Create a new file called generate-responses.go, and paste the following code into it:

    generate-responses.go
    package main
    import (
    "context"
    "fmt"
    "log"
    "rag-mongodb/common" // Module that contains the GetQueryResults function
    "strings"
    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/huggingface"
    "github.com/tmc/langchaingo/prompts"
    )
    func main() {
    ctx := context.Background()
    query := "AI Technology"
    documents := common.GetQueryResults(query)
    var textDocuments strings.Builder
    for _, doc := range documents {
    textDocuments.WriteString(doc.PageContent)
    }
    question := "In a few sentences, what are MongoDB's latest AI announcements?"
    template := prompts.NewPromptTemplate(
    `Answer the following question based on the given context.
    Question: {{.question}}
    Context: {{.context}}`,
    []string{"question", "context"},
    )
    prompt, err := template.Format(map[string]any{
    "question": question,
    "context": textDocuments.String(),
    })
    if err != nil {
    log.Fatalf("failed to format the prompt: %v", err)
    }
    opts := llms.CallOptions{
    Model: "mistralai/Mistral-7B-Instruct-v0.3",
    MaxTokens: 150,
    Temperature: 0.1,
    }
    llm, err := huggingface.New(huggingface.WithModel("mistralai/Mistral-7B-Instruct-v0.3"))
    if err != nil {
    log.Fatalf("failed to initialize a Hugging Face LLM: %v", err)
    }
    completion, err := llms.GenerateFromSinglePrompt(ctx, llm, prompt, llms.WithOptions(opts))
    if err != nil {
    log.Fatalf("failed to generate a response from the prompt: %v", err)
    }
    response := strings.Split(completion, "\n\n")
    if len(response) == 2 {
    fmt.Printf("Prompt: %v\n\n", response[0])
    fmt.Printf("Response: %v\n", response[1])
    }
    }
  2. Run this command to execute the code. The generated response might vary.

    go run generate-responses.go
    Prompt: Answer the following question based on the given context.
    Question: In a few sentences, what are MongoDB's latest AI announcements?
    Context: for the variety and scale of data required by AI-powered applications. We are confident MongoDB will be a substantial beneficiary of this next wave of application development.&#34;&#34;As we look ahead, we continue to be incredibly excited by our large market opportunity, the potential to increase share, and become a standard within more of our customers. We also see a tremendous opportunity to win more legacy workloads, as AI has now become a catalyst to modernize these applications. MongoDB&#39;s document-based architecture is particularly well-suited for the variety andto the use of new and evolving technologies, such as artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that market; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to maintain the security of our softwareMongoDB continues to expand its AI ecosystem with the announcement of the MongoDB AI Applications Program (MAAP), which provides customers with reference architectures, pre-built partner integrations, and professional services to help them quickly build AI-powered applications. Accenture will establish a center of excellence focused on MongoDB projects, and is the first global systemsBendigo and Adelaide Bank partnered with MongoDB to modernize their core banking technology. With the help of MongoDB Relational Migrator and generative AI-powered modernization tools, Bendigo and Adelaide Bank decomposed an outdated consumer-servicing application into microservices and migrated off its underlying legacy relational database technology significantly faster and more easily than expected.
    Response: MongoDB's latest AI announcements include the launch of the MongoDB AI Applications Program (MAAP) and a partnership with Accenture to establish a center of excellence focused on MongoDB projects. Additionally, Bendigo and Adelaide Bank have partnered with MongoDB to modernize their core banking technology using MongoDB's AI-powered modernization tools.
1
  1. Initialize your Node.js project.

    Run the following commands in your terminal to create a new directory named rag-mongodb and initialize your project:

    mkdir rag-mongodb
    cd rag-mongodb
    npm init -y
  2. Install and import dependencies.

    Run the following command:

    npm install mongodb langchain @langchain/community @xenova/transformers @huggingface/inference pdf-parse
  3. Update your package.json file.

    In your project's package.json file, specify the type field as shown in the following example, and then save the file.

    {
    "name": "rag-mongodb",
    "type": "module",
    ...
  4. Create a .env file.

    In your project, create a .env file to store your Atlas connection string and Hugging Face access token.

    HUGGING_FACE_ACCESS_TOKEN = "<access-token>"
    ATLAS_CONNECTION_STRING = "<connection-string>"

    Replace the <access-token> and <connection-string> placeholder values with your Hugging Face access token and the SRV connection string for your Atlas cluster. Your connection string should use the following format:

    mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

    Note

    Minimum Node.js Version Requirements

    Node.js v20.x introduced the --env-file option. If you are using an older version of Node.js, add the dotenv package to your project, or use a different method to manage your environment variables.
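The connection string format above embeds the database user's credentials, so any special characters in the username or password (for example `@`, `/`, or `:`) must be percent-encoded before you store the string in the .env file. A minimal sketch of that encoding; `buildSrvUri` is a hypothetical helper and the host name is a placeholder:

```javascript
// Hypothetical helper: assemble an SRV connection string and
// percent-encode the credentials so that reserved characters
// do not break URI parsing.
function buildSrvUri(username, password, host) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `mongodb+srv://${user}:${pass}@${host}`;
}

// Placeholder values for illustration only.
console.log(buildSrvUri("appUser", "p@ss/word", "cluster0.example.mongodb.net"));
// mongodb+srv://appUser:p%40ss%2Fword@cluster0.example.mongodb.net
```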

2

In this section, you create a function that:

  • Loads the nomic-embed-text-v1 embedding model from Hugging Face's model hub.

  • Creates vector embeddings from the input data.

Create a file called get-embeddings.js in your project, and paste the following code:

import { pipeline } from '@xenova/transformers';
// Function to generate embeddings for a given data source
export async function getEmbeddings(data) {
const embedder = await pipeline(
'feature-extraction',
'Xenova/nomic-embed-text-v1');
const results = await embedder(data, { pooling: 'mean', normalize: true });
return Array.from(results.data);
}
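The `pooling: 'mean'` and `normalize: true` options in the code above request a single vector per input: the token-level vectors are averaged, and the result is scaled to unit length. The arithmetic can be sketched roughly as follows, assuming a plain array of token vectors. `meanPool` and `l2Normalize` are hypothetical helpers, and the library's internal implementation may differ, for example in how it handles padding tokens.

```javascript
// Average a list of token vectors into one vector of the same dimension.
function meanPool(tokenVectors) {
  if (tokenVectors.length === 0) {
    throw new Error("need at least one token vector");
  }
  const dims = tokenVectors[0].length;
  const pooled = new Array(dims).fill(0);
  for (const vector of tokenVectors) {
    for (let i = 0; i < dims; i++) {
      pooled[i] += vector[i] / tokenVectors.length;
    }
  }
  return pooled;
}

// Scale a vector to unit (L2) length; a zero vector is returned unchanged.
function l2Normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector.slice() : vector.map((x) => x / norm);
}
```

With unit-length vectors, cosine similarity reduces to a dot product, which is one reason to normalize embeddings before storing them.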
3

In this section, you ingest sample data into Atlas that LLMs don't have access to. The following code uses the LangChain integration and Node.js driver to do the following:

  • Load a PDF that contains a MongoDB earnings report.

  • Split the data into chunks, specifying the chunk size (number of characters) and chunk overlap (number of overlapping characters between consecutive chunks).

  • Create vector embeddings from the chunked data by using the getEmbeddings function that you defined.

  • Store these embeddings alongside the chunked data in the rag_db.test collection in your Atlas cluster.

Create a file called ingest-data.js in your project, and paste the following code:

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MongoClient } from 'mongodb';
import { getEmbeddings } from './get-embeddings.js';
import * as fs from 'fs';
async function run() {
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
try {
// Save online PDF as a file
const rawData = await fetch("https://investors.mongodb.com/node/12236/pdf");
const pdfBuffer = await rawData.arrayBuffer();
const pdfData = Buffer.from(pdfBuffer);
fs.writeFileSync("investor-report.pdf", pdfData);
const loader = new PDFLoader(`investor-report.pdf`);
const data = await loader.load();
// Chunk the text from the PDF
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 400,
chunkOverlap: 20,
});
const docs = await textSplitter.splitDocuments(data);
console.log(`Successfully chunked the PDF into ${docs.length} documents.`);
// Connect to your Atlas cluster
await client.connect();
const db = client.db("rag_db");
const collection = db.collection("test");
console.log("Generating embeddings and inserting documents.");
let docCount = 0;
await Promise.all(docs.map(async doc => {
const embeddings = await getEmbeddings(doc.pageContent);
// Insert the embeddings and the chunked PDF data into Atlas
await collection.insertOne({
document: doc,
embedding: embeddings,
});
docCount += 1;
}))
console.log(`Successfully inserted ${docCount} documents.`);
} catch (err) {
console.log(err.stack);
}
finally {
await client.close();
}
}
run().catch(console.dir);

Then, run the following command to execute the code:

node --env-file=.env ingest-data.js

Tip

This code takes some time to run. You can view your vector embeddings as they're inserted by navigating to the rag_db.test collection in the Atlas UI.

4

In this section, you set up Atlas Vector Search to retrieve documents from your vector database. Complete the following steps:

  1. Create an Atlas Vector Search index on your vector embeddings.

    Create a new file named rag-vector-index.js and paste the following code. This code connects to your Atlas cluster and creates an index of the vectorSearch type on the rag_db.test collection.

    import { MongoClient } from 'mongodb';
    // Connect to your Atlas cluster
    const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
    async function run() {
    try {
    const database = client.db("rag_db");
    const collection = database.collection("test");
    // Define your Atlas Vector Search index
    const index = {
    name: "vector_index",
    type: "vectorSearch",
    definition: {
    "fields": [
    {
    "type": "vector",
    "numDimensions": 768,
    "path": "embedding",
    "similarity": "cosine"
    }
    ]
    }
    }
    // Call the method to create the index
    const result = await collection.createSearchIndex(index);
    console.log(result);
    } finally {
    await client.close();
    }
    }
    run().catch(console.dir);

    Then, run the following command to execute the code:

    node --env-file=.env rag-vector-index.js
  2. Define a function to retrieve relevant data.

    Create a new file called retrieve-documents.js.

    In this step, you create a retrieval function called getQueryResults. It uses the getEmbeddings function to create an embedding from the search query, and then runs a vector search query that returns semantically similar documents.

    To learn more, refer to Run Vector Search Queries.

    Paste this code into your file:

    import { MongoClient } from 'mongodb';
    import { getEmbeddings } from './get-embeddings.js';
    // Function to get the results of a vector query
    export async function getQueryResults(query) {
    // Connect to your Atlas cluster
    const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
    try {
    // Get embeddings for a query
    const queryEmbeddings = await getEmbeddings(query);
    await client.connect();
    const db = client.db("rag_db");
    const collection = db.collection("test");
    const pipeline = [
    {
    $vectorSearch: {
    index: "vector_index",
    queryVector: queryEmbeddings,
    path: "embedding",
    exact: true,
    limit: 5
    }
    },
    {
    $project: {
    _id: 0,
    document: 1,
    }
    }
    ];
    // Retrieve documents from Atlas using this Vector Search query
    const result = collection.aggregate(pipeline);
    const arrayOfQueryDocs = [];
    for await (const doc of result) {
    arrayOfQueryDocs.push(doc);
    }
    return arrayOfQueryDocs;
    } catch (err) {
    console.log(err.stack);
    }
    finally {
    await client.close();
    }
    }
  3. Test retrieving the data.

    Create a new file called retrieve-documents-test.js. In this step, you check that the function you just defined returns relevant results.

    Paste this code into your file:

    import { getQueryResults } from './retrieve-documents.js';
    async function run() {
    try {
    const query = "AI Technology";
    const documents = await getQueryResults(query);
    documents.forEach( doc => {
    console.log(doc);
    });
    } catch (err) {
    console.log(err.stack);
    }
    }
    run().catch(console.dir);

    Then, run the following command to execute the code:

    node --env-file=.env retrieve-documents-test.js
    {
    document: {
    pageContent: 'MongoDB continues to expand its AI ecosystem with the announcement of the MongoDB AI Applications Program (MAAP),',
    metadata: { source: 'investor-report.pdf', pdf: [Object], loc: [Object] },
    id: null
    }
    }
    {
    document: {
    pageContent: 'artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that\n' +
    'market; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to',
    metadata: { source: 'investor-report.pdf', pdf: [Object], loc: [Object] },
    id: null
    }
    }
    {
    document: {
    pageContent: 'more of our customers. We also see a tremendous opportunity to win more legacy workloads, as AI has now become a catalyst to modernize these\n' +
    "applications. MongoDB's document-based architecture is particularly well-suited for the variety and scale of data required by AI-powered applications. \n" +
    'We are confident MongoDB will be a substantial beneficiary of this next wave of application development."',
    metadata: { source: 'investor-report.pdf', pdf: [Object], loc: [Object] },
    id: null
    }
    }
    {
    document: {
    pageContent: 'which provides customers with reference architectures, pre-built partner integrations, and professional services to help\n' +
    'them quickly build AI-powered applications. Accenture will establish a center of excellence focused on MongoDB projects,\n' +
    'and is the first global systems integrator to join MAAP.',
    metadata: { source: 'investor-report.pdf', pdf: [Object], loc: [Object] },
    id: null
    }
    }
    {
    document: {
    pageContent: 'Bendigo and Adelaide Bank partnered with MongoDB to modernize their core banking technology. With the help of\n' +
    'MongoDB Relational Migrator and generative AI-powered modernization tools, Bendigo and Adelaide Bank decomposed an\n' +
    'outdated consumer-servicing application into microservices and migrated off its underlying legacy relational database',
    metadata: { source: 'investor-report.pdf', pdf: [Object], loc: [Object] },
    id: null
    }
    }
5

In this section, you generate responses by prompting an LLM to use the retrieved documents as context. This example uses the function you just defined to retrieve matching documents from the database, and additionally:

  • Accesses the Mistral 7B Instruct model from Hugging Face's model hub.

  • Builds a prompt that includes the user's question and the retrieved documents as context.

  • Prompts the LLM about MongoDB's latest AI announcements.

Create a new file called generate-responses.js, and paste the following code into it:

import { getQueryResults } from './retrieve-documents.js';
import { HfInference } from '@huggingface/inference'
async function run() {
try {
// Specify search query and retrieve relevant documents
const query = "AI Technology";
const documents = await getQueryResults(query);
// Build a string representation of the retrieved documents to use in the prompt
// Separate the chunks with blank lines so that they don't run together
const textDocuments = documents
.map(doc => doc.document.pageContent)
.join("\n\n");
const question = "In a few sentences, what are MongoDB's latest AI announcements?";
// Create a prompt consisting of the question and context to pass to the LLM
const prompt = `Answer the following question based on the given context.
Question: {${question}}
Context: {${textDocuments}}
`;
// Connect to Hugging Face, using the access token from the environment file
const hf = new HfInference(process.env.HUGGING_FACE_ACCESS_TOKEN);
const llm = hf.endpoint(
"https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3"
);
// Prompt the LLM to answer the question using the
// retrieved documents as the context
const output = await llm.chatCompletion({
model: "mistralai/Mistral-7B-Instruct-v0.3",
messages: [{ role: "user", content: prompt }],
max_tokens: 150,
});
// Output the LLM's response as text.
console.log(output.choices[0].message.content);
} catch (err) {
console.log(err.stack);
}
}
run().catch(console.dir);

Then, run this command to execute the code. The generated response might vary.

node --env-file=.env generate-responses.js
MongoDB's latest AI announcements include the launch of the MongoDB
AI Applications Program (MAAP), which provides customers with
reference architectures, pre-built partner integrations, and
professional services to help them build AI-powered applications
quickly. Accenture has joined MAAP as the first global systems
integrator, establishing a center of excellence focused on MongoDB
projects. Additionally, Bendigo and Adelaide Bank have partnered
with MongoDB to modernize their core banking technology using
MongoDB's Relational Migrator and generative AI-powered
modernization tools.
1

Create an interactive Python notebook by saving a file with the .ipynb extension, and then run the following code in the notebook to install the dependencies:

pip install --quiet pymongo langchain langchain_community langchain_mongodb langchain_huggingface pypdf sentence_transformers
2

In this section, you ingest into Atlas sample data that LLMs don't have access to. The following code uses the LangChain integration and PyMongo driver to do the following:

  • Load a PDF that contains a MongoDB earnings report.

  • Split the data into chunks, specifying the chunk size (number of characters) and chunk overlap (number of overlapping characters between consecutive chunks).

  • Load the nomic-embed-text-v1 embedding model from Hugging Face's model hub.

  • Create vector embeddings from the data and store these embeddings in the rag_db.test collection in your Atlas cluster.

Paste and run the following code in your notebook, replacing <connection-string> with your Atlas connection string:

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch
from pymongo import MongoClient
# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/12236/pdf")
data = loader.load()
# Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
docs = text_splitter.split_documents(data)
# Load the embedding model (https://huggingface.co/nomic-ai/nomic-embed-text-v1)
model = HuggingFaceEmbeddings(model_name="nomic-ai/nomic-embed-text-v1", model_kwargs={ "trust_remote_code": True })
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["rag_db"]["test"]
# Store the data as vector embeddings in Atlas
vector_store = MongoDBAtlasVectorSearch.from_documents(
documents = docs,
embedding = model,
collection = collection,
index_name = "vector_index"
)

Tip

After running the code, you can view your vector embeddings in the Atlas UI by navigating to the rag_db.test collection in your cluster.
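
The chunk_size and chunk_overlap parameters in the ingestion code can be pictured with a simplified sliding-window splitter. The following sketch is for illustration only: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this hypothetical chunk_text helper doesn't attempt.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap.

    Hypothetical helper for illustration; not part of LangChain.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last three characters of the previous one. A larger overlap keeps more shared context between neighboring chunks at the cost of storing and embedding more text.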

3

In this section, you set up Atlas Vector Search to retrieve documents from your vector database. Complete the following steps:

  1. Create an Atlas Vector Search index on your vector embeddings.

    You can create the index directly from your application with the PyMongo driver. Paste and run the following code in your notebook:

    from pymongo.operations import SearchIndexModel
    # Create your index model, then create the search index
    search_index_model = SearchIndexModel(
    definition = {
    "fields": [
    {
    "type": "vector",
    "numDimensions": 768,
    "path": "embedding",
    "similarity": "cosine"
    }
    ]
    },
    name = "vector_index",
    type = "vectorSearch"
    )
    collection.create_search_index(model=search_index_model)
  2. Configure Atlas Vector Search as a retriever.

    In your notebook, run the following code to set up your retrieval system and run a sample semantic search query by using the LangChain integration:

    # Instantiate Atlas Vector Search as a retriever
    retriever = vector_store.as_retriever(
    search_type = "similarity"
    )
    # Run a sample query; results are returned in order of relevance
    retriever.invoke("AI technology")
    [Document(metadata={'_id': '66a910ba7f78f7ec6760ceba', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 0}, page_content="more of our customers. We also see a tremendous opportunity to win more legacy workloads, as AI has now become a catalyst to modernize these\napplications. MongoDB's document-based architecture is particularly well-suited for the variety and scale of data required by AI-powered applications."),
    Document(metadata={'_id': '66a910ba7f78f7ec6760ced6', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 1}, page_content='artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that\nmarket; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to'),
    Document(metadata={'_id': '66a910ba7f78f7ec6760cec3', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 0}, page_content='MongoDB continues to expand its AI ecosystem with the announcement of the MongoDB AI Applications Program (MAAP),'),
    Document(metadata={'_id': '66a910ba7f78f7ec6760cec4', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 1}, page_content='which provides customers with reference architectures, pre-built partner integrations, and professional services to help\nthem quickly build AI-powered applications. Accenture will establish a center of excellence focused on MongoDB projects,\nand is the first global systems integrator to join MAAP.')]
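
Atlas builds search indexes asynchronously, so a query that runs immediately after create_search_index might return no results. As a minimal sketch, assuming the index documents returned by PyMongo's list_search_indexes method include a queryable field, you could poll until the index from step 1 is ready before querying. The wait_until_queryable helper and its parameters are hypothetical names, not part of PyMongo.

```python
import time


def wait_until_queryable(collection, index_name, poll_seconds=5.0, timeout_seconds=300.0):
    """Poll the collection's search indexes until index_name is queryable.

    Returns True once the index reports queryable, or False on timeout.
    Hypothetical helper; `collection` is expected to be a PyMongo Collection.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


# Example usage in the notebook, after creating the index:
# wait_until_queryable(collection, "vector_index")
```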
4

In this section, you generate responses by prompting an LLM to use the retrieved documents as context. The following code uses LangChain to do the following:

  • Access the Mistral 7B Instruct model from Hugging Face's model hub.

  • Use a prompt template and chain to pass the user's question and the retrieved documents to the LLM.

  • Prompt the LLM about MongoDB's latest AI announcements.

Paste and run the following code in your notebook, replacing <token> with your Hugging Face access token. The generated response might vary.

from langchain_huggingface import HuggingFaceEndpoint
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import os
# Authenticate to your Hugging Face account
os.environ["HF_TOKEN"] = "<token>"
# Access the LLM (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
llm = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.2")
# Create prompt and RAG workflow
prompt = PromptTemplate.from_template("""
Answer the following question based on the given context.
Question: {question}
Context: {context}
""")
rag_chain = (
{ "context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
# Prompt the LLM
question = "In a few sentences, what are MongoDB's latest AI announcements?"
answer = rag_chain.invoke(question)
print(answer)
Answer: MongoDB recently announced the MongoDB AI Applications Program
(MAAP) as part of their efforts to expand their AI ecosystem.
The document-based architecture of MongoDB is particularly well-suited
for AI-powered applications, offering an opportunity to win more legacy
workloads. These announcements were made at MongoDB.local NYC.
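
In the chain, the retriever's output is a list of Document objects that is formatted into the prompt as a string, so the context can include metadata as well as text. One common variation, sketched here under the assumption that you want only the chunk text in the prompt, is to join the page_content of each document first. The Doc class is a hypothetical stand-in for a LangChain Document so that the example runs without the notebook's dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question: str, docs) -> str:
    """Fill the same question/context layout that the prompt template uses."""
    return (
        "Answer the following question based on the given context.\n"
        f"Question: {question}\n"
        f"Context: {format_docs(docs)}\n"
    )


retrieved = [
    Doc("MongoDB announced the MongoDB AI Applications Program (MAAP).", {"page": 0}),
    Doc("Accenture is the first global systems integrator to join MAAP.", {"page": 1}),
]
prompt_text = build_prompt("What is MAAP?", retrieved)
print(prompt_text)
```

In a LangChain chain, a function like format_docs is typically applied by piping the retriever into it, for example { "context": retriever | format_docs, "question": RunnablePassthrough() }.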

For more detailed RAG tutorials, use the following resources:

To start building production-ready chatbots with Atlas Vector Search, you can use the MongoDB Chatbot Framework. This framework provides a set of libraries that enable you to quickly build AI chatbot applications.

To optimize and fine-tune your RAG applications, you can experiment with different embedding models, chunking strategies, and LLMs. To learn more, see the following resources:

Additionally, Atlas Vector Search supports advanced retrieval systems. Because you can seamlessly index vector data along with your other data in Atlas, you can fine-tune your retrieval results by pre-filtering on other fields in your collection or performing hybrid search to combine semantic search with full-text search results.
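
The two techniques mentioned here can be sketched as follows. The first function builds a $vectorSearch pipeline that pre-filters on a source field; it assumes that the index definition also declares that field with type "filter" and that the chunk text is stored in a text field, as the LangChain integration does by default. The second function shows reciprocal rank fusion, one common way to merge a vector search ranking with a full-text ranking. Both function names and their parameters are hypothetical, and this is a sketch rather than a definitive implementation.

```python
def build_filtered_vector_search(query_vector, source, limit=5, num_candidates=100):
    """Build an aggregation pipeline that pre-filters before the vector search.

    Assumes "source" is indexed as a filter field in the vector_index definition.
    """
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
                "filter": {"source": {"$eq": source}},
            }
        },
        # Assumes the chunk text is stored in a "text" field
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document IDs; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder query vector with the same dimensions as the index (768)
pipeline = build_filtered_vector_search(
    [0.0] * 768, "https://investors.mongodb.com/node/12236/pdf"
)

# Hypothetical document IDs from a vector search and a full-text search
vector_hits = ["a", "b", "c"]
text_hits = ["c", "a", "d"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that rank well in both lists rise to the top of the fused ranking, while documents that appear in only one list are kept but ranked lower.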
