Retrieval-Augmented Generation With MongoDB and Spring AI: Bringing AI to Your Java Applications

Tim Kelly • 6 min read • Published Sep 23, 2024
AI this, AI that. Well, what can AI actually do for me? In this tutorial, we are going to discuss how we can leverage our own data to get the most out of generative AI.
And that’s where retrieval-augmented generation (RAG) comes in. It uses AI where it belongs — retrieving the right information and generating smart, context-aware answers. In this tutorial, we’re going to build a RAG app using Spring Boot, MongoDB Atlas, and OpenAI. The full code is available on GitHub.

What is retrieval-augmented generation?

RAG lets you take data that was not available when an AI model was trained, retrieve the relevant pieces at query time, and include them in your prompt to supplement the large language model's (LLM) response.
LLMs are a type of artificial intelligence (AI) that can generate and understand data. They are trained on massive datasets and can be used for answering your questions in an informative way.
While LLMs are very powerful, they have some limitations. One limitation is that their outputs are not always accurate or up to date. This is because LLMs are trained on data that may have since become outdated, may be incomplete, or may lack proprietary knowledge about a specific use case or domain.
If you have data that must remain internal for security reasons, or questions that depend on more recent information, RAG can help.
RAG consists of three main components:
  1. Your pre-trained LLM: This is what will generate the response — OpenAI, in our case.
  2. Vector search (semantic search): This is how we retrieve relevant documents from our MongoDB database.
  3. Vector embeddings: Numerical representations of our data that capture its semantic meaning.
[Figure: A large language model being made useful in a generative AI application by leveraging retrieval-augmented generation.]
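Before we build the full application, here is a minimal sketch of how these three components fit together. It assumes a configured VectorStore and ChatClient like the ones we create later in this tutorial, and the question text is just a placeholder:

import java.util.List;
import java.util.stream.Collectors;

import org.springframework.ai.document.Document;

// 2 + 3: embed the question and retrieve semantically similar documents
List<Document> relevant = vectorStore.similaritySearch("How do I model time-series data?");

// The retrieved documents become extra context in the prompt
String context = relevant.stream()
        .map(Document::getContent)
        .collect(Collectors.joining("\n"));

// 1: the LLM generates an answer grounded in that context
String answer = chatClient.prompt()
        .user("Using this context:\n" + context + "\n\nQuestion: How do I model time-series data?")
        .call()
        .content();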

Prerequisites

Before beginning this tutorial, ensure that you have the following installed and configured:
  1. Java 21 or higher.
  2. Maven or Gradle (for managing dependencies): We use Maven for this tutorial.
  3. MongoDB Atlas: You’ll need a MongoDB Atlas cluster.
    • An M10 or higher cluster is necessary to use the Spring AI MongoDB vector store, as it creates the search index on our database programmatically.
  4. OpenAI API key: Sign up for OpenAI and obtain an API key.
    • Other models are available, but this tutorial uses OpenAI.

Preparing your project

Spring Initializr

To initialize the project:
  1. Set up the project metadata:
    • Group: com.mongodb
    • Artifact: RagApp
    • Dependencies:
      • Spring Web
      • MongoDB Atlas Vector Database
      • OpenAI
  2. Download the project and open it in your preferred IDE.

Configuration

Before we do anything, let's go to our pom.xml file and check that the Spring AI version is set to <spring-ai.version>1.0.0-SNAPSHOT</spring-ai.version>. Depending on which version Spring Initializr generated, we may need to change it to this.
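If the property is missing, it can be added to pom.xml. As a sketch (assuming a standard Initializr-generated POM), the property and the Spring snapshot repository, which snapshot builds are served from, would look like this:

<properties>
    <spring-ai.version>1.0.0-SNAPSHOT</spring-ai.version>
</properties>

<!-- Snapshot builds are resolved from the Spring snapshot repository -->
<repositories>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>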
The configuration for this project involves setting up two primary components:
  • The EmbeddingModel using OpenAI to generate embeddings for documents.
  • A MongoDBAtlasVectorStore to store and manage document vectors for similarity searches.
We’ll need to configure our project to connect to OpenAI and MongoDB Atlas by adding several properties to the application.properties file, along with the necessary credentials.
spring.application.name=RagApp

spring.ai.openai.api-key=<Your-API-Key>
spring.ai.openai.chat.options.model=gpt-4o

spring.ai.vectorstore.mongodb.initialize-schema=true

spring.data.mongodb.uri=<Your-Connection-URI>
spring.data.mongodb.database=rag
You'll see here we have spring.ai.vectorstore.mongodb.initialize-schema set to true. This tells Spring AI to create the search index on our collection automatically. If you are running a free cluster, this is not available. A workaround is to create the index manually, which you can learn to do in the MongoDB documentation.
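For reference, a manually created Atlas Vector Search index for this setup would look something like the following definition. This sketch assumes Spring AI's defaults (an embedding path, 1536-dimension OpenAI embeddings, and cosine similarity), so verify these values against your own configuration before creating the index:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}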
Create a config package and add a Config.java to work in. Here’s how the configuration is set up in the Config class:
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.ai.vectorstore.MongoDBAtlasVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
public class Config {

    @Value("${spring.ai.openai.api-key}")
    private String openAiKey;

    // Embedding model used to turn document text into vectors
    @Bean
    public EmbeddingModel embeddingModel() {
        return new OpenAiEmbeddingModel(new OpenAiApi(openAiKey));
    }

    // Vector store backed by MongoDB Atlas; the final flag enables schema initialization
    @Bean
    public VectorStore mongodbVectorStore(MongoTemplate mongoTemplate, EmbeddingModel embeddingModel) {
        return new MongoDBAtlasVectorStore(mongoTemplate, embeddingModel,
                MongoDBAtlasVectorStore.MongoDBVectorStoreConfig.builder().build(), true);
    }
}
This class initializes the connection to the OpenAI API and configures the MongoDB-based vector store for storing document embeddings.

Embedding the data

For this tutorial, we are using the MongoDB/devcenter-articles dataset, available on Hugging Face. This dataset consists of articles from the MongoDB Developer Center. In our resources folder, create a directory called docs and add the dataset file for us to read in.
To embed and store data in the vector store, we’ll use a service that reads documents from a JSON file, converts them into embeddings, and stores them in the MongoDB Atlas vector store. This is done using the DocsLoaderService.java that we will create in a service package:
package com.mongodb.RagApp.service;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Service;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@Service
public class DocsLoaderService {

    private static final int MAX_TOKENS_PER_CHUNK = 2000;
    private final VectorStore vectorStore;
    private final ObjectMapper objectMapper;

    @Autowired
    public DocsLoaderService(VectorStore vectorStore, ObjectMapper objectMapper) {
        this.vectorStore = vectorStore;
        this.objectMapper = objectMapper;
    }

    public String loadDocs() {
        try (InputStream inputStream = new ClassPathResource("docs/devcenter-content-snapshot.2024-05-20.json").getInputStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {

            List<Document> documents = new ArrayList<>();
            String line;

            while ((line = reader.readLine()) != null) {
                Map<String, Object> jsonDoc = objectMapper.readValue(line, Map.class);
                String content = (String) jsonDoc.get("body");

                // Split the content into smaller chunks if it exceeds the token limit
                List<String> chunks = splitIntoChunks(content, MAX_TOKENS_PER_CHUNK);

                // Create a Document for each chunk and add it to the list
                for (String chunk : chunks) {
                    Document document = createDocument(jsonDoc, chunk);
                    documents.add(document);
                }

                // Add documents in batches to avoid memory overload
                if (documents.size() >= 100) {
                    vectorStore.add(documents);
                    documents.clear();
                }
            }
            if (!documents.isEmpty()) {
                vectorStore.add(documents);
            }

            return "All documents added successfully!";
        } catch (Exception e) {
            return "An error occurred while adding documents: " + e.getMessage();
        }
    }

    private Document createDocument(Map<String, Object> jsonMap, String content) {
        Map<String, Object> metadata = (Map<String, Object>) jsonMap.get("metadata");

        metadata.putIfAbsent("sourceName", jsonMap.get("sourceName"));
        metadata.putIfAbsent("url", jsonMap.get("url"));
        metadata.putIfAbsent("action", jsonMap.get("action"));
        metadata.putIfAbsent("format", jsonMap.get("format"));
        metadata.putIfAbsent("updated", jsonMap.get("updated"));

        return new Document(content, metadata);
    }

    private List<String> splitIntoChunks(String content, int maxTokens) {
        List<String> chunks = new ArrayList<>();
        String[] words = content.split("\\s+");
        StringBuilder chunk = new StringBuilder();
        int tokenCount = 0;

        for (String word : words) {
            // Estimate token count for the word (roughly 1 token per ~4 characters)
            int wordTokens = word.length() / 4;
            if (tokenCount + wordTokens > maxTokens) {
                chunks.add(chunk.toString());
                chunk.setLength(0); // Clear the buffer
                tokenCount = 0;
            }
            chunk.append(word).append(" ");
            tokenCount += wordTokens;
        }
        if (chunk.length() > 0) {
            chunks.add(chunk.toString());
        }
        return chunks;
    }
}
This service reads the JSON file line by line, processes each document, and stores it in MongoDB along with an embedded vector of its content.
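For context, each line of the dataset file is a standalone JSON document. Based on the fields the loader reads, a line has roughly this shape (the values below are illustrative placeholders, not actual dataset contents):

{
  "sourceName": "devcenter",
  "url": "https://www.mongodb.com/developer/...",
  "action": "created",
  "format": "md",
  "updated": "2024-05-20",
  "metadata": {},
  "body": "The full article text that we chunk and embed..."
}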
Note that this is a very simplistic approach to chunking (splitting large documents into smaller pieces that stay within the token limit and processing them separately). We need it because OpenAI's embedding API has a token limit, and some of our documents are too large to embed in one go. This is fine for testing, but if you are moving to production, do your research and decide on the best way to handle large documents for your use case.
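As one alternative, Spring AI ships a token-aware text splitter that could replace our hand-rolled splitIntoChunks. A minimal sketch, assuming TokenTextSplitter is available under this package in your Spring AI version:

import org.springframework.ai.document.Document;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;

import java.util.List;

// Token-aware alternative to splitIntoChunks: split one large document
// into token-bounded chunks while carrying the metadata along
TokenTextSplitter splitter = new TokenTextSplitter();
List<Document> chunks = splitter.apply(List.of(new Document(content, metadata)));
vectorStore.add(chunks);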
Call this method however you wish, but I created a simple DocsLoaderController in my controller package for testing.
import com.mongodb.RagApp.service.DocsLoaderService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/docs")
public class DocsLoaderController {

    private final DocsLoaderService docsLoaderService;

    public DocsLoaderController(DocsLoaderService docsLoaderService) {
        this.docsLoaderService = docsLoaderService;
    }

    @GetMapping("/load")
    public String loadDocuments() {
        return docsLoaderService.loadDocs();
    }
}

Retrieving and augmenting said generation

Once the data is embedded and stored, we can retrieve it through an API that uses a vector search to return relevant results. The RagController class is responsible for this:
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder builder, VectorStore vectorStore) {
        // The QuestionAnswerAdvisor runs the vector search and injects the
        // retrieved context into the prompt before it reaches the LLM
        this.chatClient = builder
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
                .build();
    }

    @GetMapping("/question")
    public String question(@RequestParam(value = "message", defaultValue = "How to analyze time-series data with Python and MongoDB?") String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
There's a little bit going on here. Let's look at the ChatClient. It offers an API for communicating with our AI model.
The AI model processes two types of messages:
  1. User messages: direct inputs from the user.
  2. System messages: instructions generated by the system to guide the conversation.
For the system message, we are using the default from the QuestionAnswerAdvisor:
private static final String DEFAULT_USER_TEXT_ADVISE = """
        Context information is below.
        ---------------------
        {question_answer_context}
        ---------------------
        Given the context and provided history information and not prior knowledge,
        reply to the user comment. If the answer is not in the context, inform
        the user that you can't answer the question.
        """;
But we could edit this message and tailor it to our needs. There are also prompt options we can specify, such as the temperature setting, which controls the randomness or creativity of the generated output. You can find out more in the Spring documentation.
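As an illustration, here is a hedged sketch of what that customization could look like in the RagController constructor. It assumes the three-argument QuestionAnswerAdvisor constructor and the OpenAiChatOptions builder from our Spring AI version, so check the API of your release:

import org.springframework.ai.openai.OpenAiChatOptions;

// Hypothetical customization: our own advise text plus a low temperature
String customAdvise = """
        Context information is below.
        ---------------------
        {question_answer_context}
        ---------------------
        Answer only from the context, and include the source URL when you can.
        """;

this.chatClient = builder
        .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults(), customAdvise))
        .defaultOptions(OpenAiChatOptions.builder().withTemperature(0.2f).build())
        .build();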
The /question endpoint lets users ask questions. It semantically searches the embedded documents in the vector store for relevant context, then sends that context, along with the question, to the LLM.

Testing the implementation

To test our implementation:
  1. Start the Spring Boot application.
  2. Navigate to http://localhost:8080/api/docs/load to load documents into the vector store.
  3. Use http://localhost:8080/question?message=Your question here to test the question-answer functionality.
For example, try asking:
http://localhost:8080/question?message=How to analyze time-series data with Python and MongoDB? Explain the steps
We should receive a relevant answer from the RAG app, formed from the embedded document data and the LLM.

Conclusion

In this project, we integrated a retrieval-augmented generation (RAG) system using MongoDB, OpenAI embeddings, and Spring Boot. The system can embed large amounts of document data and answer questions by leveraging vector similarity searches from a MongoDB Atlas vector store.
Next, learn more about what you can do with Java and MongoDB. You might enjoy Seamless Media Storage: Integrating Azure Blob Storage and MongoDB With Spring Boot. Or head over to the community forums and see what other people are doing with MongoDB.