Building a Semantic Search Service With Spring AI and MongoDB Atlas
What is the song that goes, "Duh da, duh da, DUH da duh"? We've all been plagued by this before. We remember a snippet of the chorus, we know it has something to do with a hotel in Chelsea, but what is that song? I can't remember the title — how do you search by vibe?! Well, with the power of AI, we are able to search our databases, not just by matching words, but searching the semantic meaning of the text. And with Spring AI, you can incorporate the AI-powered search into your Spring application. With just the vague memory of a famous woman who prefers handsome men, we can locate our Leonard Cohen classic.
Spring AI is an application framework from Spring that allows you to combine various AI services and plugins with your applications. With support for many chat, text-to-image, and embedding models, you can get your AI-powered Java application set up for a variety of AI use cases.
Spring AI supports MongoDB Atlas as a vector database, with Atlas Vector Search powering your semantic search and your retrieval-augmented generation (RAG) applications. To learn more about RAG and other key concepts in AI, check out the MongoDB AI integration docs.
In this tutorial, we'll go through what you need to get started with Spring AI and MongoDB, add documents to your database along with their vectorized content (embeddings), and search that content with semantic search. The full code for this tutorial is available in the GitHub repository.
Before starting this tutorial, you'll need to have the following:
- A MongoDB Atlas account and an M10+ cluster running MongoDB version 6.0.11, 7.0.2, or later
  - An M10+ cluster is necessary to create the index programmatically (by Spring AI).
- An OpenAI API key with a paid OpenAI account and available credits
- Java 21 and an IDE such as IntelliJ IDEA or Eclipse
- Maven 3.9.6+ configured for your project
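Create a new Spring Boot project (for example, with Spring Initializr) using the following settings:
- Project: Maven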
- Language: Java
- Spring Boot: Default version
- Java: 21
Add the following dependencies:
- MongoDB Atlas Vector Database
- Spring Web
- OpenAI (other embedding models are available, but we use this for the tutorial)
Generate and download your project, then open it in your IDE.
Open the application in the IDE of your choosing. The first thing we will do is inspect our pom.xml. In order to use the latest version of Spring AI, change the spring-ai.version property for the Spring AI BOM to 1.0.0-SNAPSHOT. As of writing this article, it will be 1.0.0-M1 by default.
Configure your Spring application to set up the vector store and other necessary beans.
In our application properties, we are going to configure our MongoDB database, as well as everything we need for semantically searching our data. We'll also add in information such as our OpenAI embedding model and API key.
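As a rough illustration, an application.properties file for this setup might look like the fragment below. The angle-bracket values are placeholders, and the database, collection, index, and model names are assumptions chosen for this sketch rather than values confirmed by the tutorial; check the property keys against the Spring AI version you are using.

```properties
# OpenAI credentials and embedding model (placeholder key, assumed model name)
spring.ai.openai.api-key=<your-openai-api-key>
spring.ai.openai.embedding.options.model=text-embedding-ada-002

# MongoDB connection (placeholder URI, assumed database name)
spring.data.mongodb.uri=<your-atlas-connection-string>
spring.data.mongodb.database=lyrics

# Vector store settings (assumed collection and index names)
spring.ai.vectorstore.mongodb.collection-name=vector_store
spring.ai.vectorstore.mongodb.index-name=vector_index
spring.ai.vectorstore.mongodb.initialize-schema=true
```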
You'll see at the end that we set the initialize-schema property to true. This means our application will set up our search index (if it doesn't exist) so we can semantically search our data. If you already have a search index set up with this name and configuration, you can set this to false.
In your IDE, open up your project. Create a Config.java file in a config package. Here, we are going to set up our OpenAI embedding model. Spring AI makes this a very simple process.
Now, we are able to send our data away to be vectorized and receive the vectorized results.
Create a package called model for our DocumentRequest class to go in. This is what we are going to be storing in our MongoDB database. The content will be what we are embedding, so lyrics in our case. The metadata will be anything we want to store alongside it, such as artists, albums, or genres. This metadata will be returned alongside our content and can also be used to filter our results.
Create a repository package and add a LyricSearchRepository interface. Here, we'll define some of the methods we'll implement later.
Create a LyricSearchRepositoryImpl class to implement the repository interface.
We are using the methods add, delete, and similaritySearch, all already defined and implemented in Spring AI. These will allow us to embed our data when adding it to our MongoDB database, and we can search these embeddings with vector search.
Create a service package and, inside it, a LyricSearchService class to handle business logic for our lyrical search application. We will implement its methods later in the tutorial.
Create a controller package and a LyricSearchController class to handle HTTP requests. We are going to add a call to add our data, a call to delete any documents we no longer need, and a search call to semantically search our data.
These will call back to the methods we defined earlier. We'll implement them in the next steps.
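Before moving on, here is a minimal plain-Java sketch of the data model and the repository contract described in this section. It is not the tutorial's actual code: StoredDocument is a hypothetical stand-in for Spring AI's Document type, and the repository method names and signatures are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

// Request body: the text to embed plus free-form metadata (artist, album, genre...).
class DocumentRequest {
    private String content;
    private Map<String, Object> metadata;

    // No-argument constructor so a JSON mapper can populate the fields via setters.
    public DocumentRequest() { }

    public DocumentRequest(String content, Map<String, Object> metadata) {
        this.content = content;
        this.metadata = metadata;
    }

    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
    public Map<String, Object> getMetadata() { return metadata; }
    public void setMetadata(Map<String, Object> metadata) { this.metadata = metadata; }
}

// Hypothetical stand-in for the document type kept in the vector store.
record StoredDocument(String id, String content, Map<String, Object> metadata) { }

// Assumed shape of the repository contract: add, delete, and similarity search.
interface LyricSearchRepository {
    void addDocuments(List<StoredDocument> docs);
    boolean deleteDocuments(List<String> ids);
    List<StoredDocument> semanticSearchByLyrics(String query, int topK, double similarityThreshold);
}
```

In the real application, the implementation class would delegate these three operations to the Spring AI vector store rather than define its own storage.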
In our LyricSearchService class, let's add some logic to take in our documents and add them to our MongoDB database.
This function takes a single parameter, documents, which is a list of DocumentRequest objects. These represent the documents that need to be processed and added to the repository.
The function first checks if the documents list is null or empty.
The documents list is converted into a stream to facilitate functional-style operations.
The first filter is a bit of pre-processing to help clean up our data. It removes any DocumentRequest objects that are null, have null content, or have empty (or whitespace-only) content. This ensures that only valid documents are processed further.
Know your limits! The next filter removes any DocumentRequest objects whose content exceeds the maximum token limit (MAX_TOKENS) for the OpenAI API. The token count is estimated from the word count, assuming one word is slightly more than one token (not far off the truth). This estimation works for the demo, but in production, we would likely want to implement a form of chunking, where large bodies of text are separated into smaller, more digestible pieces.
Each DocumentRequest object is transformed into a Document object. The Document constructor is called with the content and metadata from the DocumentRequest.
The filtered and transformed Document objects are collected into a list, and these documents are added to our MongoDB vector store, along with an embedding of the lyrics.
We'll also add our function to delete documents while we're here:
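The steps above, plus a delete by ID, can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the tutorial's service: an in-memory map stands in for the MongoDB vector store, no embedding is computed, the nested DocumentRequest and Document records are simplified stand-ins, and the MAX_TOKENS value is an arbitrary word budget rather than an official OpenAI figure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.stream.Collectors;

class LyricIngestSketch {
    // Illustrative word budget only; pick a value below your model's real token limit.
    static final int MAX_TOKENS = 6_000;

    // Simplified stand-ins for the request model and the stored document type.
    record DocumentRequest(String content, Map<String, Object> metadata) { }
    record Document(String id, String content, Map<String, Object> metadata) { }

    // In-memory stand-in for the vector store (no embeddings are created here).
    private final Map<String, Document> store = new LinkedHashMap<>();

    List<Document> addDocuments(List<DocumentRequest> documents) {
        // Step 1: nothing to do for a null or empty list.
        if (documents == null || documents.isEmpty()) {
            return List.of();
        }
        List<Document> docs = documents.stream()
                // Step 2: drop null requests and null, empty, or whitespace-only content.
                .filter(Objects::nonNull)
                .filter(d -> d.content() != null && !d.content().trim().isEmpty())
                // Step 3: drop content whose word count exceeds the budget.
                .filter(d -> d.content().trim().split("\\s+").length <= MAX_TOKENS)
                // Step 4: convert each request into a storable document.
                .map(d -> new Document(UUID.randomUUID().toString(), d.content(), d.metadata()))
                .collect(Collectors.toList());
        // Step 5: hand the surviving documents to the store.
        docs.forEach(d -> store.put(d.id(), d));
        return docs;
    }

    // Returns true only if at least one stored document was removed.
    boolean deleteDocuments(List<String> ids) {
        if (ids == null || ids.isEmpty()) {
            return false;
        }
        return store.keySet().removeAll(ids);
    }

    int size() {
        return store.size();
    }
}
```

Rejecting oversized lyrics keeps the demo simple; splitting them into chunks, as mentioned above, would replace the third filter with a step that emits several documents per request.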
And the appropriate imports:
Now that we have the logic, let's add the endpoints to our LyricSearchController.
And our imports:
To test our embedding, let's keep it simple with a few nursery rhymes for now.
Build and run your application. Use the following curl command to add sample documents:
Let's define our searching method in our LyricSearchService. This is how we will semantically search the documents in our database.
This method takes in:
- query: A String representing the search query or the text for which you want to find semantically similar lyrics
- topK: An int specifying the number of top results to retrieve (e.g., the top 10)
- similarityThreshold: A double indicating the minimum similarity score a result must have to be included in the results
This returns a list of Map<String, Object> objects. Each map contains the content and metadata of a document that matches the search criteria.
And the imports to our service:
Let's add an endpoint to our controller, and build and run our application.
And the imports:
Use the following curl command to search your database for lyrics about small celestial bodies:
And voila! We have our twinkly little star at the top of our list.
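To make topK and similarityThreshold more concrete, here is a small in-memory sketch of what a similarity search does conceptually. It is not how Atlas Vector Search is implemented or called: the two-dimensional vectors are made-up toy embeddings, the Embedded record is a hypothetical stand-in, and raw cosine similarity is used as the score, which may differ from the normalized score Atlas reports.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SimilaritySearchSketch {
    // Hypothetical stored item: text, metadata, and its embedding vector.
    record Embedded(String content, Map<String, Object> metadata, double[] vector) { }

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Map<String, Object>> search(double[] queryVector, List<Embedded> docs,
                                            int topK, double similarityThreshold) {
        return docs.stream()
                // Score every document against the query vector.
                .map(d -> Map.entry(d, cosine(queryVector, d.vector())))
                // Keep only results at or above the minimum score.
                .filter(e -> e.getValue() >= similarityThreshold)
                // Best matches first, then cut the list to topK entries.
                .sorted(Map.Entry.<Embedded, Double>comparingByValue().reversed())
                .limit(topK)
                // Return content and metadata, mirroring the service's result shape.
                .map(e -> Map.<String, Object>of(
                        "content", e.getKey().content(),
                        "metadata", e.getKey().metadata()))
                .collect(Collectors.toList());
    }
}
```

The threshold removes weak matches before the cut to topK, so a request for ten results can return fewer when little in the collection resembles the query.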
In order to filter our data, we need to head over to our index in MongoDB. You can do this through the Atlas UI by selecting the collection where your data is stored and going to the search indexes. You can edit this index by selecting the three dots on the right of the index name and we will add our filter for the artist.
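For reference, an Atlas Vector Search index definition with an added filter field generally takes the shape below. The field paths and the dimension count are assumptions for this sketch: it assumes the embedding is stored under embedding, the metadata under metadata, and a 1536-dimension embedding model. Keep whatever values your existing index already shows and add only the filter entry.

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "metadata.artist"
    }
  ]
}
```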
Let's head back to our LyricSearchService and add a method with an artist parameter so we can filter our results.
And the imports we'll need:
And lastly, an endpoint in our controller:
Now, we are able to not only search as before, but we can say we want to restrict it to only specific artists.
Use the following curl command to try a semantic search with metadata filtering:
Unlike before, even though we ask for the top five results, only one document is returned, because we only have one document by the artist Jane Taylor. Hooray!
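The effect of the artist filter can be illustrated with a short plain-Java sketch. It assumes similarity scores have already been computed (the score values are invented), and the Lyric record and searchWithArtist method are hypothetical names; in the real application, the filter is passed to the vector store along with the query rather than applied in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ArtistFilterSketch {
    // Hypothetical search candidate with a precomputed similarity score.
    record Lyric(String content, Map<String, Object> metadata, double score) { }

    static List<Lyric> searchWithArtist(List<Lyric> candidates, String artist,
                                        int topK, double similarityThreshold) {
        return candidates.stream()
                // Metadata filter: only documents by the requested artist are considered.
                .filter(l -> artist.equals(l.metadata().get("artist")))
                // Then the usual threshold, ordering, and topK cut.
                .filter(l -> l.score() >= similarityThreshold)
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .limit(topK)
                .collect(Collectors.toList());
    }
}
```

Because the filter narrows the candidate set before the topK cut, asking for five results still yields one when only one document carries the matching artist.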
You now have a Spring application that allows you to search through your data by performing semantic searches. This is an important step when you are looking to implement your RAG applications, or just an AI-enhanced search feature in your applications.
If you want to learn more about the MongoDB Spring AI integration, follow along with the quick-start Get Started With the Spring AI Integration, and if you have any questions or want to show us what you are building, join us in the MongoDB Community Forums.