Beyond Basics: Enhancing Kotlin Ktor API With Vector Search

Ricardo Mello8 min read • Published Sep 18, 2024 • Updated Sep 18, 2024
AI • Vector Search • Kotlin • Atlas
In this article, we will delve into advanced MongoDB techniques in conjunction with the Kotlin Ktor API, building upon the foundation established in our previous article, Mastering Kotlin: Creating an API With Ktor and MongoDB Atlas. Our focus will be on integrating robust features such as Hugging Face, Vector Search, and MongoDB Atlas triggers/functions to augment the functionality and performance of our API.
We will start with an overview of these advanced MongoDB techniques and their role in contemporary API development. Then, we will move into practical implementations, showing how you can integrate Hugging Face for natural language processing, leverage Vector Search for fast data retrieval, and automate database processes using triggers and functions.

Prerequisites

  • A MongoDB Atlas account with a cluster deployed
  • A Hugging Face account (to generate an access token)
  • The Ktor application from the previous article, Mastering Kotlin: Creating an API With Ktor and MongoDB Atlas
  • MongoDB Database Tools (for mongoimport)

Demonstration

We'll begin by importing a dataset of fitness exercises into MongoDB Atlas as documents. Then, we'll create a trigger that activates upon insertion. For each inserted document, a function will be invoked that calls Hugging Face's API. This function sends the exercise description to be converted into an embedding array, which is saved into the exercises collection as descEmbedding:
Atlas Application architecture
In the second part, we will modify the Kotlin Ktor application to incorporate HTTP client calls, enabling interaction with the Hugging Face API. Additionally, we will create an /exercises/processRequest endpoint. This endpoint will accept a text input, which will be processed by the Hugging Face API to generate an embedding array. We will then compare this array with the descEmbedding values generated in the first part. Using vector search, we will return the three closest results (in this case, the fitness exercises most relevant to the search):
Kotlin Application Architecture

MongoDB Setup and Hugging Face Integration

1. Creating exercises collection

The first step in achieving our goal is to create an empty collection called "exercises" that will later store our dataset. Begin by logging in to your MongoDB Atlas account. From the Atlas dashboard, navigate to your cluster and select the database where you want to create the collection. Click on the "Collections" tab to manage your collections within that database and create an empty exercises collection:
Creating exercises collection

2. Creating a trigger and function

Next, we need to create a trigger that will activate whenever a new document is inserted into the exercises collection. Navigate to the Triggers tab and create a trigger named "Trigger_Exercises" as shown in the images below:
Creating exercises Trigger
Remember to choose the "exercises" collection, select "Insert Document" for the operation type, and enable "Full Document".
Creating exercises Trigger
Finally, paste the following function code into the "Function" field and click "Save":
exports = async function(changeEvent) {
  const doc = changeEvent.fullDocument;

  const url = 'https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2';
  const hf_key = context.values.get("HF_value");
  try {
    console.log(`Processing document with id: ${doc._id}`);

    let response = await context.http.post({
      url: url,
      headers: {
        'Authorization': [`Bearer ${hf_key}`],
        'Content-Type': ['application/json']
      },
      body: JSON.stringify({
        inputs: [doc.description]
      })
    });

    let responseData = EJSON.parse(response.body.text());

    if(response.statusCode === 200) {
      console.log("Successfully received embedding.");

      const embedding = responseData[0];

      const collection = context.services.get("Cluster0").db("my_database").collection("exercises");

      const result = await collection.updateOne(
        { _id: doc._id },
        { $set: { descEmbedding: embedding }}
      );

      if(result.modifiedCount === 1) {
        console.log("Successfully updated the document.");
      } else {
        console.log("Failed to update the document.");
      }
    } else {
      console.log(`Failed to receive embedding. Status code: ${response.statusCode}`);
    }
  } catch(err) {
    console.error(err);
  }
};
Creating exercises Function
This function serves as a bridge between MongoDB and the Hugging Face API, enriching documents stored in a MongoDB collection with embeddings generated by the API. It is invoked by the trigger whenever a new document is inserted into the exercises collection.
Now, let's explore the functionality of this function:
  1. Event handling: The function extracts the full document from the MongoDB change event to be processed.
  2. Hugging Face API interaction: It interacts with the Hugging Face API to obtain an embedding for the document's description. This involves sending an HTTP POST request to the API's feature extraction endpoint, with the document's description as input.
  3. MongoDB update: Upon receiving a successful response from the Hugging Face API, the function updates the document in the MongoDB collection with the extracted embedding. This enriches the document with additional information useful for various natural language processing tasks.

3. Renaming the function

To align our environment with the demonstration image, let's change the name of our function to Function_Exercises. To do this, access the "Functions" menu and edit the function:
Selecting App Service Trigger
Creating new exercises Trigger
Then, enter the new name and click “Save”:
Renaming Function

4. Getting the Hugging Face access token

The function we previously created requires a token to access Hugging Face, which we need to obtain and configure in Atlas. Log in to your Hugging Face account and open the settings to create an access token:
Getting Hugging Face Token
After copying your key, let's return to MongoDB Atlas and configure our key for access. Click on the "Values" button in the side menu and select “Create New Value”:
Creating new Application Values
Now, we need to create a secret and a value that will be associated with this secret.
First, create the secret by entering the key from Hugging Face:
Creating Application Secret
Then, create a value named HF_value (which will be used in our function) and associate it with the secret, as shown in the image:
Creating Application Value
If everything has gone perfectly, our values will look like this:
Application Values List
We have finished configuring our environment. To recap:
Creating the empty collection:
  • We created an empty collection named "exercises" in MongoDB Atlas. This collection will receive input data, triggering a process to convert the exercise descriptions into embedding values.
Setting up triggers and functions:
  • A trigger named "Trigger_Exercises" was created to activate upon document insertion.
  • The trigger calls a function named "Function_Exercises" for each inserted document.
  • The function processes the description using the Hugging Face API to generate embedded values, which are then added to the "exercises" collection.
Final configuration:
  • To complete the setup, we associated a secret and a value with the Hugging Face key in MongoDB Atlas.

5. Importing a dataset

In this step, we will import a dataset of 50 documents containing information about exercises:
Exercises Document Sample
To achieve this, I will use MongoDB Database Tools to import the exercises.json file via the command line. After installing the tools, copy the exercises.json file into the "bin" folder and execute the command, as shown in the image below:
Mongo Tools Import
.\mongoimport mongodb+srv://<user>:<password>@cluster0.xpto.mongodb.net/my_database --collection exercises --jsonArray .\exercises.json
Note: Remember to replace the user, password, and cluster with your own.
If everything goes well, we will see that we have imported 50 exercises.
Dataset imported successfully
Now, let's check the logs of our function to ensure everything went smoothly. To do this, navigate to the "App Services" tab and click on "Logs":
Checking App Services Logs
And now, let's view our collection:
Exercises collection with embedded data
As we can see, we have transformed the descriptions of the 50 exercises into vector values and assigned them to the "descEmbedding" field.
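For reference, an enriched document now combines the original exercise fields with the generated embedding. A hypothetical document looks roughly like this (all values below are illustrative, and the real descEmbedding array contains 384 numbers):

```json
{
  "_id": { "$oid": "..." },
  "exerciseNumber": 1,
  "title": "Sample exercise",
  "description": "A short description of the exercise",
  "type": "Strength",
  "bodyPart": "Shoulders",
  "equipment": "Bands",
  "level": "Beginner",
  "rating": 4.5,
  "ratingDesc": "Average",
  "descEmbedding": [-0.0123, 0.0456, ...]
}
```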
Let's proceed with the changes in our Kotlin application. If you haven't already, you can download the application. Our objective is to create an /exercises/processRequest endpoint to send an input to Hugging Face, such as:
"I need an exercise for my shoulders and to lose my belly fat."
Postman final demonstration
We will convert this information into embedded data and utilize Vector Search to return the three exercises that most closely match this input. To begin, let's include two dependencies in the build.gradle.kts file that will allow us to make HTTP calls to Hugging Face:
build.gradle.kts
// Client
implementation("io.ktor:ktor-client-core:$ktor_version")
implementation("io.ktor:ktor-client-cio:$ktor_version")
In the ports package, we will create a repository that will retrieve exercises from the database:
domain/ports/ExercisesRepository
package com.mongodb.domain.ports

import com.mongodb.domain.entity.Exercises

interface ExercisesRepository {
    suspend fun findSimilarExercises(embedding: List<Double>): List<Exercises>
}
We will create a response to display some information to the user:
application/response/ExercisesResponse
package com.mongodb.application.response

data class ExercisesResponse(
    val exerciseNumber: Int,
    val bodyPart: String,
    val type: String,
    val description: String,
    val title: String
)
Now, create the Exercises class:
domain/entity/Exercises
package com.mongodb.domain.entity

import com.mongodb.application.response.ExercisesResponse
import org.bson.codecs.pojo.annotations.BsonId
import org.bson.types.ObjectId

data class Exercises(
    @BsonId
    val id: ObjectId,
    val exerciseNumber: Int,
    val title: String,
    val description: String,
    val type: String,
    val bodyPart: String,
    val equipment: String,
    val level: String,
    val rating: Double,
    val ratingDesc: String,
    val descEmbedding: List<Double>
) {
    fun toResponse() = ExercisesResponse(
        exerciseNumber = exerciseNumber,
        title = title,
        description = description,
        bodyPart = bodyPart,
        type = type
    )
}
Next, we will implement the interface. It communicates with the database by running an aggregation pipeline that uses the vector search index we will create later.
infrastructure/ExercisesRepositoryImpl
package com.mongodb.infrastructure

import com.mongodb.domain.entity.Exercises
import com.mongodb.domain.ports.ExercisesRepository
import com.mongodb.kotlin.client.coroutine.MongoDatabase
import kotlinx.coroutines.flow.toList
import org.bson.Document

class ExercisesRepositoryImpl(
    private val mongoDatabase: MongoDatabase
) : ExercisesRepository {

    companion object {
        const val EXERCISES_COLLECTION = "exercises"
    }

    override suspend fun findSimilarExercises(embedding: List<Double>): List<Exercises> {
        val result =
            mongoDatabase.getCollection<Exercises>(EXERCISES_COLLECTION).withDocumentClass<Exercises>().aggregate(
                listOf(
                    Document(
                        "\$vectorSearch",
                        Document("queryVector", embedding)
                            .append("path", "descEmbedding")
                            .append("numCandidates", 3L)
                            .append("index", "vector_index")
                            .append("limit", 3L)
                    )
                )
            )

        return result.toList()
    }
}
Now, let's create our endpoint to access Hugging Face and then call the method created earlier:
application/routes/ExercisesRoutes
package com.mongodb.application.routes

import com.mongodb.application.request.SentenceRequest
import com.mongodb.domain.ports.ExercisesRepository
import com.mongodb.huggingFaceApiUrl
import io.ktor.client.*
import io.ktor.client.call.*
import io.ktor.client.engine.cio.*
import io.ktor.client.request.*
import io.ktor.client.statement.*
import io.ktor.http.*
import io.ktor.http.content.*
import io.ktor.server.application.*
import io.ktor.server.request.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import org.koin.ktor.ext.inject

fun Route.exercisesRoutes() {
    val repository by inject<ExercisesRepository>()

    route("/exercises/processRequest") {
        post {
            val sentence = call.receive<SentenceRequest>()

            val response = requestSentenceTransform(sentence.input, call.huggingFaceApiUrl())

            if (response.status.isSuccess()) {
                val embedding = sentence.convertResponse(response.body())
                val similarDocuments = repository.findSimilarExercises(embedding)

                call.respond(HttpStatusCode.Accepted, similarDocuments.map { it.toResponse() })
            }
        }
    }
}

suspend fun requestSentenceTransform(input: String, huggingFaceURL: String): HttpResponse {
    return HttpClient(CIO).use { client ->
        val response = client.post(huggingFaceURL) {
            val content = TextContent(input, ContentType.Text.Plain)
            setBody(content)
        }

        response
    }
}
Next, let's create the request that we will send to Hugging Face. In addition to the input, this class includes a converter that parses the response string into a list of doubles:
application/request/SentenceRequest
package com.mongodb.application.request

data class SentenceRequest(
    val input: String
) {
    fun convertResponse(body: String) =
        body
            .replace("[", "")
            .replace("]", "")
            .split(",")
            .map { it.trim().toDouble() }
}
Let's include the route created earlier and a huggingFaceApiUrl method in our Application class. Here's the complete code:
Application.kt
package com.mongodb

import com.mongodb.application.routes.exercisesRoutes
import com.mongodb.application.routes.fitnessRoutes
import com.mongodb.domain.ports.ExercisesRepository
import com.mongodb.domain.ports.FitnessRepository
import com.mongodb.infrastructure.ExercisesRepositoryImpl
import com.mongodb.infrastructure.FitnessRepositoryImpl
import com.mongodb.kotlin.client.coroutine.MongoClient
import io.ktor.serialization.gson.*
import io.ktor.server.application.*
import io.ktor.server.plugins.contentnegotiation.*
import io.ktor.server.plugins.swagger.*
import io.ktor.server.routing.*
import io.ktor.server.tomcat.*
import org.koin.dsl.module
import org.koin.ktor.plugin.Koin
import org.koin.logger.slf4jLogger

fun main(args: Array<String>): Unit = EngineMain.main(args)

fun Application.module() {

    install(ContentNegotiation) {
        gson {
        }
    }

    // Other code..

    routing {
        // Other code..

        exercisesRoutes()
    }
}

fun ApplicationCall.huggingFaceApiUrl(): String {
    return application.environment.config.propertyOrNull("ktor.huggingface.api.url")?.getString()
        ?: throw RuntimeException("Failed to access Hugging Face API base URL.")
}
Finally, let's include the Hugging Face endpoint in the application.conf file.
application.conf
ktor {

    // Other code ..

    huggingface {
        api {
            url = "https://api-inference.huggingface.co/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2"
        }
    }
}
Now, we need to go back to Atlas and create our vector search index. Follow the images below:
Creating new Atlas Search Index
Select Atlas Vector Search:
Creating new Atlas Vector Search Index
Creating new Atlas Vector Search Index
If everything is okay, you will see a success message like the one below, indicating that the index was successfully created in MongoDB Atlas:
Creating new Atlas Vector Search Index
This code snippet defines a vector index on the descEmbedding field in our exercises collection. The type field specifies that this is a vector index. The path field indicates the field containing the vector data; in this case, descEmbedding. The numDimensions field specifies the number of dimensions of the vectors, which is 384 here. Lastly, the similarity field specifies the similarity metric used for comparing vectors, in this case the Euclidean distance.
{
  "fields": [
    {
      "type": "vector",
      "path": "descEmbedding",
      "numDimensions": 384,
      "similarity": "euclidean"
    }
  ]
}
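To build intuition for the similarity setting, here is a small, self-contained sketch of the Euclidean distance between two embeddings. The function name and sample vectors are illustrative, not part of the application; the smaller the distance, the closer the match:

```kotlin
import kotlin.math.sqrt

// Euclidean distance between two vectors of equal dimensionality:
// sqrt(sum((a_i - b_i)^2)). With "similarity": "euclidean", smaller
// distances mean more relevant candidates.
fun euclidean(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Vectors must have the same dimensionality" }
    return sqrt(a.zip(b).sumOf { (x, y) -> (x - y) * (x - y) })
}

fun main() {
    val query = listOf(1.0, 2.0, 2.0)
    val doc = listOf(1.0, 0.0, 2.0)
    println(euclidean(query, doc)) // 2.0
}
```

In the real index, the vectors compared are the 384-dimensional query embedding and each document's descEmbedding.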
After implementing the latest updates and configurations, it's time to test the application. Let's start by running the application. Open Application.kt and click on the run button:
Running the Application
Once the application is up and running, you can proceed with testing using the following curl command:
curl --location 'http://localhost:8081/exercises/processRequest' \
--header 'Content-Type: application/json' \
--data '{
    "input": "I need an exercise for my shoulders and to lose my belly fat"
}'
Requesting processRequest

Conclusion

This article showcased how to enrich MongoDB documents with embeddings from the Hugging Face API, leveraging its powerful natural language processing capabilities. The provided function demonstrates handling change events in a MongoDB collection and interacting with an external API. This integration offers developers opportunities to enhance their applications with NLP features, highlighting the potential of combining technologies for more intelligent applications.
The example source code is available on GitHub.
If you have any questions or want to discuss further implementations, feel free to reach out to the MongoDB Developer Community forum for support and guidance.