Getting Started with Microsoft's Semantic Kernel in C# and MongoDB Atlas
Semantic Kernel has become hugely popular within the Microsoft ecosystem. In fact, at Microsoft Build, Semantic Kernel and AI with MongoDB was the most discussed topic at our booth.
Semantic Kernel is Microsoft’s AI SDK, available in Java, Python, and C#. It allows you to build powerful AI applications by chaining together out-of-the-box, community-created, and custom plugins. These plugins work together to create plans that allow you to achieve complex tasks. This could be anything from tidying up Scott Hanselman’s desktop to summarizing a block of text and emailing you the summary. The possibilities are endless!
Semantic Kernel is a tool for building retrieval-augmented generation (RAG) apps. The R and A parts come from retrieving information to use as context in the input to the large language model (LLM). This is where MongoDB comes in. MongoDB is an option for storing data, including embeddings representing that data, and even gives you the ability to search the data using Atlas Vector Search.
Semantic Kernel has support for MongoDB Atlas thanks to a connector. So not only can you store your data in MongoDB, including the embeddings, but it also automatically uses Vector Search under the hood to retrieve the results. You get the best of Semantic Kernel and the best of MongoDB, the most popular document database for C# developers!
In this tutorial, you will learn how to get started with Semantic Kernel and MongoDB, taking advantage of the connector and the SemanticTextMemory plugin, to create a bot that will recommend a movie to watch, using OpenAI to create embeddings, and searching the sample movie data in our sample dataset.
To follow along with this tutorial, you will need a few things in place:
- A MongoDB M0 cluster
- The sample data loaded into that cluster
- A free OpenAI account and project API key
- .NET 8 or higher
- An IDE or text editor to follow along
If you would prefer to simply read the code, you can find it on GitHub. It has two branches, depending on whether you have access to Azure OpenAI or want to use OpenAI. We will be using OpenAI for this tutorial as it is free and open to all at time of writing.
Now you have the prerequisites in place, it is time to create the project and add the NuGet packages you will need to create the bot.
- Create a new console project, either using your IDE or via the .NET CLI.
- Add the following NuGet packages to your new project:
  - Microsoft.SemanticKernel
  - Microsoft.SemanticKernel.Connectors.MongoDB (N.B. This is in prerelease)
  - Microsoft.SemanticKernel.Connectors.OpenAI
There are a few variables we are going to need throughout this tutorial, so we will start by setting them up in Program.cs. Because we want to create at least one other method in this tutorial, we will also switch to the traditional structure of our program class. Replace the contents with the following:
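What follows is a minimal sketch of such a Program.cs and may differ from the listing in the repository. The namespace and the field names (TextEmbeddingModelName, OpenAIAPIKey, MongoDBAtlasConnectionString, SearchIndexName, DatabaseName, CollectionName, memoryBuilder) are illustrative assumptions. The model, index, database, and collection values are the ones this tutorial names later on. Main is declared async, also an assumption, so that later steps can use await.

```csharp
using System.Threading.Tasks;
using Microsoft.SemanticKernel.Connectors.MongoDB;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Memory;

// Experimental Semantic Kernel APIs raise compiler errors unless suppressed.
// These IDs are an assumption and can differ between package versions;
// the compiler error message names the exact ID to add.
#pragma warning disable SKEXP0001, SKEXP0010, SKEXP0020

namespace MongoDBSemanticKernel // hypothetical namespace
{
    public static class Program
    {
        // OpenAI settings (replace the placeholder with your own key).
        static string TextEmbeddingModelName = "text-embedding-ada-002";
        static string OpenAIAPIKey = "<YOUR OPENAI API KEY>";

        // MongoDB Atlas settings (replace the placeholder with your own connection string).
        static string MongoDBAtlasConnectionString = "<YOUR MONGODB ATLAS CONNECTION STRING>";
        static string SearchIndexName = "default";
        static string DatabaseName = "semantic-kernel";
        static string CollectionName = "embedded_movies";

        // Assigned and configured in the next section.
        static MemoryBuilder? memoryBuilder;

        static async Task Main(string[] args)
        {
            // The code from the following sections goes here.
        }
    }
}
```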
The #pragma warning disable directive is there because many of these features are experimental; without it, using them produces compiler errors.
Go ahead and replace the placeholders for OpenAI and Atlas with your own values.
You may have noticed in the last section that you added a MemoryBuilder variable. This builder is what gives you access to the memory plugin, an out-of-the-box plugin for working with stored data.
So now we are going to configure this plugin, use this builder, and also connect it to MongoDB Atlas as our memory store.
Paste the following code inside your Main method:
The Memory Builder comes with some helper methods. In this case, we are using WithOpenAITextEmbeddingGeneration, which helps you configure the memory plugin.
Because we are working with text in this project, we need to be able to generate text embeddings for our data to be used in the search. This is where OpenAI comes in. By passing this method the name of the model we want to use and the OpenAI API key, the plugin has all it needs to automatically take care of the rest for us under the hood. Excellent!
Ensure the following using statements are present in the file:
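The configuration described above can be sketched as follows, together with the using directives it relies on. The names memoryBuilder, TextEmbeddingModelName, and OpenAIAPIKey are assumed to be the variables set up in Program.cs, so adjust them to match your own.

```csharp
// At the top of Program.cs:
using Microsoft.SemanticKernel.Connectors.MongoDB; // MongoDBMemoryStore (used in the next section)
using Microsoft.SemanticKernel.Connectors.OpenAI;  // WithOpenAITextEmbeddingGeneration
using Microsoft.SemanticKernel.Memory;             // MemoryBuilder, ISemanticTextMemory

// Inside Main. Assumed fields (names are illustrative):
//   static MemoryBuilder? memoryBuilder;
//   static string TextEmbeddingModelName = "text-embedding-ada-002";
//   static string OpenAIAPIKey = "<YOUR OPENAI API KEY>";
memoryBuilder = new MemoryBuilder();

// Register OpenAI as the text embedding generator for the memory plugin.
memoryBuilder.WithOpenAITextEmbeddingGeneration(
    TextEmbeddingModelName,
    OpenAIAPIKey);
```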
Using a database that supports vectors and vector searches, such as MongoDB Atlas, is a key part of adding the retrieval and augmentation parts to your RAG applications.
Semantic Kernel’s MongoDB Connector adds support for not only using MongoDB as your data store for your embeddings, but it also uses MongoDB’s vector search capabilities to carry out the search.
Paste the following code after the previous code, inside your Main method:
Just like that, with a few lines of code, we have the memory plugin set up and it is configured to use MongoDB.
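A sketch of that wiring is shown here. It assumes the connection string, database name, and search index name live in fields with the illustrative names below, and that memoryBuilder is the MemoryBuilder configured in the previous step.

```csharp
// Inside Main, after the embedding configuration. Assumed fields (names are illustrative):
//   static string MongoDBAtlasConnectionString = "<YOUR MONGODB ATLAS CONNECTION STRING>";
//   static string DatabaseName = "semantic-kernel";
//   static string SearchIndexName = "default";

// Memory store backed by MongoDB Atlas; the index name is used when searching.
var mongoDBMemoryStore = new MongoDBMemoryStore(
    MongoDBAtlasConnectionString,
    DatabaseName,
    SearchIndexName);

memoryBuilder.WithMemoryStore(mongoDBMemoryStore);

// Build() returns an ISemanticTextMemory, used later for saving and searching.
var memory = memoryBuilder.Build();
```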
MongoDB’s sample data comes with different databases and collections for a variety of use cases. One of the recent changes was to the sample_mflix database. This database has been around in the sample data for a long time but we recently added a new collection inside the database called embedded_movies. You may have noticed that already if you have browsed your new cluster. This collection contains vector embeddings on the plot field from a large number of documents from the movies collection and makes it much easier for developers to experience MongoDB’s Atlas Vector Search in a variety of programming languages.
In an ideal world, we would use this collection with Semantic Kernel. Unfortunately, there is a limitation with Semantic Kernel on the name of the field containing the embeddings value as well as the shape of the documents it can use. So for this reason, for the sake of this tutorial, we are going to import some documents from our sample_mflix database and save them in a new collection, using Semantic Kernel. This will generate the embeddings automatically using OpenAI, and save them in the format that Semantic Kernel can use later.
First, we need to create a model that represents the movie document. So create a new Movie.cs class in your project and paste in the following:
If your IDE or text editor doesn’t auto add the required using statements, add the following at the top of the class:
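A minimal sketch of such a model, including the using statements it needs, could look like this. Mapping only the title and plot fields is an assumption made to keep the example small, and the namespace is hypothetical.

```csharp
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;

namespace MongoDBSemanticKernel // hypothetical namespace
{
    // Only the fields used in this sketch are mapped;
    // every other field in a movie document is ignored.
    [BsonIgnoreExtraElements]
    public class Movie
    {
        [BsonId]
        [BsonRepresentation(BsonType.ObjectId)]
        public string Id { get; set; } = string.Empty;

        [BsonElement("title")]
        public string Title { get; set; } = string.Empty;

        // Can still be null after deserialization if the stored value is null,
        // so it is cleaned up before saving.
        [BsonElement("plot")]
        public string Plot { get; set; } = string.Empty;
    }
}
```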
Now we have the model available that reflects our document, it is time to make use of it.
Paste the following code in your Program.cs class:
Let’s take a look at what is happening:
- We take advantage of the MongoDB C# driver, which is available to us from the connector, to create a new client and point it to our existing database and collection.
- Then, we create a new list of movies, fetching the requested number of documents and adding them to the list.
- For each movie, we do some data hygiene for any null plots as this can cause errors later, and simply marking it as nullable won’t work, sadly.
- After we have a clean list of movies, we iterate through each one and save it to our new collection via the memory store.
- The document that Semantic Kernel creates with the plugin has some fields that we want to populate so we assign those the most sensible values from the fields available in our movie document.
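The steps above can be sketched as the method below. It assumes a Movie model with Title and Plot properties and fields named MongoDBAtlasConnectionString and CollectionName. Which movie field goes into which Semantic Kernel field, and the "Sample_Mflix_Movies" source name, are illustrative choices rather than requirements.

```csharp
// Add to the Program class. Requires at the top of the file:
//   using Microsoft.SemanticKernel.Memory;
//   using MongoDB.Driver;
// Assumed fields (names are illustrative):
//   static string MongoDBAtlasConnectionString = "<YOUR MONGODB ATLAS CONNECTION STRING>";
//   static string CollectionName = "embedded_movies";
static async Task FetchAndSaveMovieDocuments(ISemanticTextMemory memory, int limitSize)
{
    // The MongoDB C# driver is available through the connector package.
    var mongoClient = new MongoClient(MongoDBAtlasConnectionString);
    var movieDB = mongoClient.GetDatabase("sample_mflix");
    var movieCollection = movieDB.GetCollection<Movie>("movies");

    Console.WriteLine("Fetching documents from MongoDB...");

    // Fetch the requested number of documents.
    List<Movie> movieDocuments = await movieCollection
        .Find(m => true)
        .Limit(limitSize)
        .ToListAsync();

    // Data hygiene: replace null plots with an empty string to avoid errors when saving.
    movieDocuments.ForEach(movie => movie.Plot ??= string.Empty);

    foreach (var movie in movieDocuments)
    {
        try
        {
            // Semantic Kernel generates the embedding and writes the record to the memory store.
            await memory.SaveReferenceAsync(
                collection: CollectionName,
                text: movie.Plot,
                externalId: movie.Title,
                externalSourceName: "Sample_Mflix_Movies",
                description: movie.Plot);
        }
        catch (Exception ex)
        {
            // Report the failure and carry on with the next document.
            Console.WriteLine(ex.Message);
        }
    }
}
```

With this mapping, the movie title ends up as the record's ID and the plot as its description, which is what a search result exposes later.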
Now, we need to actually call this method. We can do this by simply calling await FetchAndSaveMovieDocuments(memory, 1500); from our Main method, after the existing code. This will populate our collection linked to the memory store with 1500 documents. You can choose a different number, if you wish.
Run the application to populate our new database and collection with data using Semantic Kernel. Once it displays “Fetching documents from MongoDB…”, wait a few minutes for it to populate in the background and then close the application. Generating the text embeddings on such a large number of documents using Semantic Kernel can take a little while. The time goes into generating the embeddings; the MongoDB C# driver is not the bottleneck.
dotnet run
This only needs to run once so we have some data available to us. So if you want to run this app again in future, it is OK to comment out the call to the method FetchAndSaveMovieDocuments, or remove it completely.
This will create a new database in your cluster called semantic-kernel with a collection called embedded_movies, containing the data as populated using Semantic Kernel.
You may have noticed earlier that when we added our MongoDB memory store, we passed it the search index name. This search index is used to identify which field or fields we want to use in our search. But this doesn’t exist yet on our MongoDB database.
Now you have run the application once, the data will be available in the collection to use in the search index.
We already have some great documentation on how to create a vector search index so you can refer to that on how to access the wizard in the Atlas UI to create the new index.
The following JSON can be used to define the index:
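Here is a sketch of such a definition in the Atlas Vector Search index format. The embedding path and the 1536 dimensions come from this tutorial. The cosine similarity function is an assumption, chosen because it is a common option for OpenAI embeddings.

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```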
This uses the embedding field that was generated by Semantic Kernel. OpenAI’s “text-embedding-ada-002” model that we are using for the text embedding generates 1536 dimensions. You will see this in the documents generated as the embedding array contains 1536 elements.
You will need to use the index name “default” to match the hard coded variable in your code. If you name the search index something else, be sure to update the variable.
Now that we have the data available to us and the search index created, it is time to add the ability to actually ask questions of our data.
Paste the following code inside your Main method, after the existing code:
A lot of this code is about user input and formatting the output. But let’s look at the lines of code that matter:
- memory.SearchAsync is how we carry out the search. We pass it the name of where we want to search, a.k.a. the collection name, what we want to search, how many results to get back, and what score from 0 to 1 we consider a threshold for “relevant enough.”
- await foreach (var mem in memories) is slightly different to the foreach you might be used to. The memories variable that was assigned the result of the search is of type IAsyncEnumerable, so we have to perform an await foreach to iterate through it.
We have everything in place now to run the application and actually ask it a question. Why not try asking it for a movie about sharks or another topic you love?
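The question loop described above can be sketched as follows. It assumes memory is the ISemanticTextMemory built earlier, that CollectionName holds "embedded_movies", and that each record was saved with the movie title as its ID and the plot as its description. The limit of 3 results, the 0.6 threshold, and the column widths are illustrative values.

```csharp
// Inside Main, after the memory has been built.
// Main must be declared as "static async Task Main" for await to compile.
Console.WriteLine("Welcome to the Movie Recommendation System!");
Console.WriteLine("Type 'x' and press Enter to exit.");
Console.WriteLine();

while (true)
{
    Console.WriteLine("Tell me what sort of film you want to watch.");
    Console.Write("> ");

    var userInput = Console.ReadLine();

    // Stop on end of input or when the user types 'x'.
    if (userInput is null || userInput.Trim().ToLower() == "x")
    {
        Console.WriteLine("Exiting application.");
        break;
    }

    // Search the collection: up to 3 results with a relevance score of at least 0.6.
    var memories = memory.SearchAsync(CollectionName, userInput, limit: 3, minRelevanceScore: 0.6);

    Console.WriteLine(string.Format("{0,-30} {1,-55} {2,-17}", "Title", "Plot", "Relevance (0 - 1)"));
    Console.WriteLine(new string('-', 104));

    // SearchAsync returns an IAsyncEnumerable, so the results are consumed with await foreach.
    await foreach (var mem in memories)
    {
        // Shorten long plots so each result fits on one line.
        var plot = mem.Metadata.Description;
        var shortPlot = plot.Length > 50 ? plot.Substring(0, 50) + "..." : plot;

        Console.WriteLine(string.Format("{0,-30} {1,-55} {2,-17}",
            mem.Metadata.Id,
            shortPlot,
            mem.Relevance.ToString("0.00")));
    }

    Console.WriteLine();
}
```

Raising minRelevanceScore returns fewer but closer matches; lowering it is worth trying if a query comes back empty.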
Just like that, you have created a simple movie chat recommendation bot using Semantic Kernel from Microsoft, MongoDB Atlas, and the awesome connector for MongoDB in Semantic Kernel.
If you want to learn more, I wrote a tutorial on how to use Atlas Vector Search natively in a .NET application!
There is also a main branch of this repo which uses Azure OpenAI, for those of you who have access.
Why not try it out today and see what movie you might want to watch tonight?