Vector Search using C# Driver

Marmik_Shah · January 19, 2024, 3:21pm

We have been using the C#/.NET driver. Is there any plan to incorporate Vector Search into the C#/.NET driver?

Also, when querying a vectorized index, we need to pass a vector-based query. For that we have to get the embedding from a Hugging Face model. Is there support for that in the driver, or do we need to do it outside the driver?

Basically, the query itself has to be converted into embeddings. What is the recommended way to do that from C#/.NET?

Thanks

Benjamin_Flast · January 28, 2024, 7:25pm

Hello @Marmik_Shah, can you elaborate a bit on what you mean by incorporating Vector Search in the C# driver? Are you referring to the helper classes in the C# driver that make it easier to build aggregation pipelines?

We do not have specific support in the drivers for embedding your queries. Your application would need to send the content to the embedding endpoint first, then build the query with the returned vector and submit it with the driver.

Marmik_Shah · February 5, 2024, 2:38am

My understanding is that I need to use the same embeddings as the ones used by the search index. How would I know which algorithm is used by the vector search index?

Is there good documentation or a how-to on getting started with Vector Search through C#?

Thanks

Prakul_Agarwal · February 5, 2024, 4:51pm

The embeddings used for both the search index and the query are chosen by the user. You specify the embedding dimension of your chosen embedding model in the search index definition. More steps here: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview/
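As a rough illustration of where the dimension goes, here is a small Python sketch that assembles a vector search index definition as a plain dictionary. The helper name `vector_index_definition` is hypothetical. The `plot_embedding` and `year` fields are borrowed from the C# example later in this thread, and 1536 assumes an OpenAI model such as text-embedding-ada-002. Treat it as a sketch to adapt, not as the definitive index for your data.

```python
import json

# Similarity functions accepted for a "vector" field in an Atlas index definition.
ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}

def vector_index_definition(path, num_dimensions, similarity="cosine", filter_paths=()):
    """Build an Atlas Vector Search index definition (hypothetical helper)."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    fields = [{
        "type": "vector",
        "path": path,                      # document field holding the embedding
        "numDimensions": num_dimensions,   # must match the embedding model's output size
        "similarity": similarity,
    }]
    # Fields used in the query's pre-filter are indexed as "filter" fields.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}

# Assumption: 1536-dimensional embeddings stored in plot_embedding.
definition = vector_index_definition("plot_embedding", 1536, filter_paths=["year"])
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of document you would paste into the index editor in the Atlas UI. Nothing in it names a model, only the vector size and the similarity function.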

The Atlas Vector Search index uses an approximate nearest neighbor (ANN) algorithm, HNSW, to find the approximate k nearest neighbors of the user-supplied query vector among the indexed documents.
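For intuition about what the ANN search approximates, here is a minimal Python sketch of the exact, brute-force version: score every stored vector against the query and keep the top k. The function names and the toy two-dimensional vectors are made up for illustration, and cosine similarity is assumed as the metric. This is not how Atlas implements the search.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(query, docs, k):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy data: real embeddings have hundreds or thousands of dimensions.
docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(exact_knn([1.0, 0.0], docs, k=2))  # prints ['a', 'b']
```

The exact version has to compare the query with every document. HNSW avoids that full scan by walking a graph of neighboring vectors, trading a little recall for speed.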

Prakul_Agarwal · February 5, 2024, 5:52pm

You can find the C# code example at https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#std-label-vectorSearch-agg-pipeline-filter. Use the **Select your language** drop-down menu on that page to set the language of the examples to C#.

using System;
using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Bson.Serialization.Conventions;
using MongoDB.Driver;
using MongoDB.Driver.Search;
public class vectorSearchFilterQuery 
{
  // define connection to your Atlas cluster
  private const string MongoConnectionString = "<connection-string>";
  public static void Main(string[] args){
    var camelCaseConvention = new ConventionPack { new CamelCaseElementNameConvention() };
    ConventionRegistry.Register("CamelCase", camelCaseConvention, type => true);
    // connect to your Atlas cluster
    var mongoClient = new MongoClient(MongoConnectionString);
    // define namespace
    var moviesDatabase = mongoClient.GetDatabase("sample_mflix");
    var moviesCollection = moviesDatabase.GetCollection<EmbeddedMovie>("embedded_movies");
    // define vector embeddings to search
    var vector = new[] {0.02421053,...};
    // define filter
    var yearGtFilter = Builders<EmbeddedMovie>.Filter.Gt("year", 1955);
    var yearLtFilter = Builders<EmbeddedMovie>.Filter.Lt("year", 1975);
    // define options 
    var options = new VectorSearchOptions<EmbeddedMovie>() {
        Filter = Builders<EmbeddedMovie>.Filter.And(yearGtFilter, yearLtFilter),
        IndexName = "vector_index",
        NumberOfCandidates = 150
    };
    // run query
    var results = moviesCollection.Aggregate()
                .VectorSearch(m => m.Embedding, vector, 10, options)
                .Project(Builders<EmbeddedMovie>.Projection
                  .Include(m => m.Title)
                  .Include(movie => movie.Plot)
                  .Include(movie => movie.Year))
                .ToList();
    // print results
    foreach (var movie in results)
      {
        Console.WriteLine(movie.ToJson());
      }
  }
}
[BsonIgnoreExtraElements]
public class EmbeddedMovie
{
    [BsonIgnoreIfDefault]
    public string Title { get; set; }
    public string Plot { get; set; }
    public int Year { get; set; }
    [BsonElement("plot_embedding")]
    public double[] Embedding { get; set; }
}

Marmik_Shah · February 5, 2024, 6:24pm

   // define vector embeddings to search
    var vector = new[] {0.02421053,...};

How do you generate the embeddings?

Which endpoint do I have to use to generate them?

Prakul_Agarwal · February 5, 2024, 7:07pm

You generate embeddings with an API call to an embedding model provider such as OpenAI or Cohere, or with an open-source model from a hub such as Hugging Face.
OpenAI text embeddings are a good place to get started:

curl https://api.openai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-3-small"
  }'

Here’s a Python code sample of doing that (note that it uses the text-embedding-ada-002 model rather than the one in the curl call above):

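import os

import openai  # legacy interface: openai.Embedding works only with openai<1.0

# Read the API key from the environment instead of hard-coding it.
openai.api_key = os.getenv("OPENAI_API_KEY")

model = "text-embedding-ada-002"

def generate_embedding(text: str) -> list[float]:
    """Return the embedding vector for a single text string."""
    resp = openai.Embedding.create(
        input=[text],
        model=model)

    # One input string was sent, so the first (only) item holds its embedding.
    return resp["data"][0]["embedding"]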

(taken from Building Generative AI Applications Using MongoDB: Harnessing the Power of Atlas Vector Search and Open Source Models | MongoDB)
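To connect this back to the query side, here is a hedged Python sketch of the two-step flow discussed in this thread: embed the query text first, then put the resulting vector into a `$vectorSearch` stage. The helper `build_vector_search_pipeline` and the stand-in `fake_embed` are hypothetical names. The index name, path, limit, and candidate count are copied from the C# example above, and in real use you would pass a function like `generate_embedding` instead of the stand-in.

```python
from typing import Callable, Optional, Sequence

def build_vector_search_pipeline(
    query_text: str,
    embed: Callable[[str], Sequence[float]],
    index: str = "vector_index",
    path: str = "plot_embedding",
    limit: int = 10,
    num_candidates: int = 150,
    filter_: Optional[dict] = None,
) -> list:
    """Embed the query text, then wrap the vector in a $vectorSearch stage."""
    query_vector = list(embed(query_text))  # step 1: text -> embedding
    stage = {
        "index": index,
        "path": path,
        "queryVector": query_vector,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if filter_ is not None:
        stage["filter"] = filter_  # optional pre-filter on indexed filter fields
    return [{"$vectorSearch": stage}]  # step 2: the aggregation pipeline

def fake_embed(text: str) -> list:
    """Stand-in embedder so the sketch runs offline; not a real model."""
    return [0.1, 0.2, 0.3]

pipeline = build_vector_search_pipeline(
    "a movie about time travel",
    fake_embed,
    filter_={"year": {"$gt": 1955, "$lt": 1975}},
)
print(pipeline)
```

With PyMongo, the resulting list can be passed to `collection.aggregate(pipeline)`. The C# driver's `VectorSearch` builder shown earlier produces the equivalent stage for you.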

Marmik_Shah · February 5, 2024, 8:07pm

Which embedding model does MongoDB Atlas use behind the scenes when we create a vector-based index?

Don’t I have to use the same embedding model (model = “text-embedding-ada-002”) to align with the vector-based index?

Does the vector index let me configure which model to use when creating the vector search index?

Thanks

Prakul_Agarwal · February 6, 2024, 4:54pm

Which embedding model does MongoDB Atlas use behind the scenes when we create a vector-based index?

There is no default embedding model used by Atlas Vector Search. You create your own vectors and store them in MongoDB Atlas.

Don’t I have to use the same embedding model (model = “text-embedding-ada-002”) to align with the vector-based index?

Yes, the same embedding model should be used for both the initial data insertion and the querying.
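Since the index itself records only the vector size, one way to guard against mixing models is to keep the model name next to your data and check it at query time. The sketch below is an assumption about how an application might do that: `EmbeddingSpec` and `check_query_embedding` are hypothetical names, not part of any driver. A dimension check alone is not enough, because different models can share a size (to my knowledge text-embedding-ada-002 and text-embedding-3-small both return 1536 dimensions by default).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingSpec:
    """What the application recorded when it embedded the stored documents."""
    model: str
    dimensions: int

def check_query_embedding(query_vector, query_model, indexed):
    """Raise ValueError if the query embedding cannot match the stored ones."""
    if query_model != indexed.model:
        raise ValueError(
            f"query embedded with {query_model!r}, documents with {indexed.model!r}")
    if len(query_vector) != indexed.dimensions:
        raise ValueError(
            f"query has {len(query_vector)} dimensions, index expects {indexed.dimensions}")

# Assumption: documents were embedded with text-embedding-ada-002 (1536 dimensions).
indexed = EmbeddingSpec(model="text-embedding-ada-002", dimensions=1536)
check_query_embedding([0.0] * 1536, "text-embedding-ada-002", indexed)  # passes silently
```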

Does the vector index let me configure which model to use when creating the vector search index?

No. You bring your own vectors.