How to filter vector searches by another attribute?

Steve_Howard · October 19, 2023, 1:51pm

I am struggling to come up with a way to filter a vector search to a given document (or set of documents) rather than searching all of them. I found an especially helpful example for $vectorSearch using a movies database on the MongoDB blog (Building Generative AI Applications Using MongoDB: Harnessing the Power of Atlas Vector Search and Open Source Models | MongoDB), but I am looking for an example that generates a list of product recommendations based on a customer's past purchases. My thought is the flow will be:

Take each purchase and create an embedding for it (name, category, description)
Store each embedding along with the purchase
Create a knnVector search index on the embedding field in the purchases collection, which also holds the customer data (id, name, etc.)
Take each product and create an embedding for it (name, category, description)
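A minimal sketch of the first three steps, assuming purchases are stored as plain dicts that pymongo would insert as documents. `embed_text` is a hypothetical stand-in for a real embedding model (the blog post uses an open-source model); here it is a toy word-hashing embedding so the sketch runs offline. The field names (`customer_email`, `embedding`) and the tiny dimension count are also hypothetical. The index definition uses the Atlas Search `knnVector` mapping; `dimensions` and `similarity` must match whatever model you actually use. Step 4 (products) would reuse the same `embed_text` so both collections share one vector space.

```python
import hashlib
import math

DIMENSIONS = 8  # toy size; real embedding models produce hundreds of dimensions


def embed_text(text: str, dims: int = DIMENSIONS) -> list[float]:
    """Hypothetical stand-in for a real embedding model.

    Hashes each word into a bucket and L2-normalises the counts, so the
    same text always yields the same unit-length vector.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def purchase_document(customer_email: str, name: str, category: str, description: str) -> dict:
    """Build one purchase document with its embedding stored alongside it."""
    text = f"{name} {category} {description}"  # the attributes that get embedded
    return {
        "customer_email": customer_email,  # hypothetical customer field
        "name": name,
        "category": category,
        "description": description,
        "embedding": embed_text(text),
    }


# Atlas Search index definition for the purchases collection (knnVector field).
PURCHASES_INDEX = {
    "mappings": {
        "dynamic": True,
        "fields": {
            "embedding": {
                "type": "knnVector",
                "dimensions": DIMENSIONS,
                "similarity": "cosine",
            }
        },
    }
}

doc = purchase_document("ann@example.com", "Trail Shoe", "footwear", "lightweight running shoe")
```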

Where I am stuck is generating a list of similar products based on past purchases. I am assuming I can use $eq on the customer email (or some other customer field). Let's say a customer has three purchases. What I can't wrap my brain around is whether I should be:

concatenating the three purchases (with an embedding for each), looping through them, and looking for similar matches among all the product embeddings (this seems inefficient)
storing each of the three purchases separately and filtering with $eq on the customer email.
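Another option, not one raised in this thread but a common pattern, is to collapse a customer's purchase embeddings into a single "profile" vector (here, their element-wise mean, re-normalised) and run one $vectorSearch against the products collection with it as `queryVector`. This is a sketch assuming all embeddings are equal-length lists of floats from the same model; the index name `products_index` and the `embedding` path are hypothetical. The trade-off is that a mean blurs distinct interests: someone who bought running shoes and a coffee grinder gets a vector somewhere in between.

```python
import math


def profile_vector(embeddings: list[list[float]]) -> list[float]:
    """Element-wise mean of a customer's purchase embeddings, L2-normalised."""
    if not embeddings:
        raise ValueError("customer has no purchase embeddings")
    dims = len(embeddings[0])
    if any(len(e) != dims for e in embeddings):
        raise ValueError("all embeddings must have the same length")
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dims)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def recommendation_pipeline(query_vector, index_name, limit=5, num_candidates=100):
    """A single $vectorSearch stage against the products collection."""
    return [
        {"$vectorSearch": {
            "index": index_name,          # hypothetical index name
            "path": "embedding",          # hypothetical embedding field on products
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
        }}
    ]


three_purchases = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query = profile_vector(three_purchases)
pipeline = recommendation_pipeline(query, "products_index")
```

With pymongo this pipeline would be passed to `aggregate` on the products collection, giving one query per customer instead of one per purchase.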

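The $vectorSearch example from the blog post is:

```python
# `client` is a connected pymongo MongoClient and `embed` is the query
# embedding (a list of floats); both are created earlier in the blog example.
result = client['sample_mflix']['movies'].aggregate([
  { '$vectorSearch': {
       'queryVector': embed,          # the query embedding
       'path': 'plot_embedding',      # field holding the stored embeddings
       'numCandidates': 100,          # nearest-neighbour candidates to consider
       'limit': 5,                    # number of documents returned
       'index': 'sampleindex'         # name of the vector search index
     }
  }
])
```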

If I plug in the $eq operator, will this do what I need? I am testing while writing this post, so I may find out on my own, but I wanted to float the question since I am most likely making it too hard.
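A sketch of where `$eq` would go. A separate `$match` stage after `$vectorSearch` only post-filters the `limit` results that come back, whereas the `filter` field inside `$vectorSearch` pre-filters the documents that get scored, using MQL operators such as `$eq` and `$in`. The filtered field must also be indexed for filtering in the same vector search index; the exact mapping depends on the index type, so check the Atlas docs for your version. The field `customer_email`, the index name `purchases_index`, and the collection names are hypothetical. Note this restricts the collection being searched, so it fits "search this customer's purchases" rather than "search all products".

```python
def filtered_search_pipeline(query_vector, customer_email, index_name,
                             limit=5, num_candidates=100):
    """$vectorSearch restricted to one customer's documents via a pre-filter."""
    return [
        {"$vectorSearch": {
            "index": index_name,
            "path": "embedding",  # hypothetical embedding field
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
            # Pre-filter: only documents whose customer_email matches are scored.
            "filter": {"customer_email": {"$eq": customer_email}},
        }},
        # Keep a few fields plus the similarity score for each hit.
        {"$project": {
            "_id": 0,
            "name": 1,
            "customer_email": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]


pipeline = filtered_search_pipeline([0.1, 0.2, 0.3], "ann@example.com", "purchases_index")
# results = client["shop"]["purchases"].aggregate(pipeline)  # hypothetical db/collection
```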

Steve_Howard · October 19, 2023, 3:52pm

I figured this one out using this:

This brings me back to the design question. Is there a way to pass an embedded document of purchases to vector search? Or, if a customer has multiple orders, is a filter on the customer ID, email, etc. the right approach? That should return a list of products (each with its own embedding) along with the customer's multiple orders, correct?

Is that efficient, or is there something I am not considering?

Steve_Howard · October 19, 2023, 9:00pm

I think I have a gap in my design. I have a skus collection for which I have created vector embeddings, and a customer orders collection with the same attributes and its own vector embeddings. I am picturing looping through the customer IDs and, for each of a customer's order embeddings, running a vector search against the skus collection's embeddings. That seems inefficient.

Is there a pattern for what I am looking to do?
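If you keep one embedding per order, the loop only needs to run for the customer you are recommending for, not for every customer: fetch that customer's order embeddings, run one search per order against skus, then merge the hits, keeping each SKU's best score and skipping SKUs already bought. This is a sketch of that merge step only. `search` is a hypothetical callable that would wrap a `$vectorSearch` aggregation on the skus collection and return dicts with hypothetical `sku` and `score` keys; it is injected so the logic can be exercised without Atlas.

```python
from typing import Callable, Iterable

Hit = dict  # e.g. {"sku": "A1", "score": 0.93}


def recommend_for_customer(
    order_embeddings: Iterable[list[float]],
    purchased_skus: set[str],
    search: Callable[[list[float]], list[Hit]],
    top_n: int = 5,
) -> list[Hit]:
    """Run one vector search per order embedding and merge the results.

    Each SKU keeps its best score across all searches; SKUs the customer
    already bought are skipped.
    """
    best: dict[str, float] = {}
    for vector in order_embeddings:
        for hit in search(vector):
            sku = hit["sku"]
            if sku in purchased_skus:
                continue
            if hit["score"] > best.get(sku, float("-inf")):
                best[sku] = hit["score"]
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [{"sku": sku, "score": score} for sku, score in ranked[:top_n]]


# Fake results standing in for $vectorSearch against the skus collection.
fake_results = {
    (1.0, 0.0): [{"sku": "A1", "score": 0.9}, {"sku": "B2", "score": 0.7}],
    (0.0, 1.0): [{"sku": "B2", "score": 0.95}, {"sku": "C3", "score": 0.6}],
}
recs = recommend_for_customer(
    [[1.0, 0.0], [0.0, 1.0]],
    purchased_skus={"A1"},
    search=lambda v: fake_results[tuple(v)],
)
```

The cost is one search per order for that customer, which is usually a handful of queries; max-score merging is one choice among several (summing or averaging scores would favour SKUs similar to many orders).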

Aasawari · November 2, 2023, 9:49am

Hi @Steve_Howard, and welcome to the MongoDB Community Forums!

As mentioned in the documentation for the embeddedDocuments type, you can't define a field inside an embeddedDocuments field as the knnVector type.

The recommendation would be to rethink the database design so that it works with the kNN search algorithms. It would help if you could share your current design model so that we can help you further.
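One way to follow this recommendation (a sketch, not the only possible design) is to store one document per purchase, copying the customer identifier to the top level. Each embedding then sits at a plain top-level path that a knnVector index can cover, and the customer field can be used as a search filter. The input shape (a customer document with an embedded `purchases` array) and all field names are hypothetical.

```python
def flatten_purchases(customer: dict) -> list[dict]:
    """Turn one customer document with an embedded `purchases` array into
    one top-level document per purchase, suitable for a knnVector index."""
    docs = []
    for purchase in customer.get("purchases", []):
        doc = {
            "customer_id": customer["_id"],          # hypothetical customer key
            "customer_email": customer.get("email"),
        }
        doc.update(purchase)  # sku, name, category, description, embedding, ...
        docs.append(doc)
    return docs


customer = {
    "_id": 42,
    "email": "ann@example.com",
    "purchases": [
        {"sku": "A1", "embedding": [0.1, 0.2]},
        {"sku": "B2", "embedding": [0.3, 0.4]},
    ],
}
flat = flatten_purchases(customer)
# db.purchases.insert_many(flat)  # hypothetical target collection
```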

Regards
Aasawari