Hi all,
I’ve created a collection of embeddings for chunks of user documents so I can search for the relevant parts of a specific user’s documents.
The first stage of our current aggregation looks like this:
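```javascript
const { ObjectId } = require('mongodb');

const searchStage = {
  $search: {
    index: 'vector-search',
    knnBeta: {
      vector: queryEmbeddings,
      path: 'embedding',
      // Only consider chunks that belong to this user
      filter: {
        equals: {
          value: new ObjectId(userId),
          path: 'userId',
        },
      },
      k: 200, // number of nearest neighbours to return
    },
  },
};
```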
But now that $vectorSearch is GA, we want to switch to the new index, but I can’t figure out how to filter by ObjectId. Here’s what I have right now:
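```javascript
const vectorSearchStage = {
  $vectorSearch: {
    index: 'vector_index',
    path: 'embedding',
    // filter,  <- how do I filter by the ObjectId userId here?
    queryVector: queryEmbeddings,
    numCandidates: 200, // candidates considered by the approximate search
    limit: 10, // documents returned
  },
};
```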
The documentation seems to imply that you can only filter on boolean, number, or string fields in a Vector Search index.
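For context, a field can only be used in a $vectorSearch filter if it is declared as a `filter` field in the Vector Search index definition, next to the `vector` field. A minimal sketch of such a definition written as a JavaScript object, assuming 1536-dimensional embeddings compared with cosine similarity (both are placeholders; use whatever your embedding model produces). Per the documentation mentioned here, `userId` would need to hold string values at the time of writing for this filter to work.

```javascript
// Sketch of a definition for the 'vector_index' index.
// numDimensions and similarity are assumptions; match them to your embedding model.
const vectorIndexDefinition = {
  fields: [
    {
      type: 'vector',
      path: 'embedding',
      numDimensions: 1536,
      similarity: 'cosine',
    },
    {
      // Fields referenced in $vectorSearch.filter must be indexed as type 'filter'
      type: 'filter',
      path: 'userId',
    },
  ],
};
```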
What’s the workaround to make this work? Do I need to set numCandidates to 200 (or higher) and limit to 200, then add a $match stage after that to filter by userId? Or is the only option to convert every userId in that collection from ObjectId to string?
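The post-filtering approach described in the question would look roughly like this. It is a sketch: `buildPostFilterPipeline` is a hypothetical helper, and `userObjectId` is expected to be an `ObjectId` created by the caller (e.g. `new ObjectId(userId)` from the `mongodb` package).

```javascript
// Hypothetical helper: over-fetch with $vectorSearch, then keep this user's chunks with $match.
// Because $match runs after the vector search, this is post-filtering.
function buildPostFilterPipeline(queryEmbeddings, userObjectId, finalLimit = 10) {
  return [
    {
      $vectorSearch: {
        index: 'vector_index',
        path: 'embedding',
        queryVector: queryEmbeddings,
        numCandidates: 200,
        limit: 200, // over-fetch so enough of this user's chunks survive the $match
      },
    },
    // Keep only chunks that belong to this user (may leave fewer than finalLimit)
    { $match: { userId: userObjectId } },
    { $limit: finalLimit },
  ];
}
```

It would be run with something like `collection.aggregate(buildPostFilterPipeline(queryEmbeddings, new ObjectId(userId))).toArray()`.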
Thanks!
Using a $match stage after $vectorSearch results in post-filtering, i.e. you could end up with fewer results than the intended k (or limit). To pre-filter (recommended) with the new $vectorSearch, you would have to convert userId from ObjectId to string.
To keep using an ObjectId as a pre-filter, you would have to stick with $search.
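To illustrate why post-filtering can return fewer results than intended, here is a toy in-memory simulation in plain JavaScript (no MongoDB involved). It uses exact cosine similarity rather than the approximate search Atlas performs, and the documents are made-up data.

```javascript
// Cosine similarity between two equal-length vectors
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k documents most similar to the query, best first
function topK(docs, query, k) {
  return docs
    .map((d) => ({ ...d, score: cosine(d.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Post-filtering: search first, then drop other users' documents
function postFilter(docs, query, userId, k) {
  return topK(docs, query, k).filter((d) => d.userId === userId);
}

// Pre-filtering: restrict to the user's documents, then search
function preFilter(docs, query, userId, k) {
  return topK(docs.filter((d) => d.userId === userId), query, k);
}

// Made-up data: the query is closest to user 'b's chunks
const docs = [
  { id: 1, userId: 'b', embedding: [1, 0] },
  { id: 2, userId: 'b', embedding: [0.9, 0.1] },
  { id: 3, userId: 'a', embedding: [0.5, 0.5] },
  { id: 4, userId: 'a', embedding: [0, 1] },
];
const query = [1, 0];

console.log(postFilter(docs, query, 'a', 2).length); // 0: both top-2 slots went to user 'b'
console.log(preFilter(docs, query, 'a', 2).length); // 2
```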
OK, thanks a lot for the reply.
I have the same issue, but to clarify: are you saying I have to add a new field to the documents in my collection that holds the string form of the user’s ObjectId?
You wrote that it’s possible to use $search to get around this. Could you explain exactly how?
Or is there any other way to solve this? This must be something pretty much everyone runs into: having a user field with an ObjectId referencing a users collection.
It would be a bit frustrating to have to duplicate the user field as a separate userString field.
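If you do go the string route, existing documents can be backfilled in one pass with an update that uses an aggregation pipeline and `$toString` (supported since MongoDB 4.2). A sketch, where `userIdString` is a hypothetical field name and `collection` is a Node.js driver `Collection`:

```javascript
// Hypothetical one-off backfill: store a string copy of userId next to the ObjectId.
// A pipeline-style update lets $toString read each document's own userId.
async function backfillUserIdString(collection) {
  const result = await collection.updateMany(
    // Only touch documents that have a userId but no string copy yet
    { userId: { $exists: true }, userIdString: { $exists: false } },
    [{ $set: { userIdString: { $toString: '$userId' } } }]
  );
  return result.modifiedCount;
}
```

The Vector Search index would then declare `userIdString` as its `filter` field, and new documents would need both fields set at insert time.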
What I originally posted is what we use in production for vector search aggregation queries with the $search stage:
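```javascript
const { ObjectId } = require('mongodb');

const searchStage = {
  $search: {
    index: 'vector-search',
    knnBeta: {
      vector: queryEmbeddings,
      path: 'embedding',
      // Only consider chunks that belong to this user
      filter: {
        equals: {
          value: new ObjectId(userId),
          path: 'userId',
        },
      },
      k: 200, // number of nearest neighbours to return
    },
  },
};
```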
The main drawbacks I noticed with this approach compared to $vectorSearch are:
- Results are not ordered by score, so we had to add extra aggregation stages to get the score ({ $meta: 'searchScore' }) and then sort by it, highest first, before we could limit the output to the number of documents we needed to return
- Because they’re not ordered, in some cases the k value had to be larger than recommended to get all the relevant documents back
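The extra stages described in these points might look like the following sketch. `buildKnnBetaPipeline` is a hypothetical helper, `userObjectId` is an `ObjectId` supplied by the caller, and the score is sorted highest first, since a higher `searchScore` means a closer match.

```javascript
// Hypothetical helper: the $search/knnBeta stage plus the extra stages
// needed to expose, sort by, and limit on the relevance score.
function buildKnnBetaPipeline(queryEmbeddings, userObjectId, finalLimit = 10) {
  return [
    {
      $search: {
        index: 'vector-search',
        knnBeta: {
          vector: queryEmbeddings,
          path: 'embedding',
          filter: { equals: { value: userObjectId, path: 'userId' } },
          k: 200,
        },
      },
    },
    // Expose the relevance score computed by $search
    { $addFields: { score: { $meta: 'searchScore' } } },
    // Highest score (closest match) first
    { $sort: { score: -1 } },
    { $limit: finalLimit },
  ];
}
```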
From the tests we’ve done with the $vectorSearch stage, results seem to come back ordered by score ({ $meta: 'vectorSearchScore' }), which removes the need for some of those aggregation stages.
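For comparison, a sketch of the $vectorSearch version with a pre-filter, assuming `userId` is stored as a string and declared as a `filter` field in `vector_index`. Since results already arrive ordered by score, only one stage is added to expose it. `buildVectorSearchPipeline` is a hypothetical helper.

```javascript
// Hypothetical helper: pre-filtered $vectorSearch, assuming userId is a string
// and is declared as a 'filter' field in 'vector_index'.
function buildVectorSearchPipeline(queryEmbeddings, userIdString, limit = 10) {
  return [
    {
      $vectorSearch: {
        index: 'vector_index',
        path: 'embedding',
        queryVector: queryEmbeddings,
        numCandidates: 200,
        limit,
        // Pre-filter: candidates are restricted to this user's chunks
        filter: { userId: { $eq: userIdString } },
      },
    },
    // Results are already ordered by score; just expose it
    { $addFields: { score: { $meta: 'vectorSearchScore' } } },
  ];
}
```

Here `userIdString` would be the same hex string the earlier snippets pass to `new ObjectId(userId)`.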