$vectorSearch filtering by ObjectId

Hi all,
I’ve created a collection for embeddings related to chunks of user documents so I can search for user-specific relevant parts.
The first stage of our current aggregation looks like this:

  $search: {
    index: 'vector-search',
    knnBeta: {
      vector: queryEmbeddings,
      path: 'embedding',
      filter: {
        equals: {
          value: new ObjectId(userId),
          path: 'userId',
        },
      },
      k: 200,
    },
  },

Now that $vectorSearch is GA, we want to switch to the new index, but I can’t figure out how to filter by ObjectId. Here’s what I have right now:

  $vectorSearch: {
    index: 'vector_index',
    path: 'embedding',
    // filter,
    queryVector: queryEmbeddings,
    numCandidates: 200,
    limit: 10,
  },

The documentation (Vector Search Fields) seems to imply that you can only filter by boolean, number, or string fields.
What’s the workaround to make this work? Do I need to set numCandidates to 200 (or higher), set limit to 200, and add a $match stage after that to filter by userId? Or is the only way to convert every userId in that collection from ObjectId to string?

Thanks!

Using a $match stage after $vectorSearch results in post-filtering, i.e. you could end up with fewer results than the intended k (or limit). To pre-filter (recommended) with the new $vectorSearch, you would have to convert userId from ObjectId to string.
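
To illustrate the difference between post-filtering and pre-filtering, here is a toy model in plain JavaScript. It is only a sketch: the documents and scores are made up, and an exact top-k sort stands in for the approximate search that Atlas actually performs.

```javascript
// Toy data: user ids and similarity scores are invented for illustration.
const docs = [
  { userId: 'a', score: 0.99 },
  { userId: 'b', score: 0.95 },
  { userId: 'b', score: 0.90 },
  { userId: 'a', score: 0.80 },
  { userId: 'a', score: 0.70 },
];

// Exact top-k by score (a stand-in for the vector search stage).
const topK = (items, k) =>
  [...items].sort((x, y) => y.score - x.score).slice(0, k);

// Post-filter: take the top 3 overall, then drop other users' documents.
const postFiltered = topK(docs, 3).filter((d) => d.userId === 'a');

// Pre-filter: restrict to the user first, then take the top 3.
const preFiltered = topK(docs.filter((d) => d.userId === 'a'), 3);

console.log(postFiltered.length); // 1
console.log(preFiltered.length); // 3
```

With a limit of 3, the post-filtered query returns a single document for user 'a', while the pre-filtered one returns all three.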

To continue using ObjectId as a pre-filter, you would have to stick with $search.
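
A rough sketch of the string-field pre-filter with $vectorSearch might look like the following. Several things here are assumptions rather than anything from this thread: the field name `userIdStr`, the helper name `buildVectorSearchStage`, and the `numDimensions` and `similarity` values in the index definition. The field used in the filter has to be declared in the vector index with type 'filter'.

```javascript
// Assumed index definition: `userIdStr`, numDimensions and similarity are
// placeholders, adjust them to your own collection and embedding model.
const vectorIndexDefinition = {
  fields: [
    { type: 'vector', path: 'embedding', numDimensions: 1536, similarity: 'cosine' },
    { type: 'filter', path: 'userIdStr' },
  ],
};

// Hypothetical helper: builds a $vectorSearch stage that pre-filters on a
// string copy of the user id (String() of an ObjectId yields its hex string).
function buildVectorSearchStage(queryEmbeddings, userId, limit = 10) {
  return {
    $vectorSearch: {
      index: 'vector_index',
      path: 'embedding',
      filter: { userIdStr: { $eq: String(userId) } },
      queryVector: queryEmbeddings,
      numCandidates: 200,
      limit,
    },
  };
}

// Possible one-off backfill of the string field (run in mongosh, untested here):
// db.chunks.updateMany({}, [{ $set: { userIdStr: { $toString: '$userId' } } }])
```

The stage would then be used as the first stage of the aggregation, e.g. `collection.aggregate([buildVectorSearchStage(queryEmbeddings, userId)])`. The collection name `chunks` in the backfill comment is also a placeholder.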

Ok. Thanks a lot for the reply

I have the same issue, but to clarify: are you saying I have to add a new field to the docs in my collection that holds the string form of my user’s ObjectId?

You wrote that it’s possible to use $search to get around this. Please explain exactly how to do that.

Or is there any other way to solve this? This has to be something that pretty much everyone will bump into: having something like a user field with an ObjectId referring to a users collection.

It would be a bit frustrating if I had to duplicate the user field and have a userString field.

What I had originally posted is what we use in production with the $search keyword for vector search aggregation queries:

  $search: {
    index: 'vector-search',
    knnBeta: {
      vector: queryEmbeddings,
      path: 'embedding',
      filter: {
        equals: {
          value: new ObjectId(userId),
          path: 'userId',
        },
      },
      k: 200,
    },
  },

The main drawbacks I noticed while using this instead of the $vectorSearch keyword are:

  1. Results are not ordered by score, so we had to add extra aggregation stages to get the score { $meta: 'searchScore' } and then sort ascending by the score before we could limit to whatever number of documents we needed to return.
  2. Because they’re not ordered, in some cases the k field value had to be bigger than recommended so that we would get all the relevant documents back.

From the tests we’ve done using the $vectorSearch keyword, results seemed to be ordered by score { $meta: 'vectorSearchScore' }, so that removes the need for some aggregation stages.
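
For reference, the extra stages described in point 1 could be sketched roughly like this. The helper name `buildKnnBetaPipeline`, the `score` field name and the use of $addFields are assumptions, not the exact production pipeline. The sort direction is a parameter: it defaults to descending (highest score first), which is an assumption about the intent, so pass 1 for the ascending order mentioned above.

```javascript
// Hypothetical helper: $search/knnBeta with an ObjectId filter, followed by
// the stages needed to expose the score, order by it, and cap the results.
// `userObjectId` is expected to be an already constructed ObjectId.
function buildKnnBetaPipeline(queryEmbeddings, userObjectId, limit = 10, sortDirection = -1) {
  return [
    {
      $search: {
        index: 'vector-search',
        knnBeta: {
          vector: queryEmbeddings,
          path: 'embedding',
          filter: {
            equals: { value: userObjectId, path: 'userId' },
          },
          k: 200,
        },
      },
    },
    // Copy the search score into a regular field so it can be sorted on.
    { $addFields: { score: { $meta: 'searchScore' } } },
    // -1 = highest score first, 1 = lowest score first.
    { $sort: { score: sortDirection } },
    { $limit: limit },
  ];
}
```

Usage would be along the lines of `collection.aggregate(buildKnnBetaPipeline(queryEmbeddings, new ObjectId(userId)))`.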