$vectorSearch filter & numCandidates

I’m currently running a $vectorSearch aggregation and using the filter property to pre-filter all of the documents. The filter I’m using works; everything is set up properly.

However, one thing that confuses me is that unless I set numCandidates to a value >= 150, no documents get returned by the aggregation. I have limit set to 1, but the same happens for any limit that is set. When numCandidates is >= 150, the returned documents match the filter I specified.
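For reference, a query of the shape described above might be built as follows. This is only a sketch: the index name `vector_index`, the vector path `embedding`, and the filter field `category` are hypothetical placeholders, not taken from the original post. The stage fields themselves (`index`, `path`, `queryVector`, `filter`, `numCandidates`, `limit`) are the documented `$vectorSearch` options.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    filter_expr: dict[str, Any],
    num_candidates: int,
    limit: int = 1,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline with a pre-filtered $vectorSearch stage."""
    # $vectorSearch does not accept numCandidates smaller than limit.
    if num_candidates < limit:
        raise ValueError("numCandidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": "vector_index",  # hypothetical index name
                "path": "embedding",      # hypothetical vector field
                "queryVector": query_vector,
                # Fields used here must be indexed as "filter" fields.
                "filter": filter_expr,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Expose the similarity score alongside each returned document.
        {"$project": {"_id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search_pipeline(
    query_vector=[0.1, 0.2, 0.3],
    filter_expr={"category": {"$eq": "news"}},  # hypothetical filter
    num_candidates=150,
    limit=1,
)
```

With PyMongo, the resulting list could be passed to `collection.aggregate(pipeline)`.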

I don’t understand why this happens. I can’t find anything in the documentation that mentions needing to set or tweak numCandidates based on the pre-filter being used. Is there any information regarding this?

I’m sure it’s not the case, but it seems like the “pre-filtering” is being applied after finding candidate documents.

Thanks!

Hi @Charlie_Mattox-

Thanks for the question, I can definitely see why this behavior can be confusing. numCandidates controls the size of the priority queue of vectors to be assessed when traversing the HNSW graph before returning limit documents. When you use a filter, it has a secondary purpose: it determines whether to traverse the graph at all, or instead to run an exhaustive ENN search over the few documents that meet the prefilter when there are fewer of them than numCandidates. What makes this even more complicated is that this is assessed on a per-segment basis, so even knowing from a $match query that 80 documents meet a prefilter doesn’t mean that numCandidates = 81 is the point at which ENN would occur. It is likely below that level, depending on how many segments are produced, which is not exposed to the user.
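To make the per-segment point concrete, here is a toy model of the decision described above. It is only an illustration of the explanation, not the actual implementation: the function names are made up, the segment counts are invented, and the real segment layout is not visible to users.

```python
def segment_strategies(matches_per_segment: list[int], num_candidates: int) -> list[str]:
    """Toy rule: a segment uses exhaustive search (ENN) when fewer documents
    than num_candidates pass the prefilter there, otherwise graph traversal (ANN)."""
    return [
        "ENN" if matches < num_candidates else "ANN"
        for matches in matches_per_segment
    ]


def min_candidates_for_enn_everywhere(matches_per_segment: list[int]) -> int:
    """Smallest num_candidates at which every segment falls back to ENN
    under the toy rule above."""
    return max(matches_per_segment) + 1


# Assumed example: 80 matching documents split across two segments.
segments = [30, 50]

print(segment_strategies(segments, 40))             # ['ENN', 'ANN']
print(segment_strategies(segments, 51))             # ['ENN', 'ENN']
print(min_candidates_for_enn_everywhere(segments))  # 51, well below 81
```

In this made-up split, every segment switches to ENN at numCandidates = 51 rather than 81, which is why the threshold cannot be derived from the total match count alone.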

This means there could be cases where a lower numCandidates doesn’t trigger ENN and the top resulting documents don’t meet the prefilter, but when you bump up numCandidates they do. In general, we recommend increasing numCandidates if you are not seeing the expected results, which it appears you have already done.
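If you want to automate that advice, one possible approach is to retry with progressively larger numCandidates values until something comes back. The sketch below is an assumption-laden helper, not an official pattern: `search_with_escalation` and the step values are hypothetical, and `aggregate` is any callable that runs a pipeline (for example, a PyMongo collection's `aggregate` method).

```python
from typing import Any, Callable, Iterable, Optional


def search_with_escalation(
    aggregate: Callable[[list[dict[str, Any]]], Iterable[dict[str, Any]]],
    stage: dict[str, Any],
    candidate_steps: Iterable[int] = (50, 150, 500),  # arbitrary example steps
) -> tuple[Optional[int], list[dict[str, Any]]]:
    """Re-run a $vectorSearch stage with increasing numCandidates.

    Returns the first numCandidates value that produced results, together
    with those results, or (None, []) if every attempt came back empty.
    """
    for num_candidates in candidate_steps:
        # numCandidates may not be smaller than limit, so skip such steps.
        if num_candidates < stage["limit"]:
            continue
        pipeline = [{"$vectorSearch": {**stage, "numCandidates": num_candidates}}]
        docs = list(aggregate(pipeline))
        if docs:
            return num_candidates, docs
    return None, []
```

With PyMongo this could be called as `search_with_escalation(collection.aggregate, stage)`, where `stage` holds `index`, `path`, `queryVector`, `filter`, and `limit`. Keep in mind that larger numCandidates values generally increase query latency, so the steps are worth tuning for your data.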

In the meantime, I will work to make sure this behavior is more clearly documented. Hopefully that resolves some of your confusion.

Hey, thanks so much for your response/explanation, makes sense!