Vector search is not really semantic

Prakul_Agarwal · January 31, 2024, 7:25am

Hello @Maor_Bolokan, vector search performs an approximate k-nearest-neighbors (ANN) search. When you search for “Rooms that have air-condition”, it returns the k ‘nearest’ documents to your query out of all the documents in your vector search index (k is the value you set as limit in your $vectorSearch stage).
In other words, the query always returns the k closest documents, whether or not they are a good match, and the results are ‘approximate’. If your index held only k documents, the query would return all k of them. Each returned result does, however, come with a ‘similarity score’.
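
To make the point above concrete, here is a toy sketch in plain JavaScript. It is a brute-force (exact) nearest-neighbor search over three hand-made 2-D vectors, not the approximate index that Atlas uses, and the document names, vectors, and the `cosineSimilarity` / `topK` helpers are all made up for illustration. It only shows that asking for k results returns k results, including one that is clearly unrelated, and that the score is what tells them apart.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) top-k: score every document, sort descending, keep k.
function topK(queryVector, docs, k) {
  return docs
    .map((doc) => ({
      name: doc.name,
      score: cosineSimilarity(queryVector, doc.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical "embeddings"; real ones have hundreds of dimensions.
const toyDocs = [
  { name: "room with air conditioning", vector: [1, 0.1] },
  { name: "room with a fan", vector: [0.8, 0.6] },
  { name: "parking garage", vector: [-0.2, 1] },
];

// k equals the number of documents, so every document comes back,
// including the unrelated one, which simply gets the lowest score.
const toyResults = topK([1, 0], toyDocs, 3);
console.log(toyResults);
```

With only three documents and k = 3, the unrelated "parking garage" entry is still returned; it just sits at the bottom with a low score.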

How many documents have you embedded, i.e. what is the size of your vector search index? The more documents you have in your index, the better.
You can check the semantic similarity score between your query and the returned results by adding a $project stage with “score”: { “$meta”: “vectorSearchScore” } to your aggregation pipeline.
The higher the score, the better; you will want to discard results whose score falls below a threshold that you choose.

 db.<collection>.aggregate([
   {
     "$vectorSearch": {
       <query-syntax>
     }
   },
   {
     "$project": {
       "<field-to-include>": 1,
       "<field-to-exclude>": 0,
       "score": { "$meta": "vectorSearchScore" }
     }
   }
 ])
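
One possible way to apply the score threshold is to add a $match stage after the $project stage. The sketch below is only an illustration: `buildVectorSearchPipeline` is a hypothetical helper, and the index name, the `embedding` and `description` field names, the placeholder query vector, and the 0.75 cutoff are all assumptions that you would replace with your own values. A sensible threshold depends on your embedding model and data, so treat the number as something to tune.

```javascript
// Hypothetical helper: builds an aggregation pipeline that runs
// $vectorSearch, exposes the similarity score, and then drops any
// result whose score is below minScore.
function buildVectorSearchPipeline({ index, path, queryVector, numCandidates, limit, minScore }) {
  return [
    {
      $vectorSearch: {
        index,          // name of the vector search index (assumed)
        path,           // field that stores the embedding (assumed)
        queryVector,    // embedding of the user's query text
        numCandidates,  // candidates considered during the approximate search
        limit,          // this is "k": the maximum number of results
      },
    },
    {
      $project: {
        _id: 0,
        description: 1, // hypothetical field to return
        score: { $meta: "vectorSearchScore" },
      },
    },
    // Keep only results at or above the chosen threshold.
    { $match: { score: { $gte: minScore } } },
  ];
}

const examplePipeline = buildVectorSearchPipeline({
  index: "vector_index",
  path: "embedding",
  queryVector: [0.1, 0.2, 0.3], // placeholder; use your real query embedding
  numCandidates: 100,
  limit: 10,
  minScore: 0.75,               // illustrative cutoff only
});

// In mongosh this could then be run as, for example:
//   db.rooms.aggregate(examplePipeline)   // "rooms" is a made-up collection name
console.log(JSON.stringify(examplePipeline, null, 2));
```

With this shape the query can return fewer than limit documents, or none at all, when nothing in the index is close enough to the query.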

Let me know if this helps clarify your issue.