I’m currently using MongoDB’s $search stage with the knnBeta operator to run a k-nearest neighbours search that retrieves the 10 most similar text documents based on their egVector field. Then I apply a $match stage to filter the texts by a specific file_name, “Test.txt”, and finally a $project stage to return only the information that I need. Here’s my current query:
let text = await Text.aggregate([
  {
    $search: {
      index: "default",
      knnBeta: {
        vector: resp.data.data[0].embedding,
        path: "egVector",
        k: 10,
      },
    },
  },
  {
    $match: {
      file_name: "Test.txt",
    },
  },
  {
    $project: {
      egVector: 0,
    },
  },
])
The issue I’m running into is that if a “Test.txt” document isn’t among the initial 10 documents retrieved by $search, it’s dropped from my results, even though it exists in my database. This happens when “Test.txt” documents would only make it into the top-k results with a larger k parameter (like k=20). However, I’m only interested in getting the top 10 results for this specific file name.

As such, I’m trying to figure out how to apply a $match filter on file_name before running $search, so that only the documents where file_name equals “Test.txt” are considered. However, I have found out that $search needs to be the first stage in a MongoDB aggregation pipeline when using the Full-Text Search feature.

Given this, how can I modify my query so that it returns the top 10 most similar documents (based on their egVector field) where file_name is equal to “Test.txt”? Is there an alternative approach to this problem? Any help would be much appreciated!
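
To make the problem concrete, here is a minimal sketch in plain JavaScript that simulates the behaviour without MongoDB. Everything in it is made up for illustration: the sample data, the "Other.txt" file name, and the cosine and topK helpers. It also assumes documents are ranked by cosine similarity, which may not match how my index is actually configured.

```javascript
// Plain JavaScript simulation (no MongoDB involved); all data below is made up.

// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k documents most similar to `query`, best first.
function topK(docs, query, k) {
  return docs
    .map((doc) => ({ ...doc, score: cosine(doc.egVector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical collection: 15 chunks from "Other.txt" close to the query,
// and 12 chunks from "Test.txt" that are further away.
const docs = [];
for (let i = 0; i < 15; i++) {
  docs.push({ file_name: "Other.txt", egVector: [1, 0.01 * i] });
}
for (let i = 0; i < 12; i++) {
  docs.push({ file_name: "Test.txt", egVector: [1, 0.5 + 0.01 * i] });
}
const query = [1, 0];
const isTest = (doc) => doc.file_name === "Test.txt";

// What my pipeline does now: top 10 overall, then filter by file name.
const postFiltered = topK(docs, query, 10).filter(isTest);

// What I want: filter by file name first, then take the top 10 of what is left.
const preFiltered = topK(docs.filter(isTest), query, 10);

console.log(postFiltered.length); // 0
console.log(preFiltered.length);  // 10
```

In this toy data the 10 nearest neighbours overall all come from "Other.txt", so filtering afterwards leaves nothing, while filtering first still yields 10 "Test.txt" results. That second behaviour is what I’d like to get out of the aggregation.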