Thanks for providing those details @Guy_Machat,
As a note for future posts, it would help users (including myself) troubleshoot if the documents and any code snippets were posted in a valid, copy-and-paste-able format, with the actual values that produce the behaviour you describe. I tested with those documents but had to guess the array values. That said, I believe you may be getting nothing in return because of the index definition.
Can you try the following index definition? You may need to wait a few minutes after saving the changes before running the $search query. You might also need to alter the dimensions value, as I’ve changed it to match the documents in my test environment:
{
  "mappings": {
    "fields": {
      "embeddings": [
        {
          "dimensions": 4,
          "similarity": "cosine",
          "type": "knnVector"
        }
      ],
      "item_id": {
        "type": "string"
      }
    }
  }
}
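Since the dimensions value needs to match the length of your vectors, it may be worth checking that every document agrees with it before re-running the query. Here is a minimal sketch in plain JavaScript. Note that findDimensionMismatches is a hypothetical helper of my own (not part of any MongoDB API), and the sample array is stand-in data rather than your documents:

```javascript
// Hypothetical helper: returns the documents whose vector at `path`
// is missing, is not an array, or is not exactly `dimensions` long.
function findDimensionMismatches(docs, path, dimensions) {
  return docs.filter((doc) => {
    const vector = doc[path];
    return !Array.isArray(vector) || vector.length !== dimensions;
  });
}

// Stand-in data; in mongosh this could come from db.vectors.find().toArray().
const sampleDocs = [
  { item_id: 'a', embeddings: [-0.01, -0.02, -0.03, -0.04] },
  { item_id: 'b', embeddings: [-0.01, -0.02, -0.03] }, // only 3 values
  { item_id: 'c' } // no embeddings field at all
];

const mismatched = findDimensionMismatches(sampleDocs, 'embeddings', 4);
console.log(mismatched.map((doc) => doc.item_id));
```

Any document this reports is one I would look at first if the $search stage returns fewer results than expected.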
For reference, in my test environment with the below sample documents:
db.vectors.find({},{_id:0})
[
  {
    item_id: '9f41e31c-882f-42ef-add4-18688e810e01',
    embeddings: [ -0.01, -0.02, -0.03, -0.04 ],
    text: 'foo'
  },
  {
    item_id: '6f539716-00f0-42ea-b4af-fdf1db09183e',
    embeddings: [ -0.01, -0.02, -0.03, -0.04 ],
    text: 'foo'
  },
  {
    item_id: '9f41e31c-882f-42ef-add4-18688e810e01',
    embeddings: [ -0.011, -0.021, -0.031, -0.041 ],
    text: 'fooo'
  }
]
I was able to run the following knnBeta $search with a filter on "item_id" to return documents 1 and 3, as you have mentioned:
db.vectors.aggregate([
  {
    '$search': {
      'index': 'default',
      'knnBeta': {
        'vector': [-0.01, -0.02, -0.03, -0.04],
        'path': 'embeddings',
        'k': 2,
        'filter': {
          'text': {
            'path': 'item_id',
            'query': '9f41e31c-882f-42ef-add4-18688e810e01'
          }
        }
      }
    }
  }
])
[
  {
    _id: ObjectId("64d18ff706683323f56ba731"),
    item_id: '9f41e31c-882f-42ef-add4-18688e810e01',
    embeddings: [ -0.01, -0.02, -0.03, -0.04 ],
    text: 'foo'
  },
  {
    _id: ObjectId("64d18ff706683323f56ba733"),
    item_id: '9f41e31c-882f-42ef-add4-18688e810e01',
    embeddings: [ -0.011, -0.021, -0.031, -0.041 ],
    text: 'fooo'
  }
]
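If you end up testing several item_id values, it may be easier to build the stage from a small function instead of editing the query by hand each time. This is only a rough sketch: buildKnnStage is a hypothetical helper of my own (not a MongoDB API) that reproduces the shape of the stage used above, and the parameter names are my own choice:

```javascript
// Hypothetical helper: builds a knnBeta $search stage with a text filter,
// mirroring the shape of the stage used in the query above.
function buildKnnStage({ index = 'default', vector, path, k, filterPath, filterValue }) {
  return {
    $search: {
      index,
      knnBeta: {
        vector,
        path,
        k,
        filter: {
          text: { path: filterPath, query: filterValue }
        }
      }
    }
  };
}

const stage = buildKnnStage({
  vector: [-0.01, -0.02, -0.03, -0.04],
  path: 'embeddings',
  k: 2,
  filterPath: 'item_id',
  filterValue: '9f41e31c-882f-42ef-add4-18688e810e01'
});

// In mongosh this could then be run as: db.vectors.aggregate([stage])
console.log(JSON.stringify(stage, null, 2));
```

Printing the stage before running it also gives you something you can paste straight into a follow-up post.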
If you’re still running into issues with the filter, can you share the documents (redacting any sensitive information) as well as the $search stage and index definition? I assume the index definition will probably change between tests, which is why I am asking for it again if further help is required.
Look forward to hearing from you.
Regards,
Jason