I have the following pipeline that I’m trying to execute, but it isn’t working.
My idea: I want to filter on a certain field (`userId`), which works correctly, and then further filter on the `fileName` field, keeping only documents whose `fileName` (a string) is in the `model_documents` list (a list of strings). I can’t find documentation on how to properly use the `in` operator inside the `knnBeta` filter.
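The filter described above can be sketched as a small helper that puts both conditions into a single `must` array of a `compound` clause. This is a minimal sketch, assuming the `text` and `in` operators are accepted inside the `knnBeta` filter; `build_knn_filter` is a hypothetical helper name, not part of PyMongo or Atlas Search.

```python
from __future__ import annotations

from typing import Any


def build_knn_filter(user_id: str, model_documents: list[str]) -> dict[str, Any]:
    """Hypothetical helper: userId must match AND fileName must be in model_documents."""
    return {
        "compound": {
            # Both conditions live in ONE "must" array. A Python dict literal
            # cannot hold two "must" keys; the second would replace the first.
            "must": [
                # Full-text match on the userId field.
                {"text": {"path": "userId", "query": user_id}},
                # Matches documents whose fileName equals any listed value.
                {"in": {"path": "fileName", "value": model_documents}},
            ]
        }
    }
```

In the pipeline, the result would be passed as `"filter": build_knn_filter(user_id, model_documents)` inside the `knnBeta` stage.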
Here is the error I am receiving:

```
pymongo.errors.OperationFailure: This analyzer is expected to produce exactly one token, but got many, full error: {'ok': 0.0, 'errmsg': 'This analyzer is expected to produce exactly one token, but got many', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1703635448, 7), 'signature': {'hash': b'R\xff\xe5jJw\xb1\xca \xf5;\x1b\x97A\xbbt\xaf\xa2\xaf^', 'keyId': 7274371884303515650}}, 'operationTime': Timestamp(1703635448, 7)}
```
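```python
similar_docs = document_collection.aggregate([
    {
        "$search": {
            "index": "default",
            "knnBeta": {
                "vector": input_embedding,
                "path": "embedding",
                "k": top_k,
                "filter": {
                    "compound": {
                        # Both conditions go in one "must" array: a Python dict
                        # can't hold two "must" keys, so a second "must" would
                        # silently replace the first (dropping the userId filter).
                        "must": [
                            {
                                "text": {
                                    "path": "userId",
                                    "query": user_id,
                                }
                            },
                            {
                                # fileName must equal one of the values in model_documents.
                                "in": {
                                    "path": "fileName",
                                    "value": model_documents,
                                }
                            },
                        ]
                    }
                },
            },
        }
    }
])
```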
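One likely cause of the error, assuming the Atlas Search `in` operator requires string fields to be indexed with the `token` type: if `fileName` is mapped as an analyzed `string` (the default under dynamic mappings), the standard analyzer can split a file name into several tokens (for example at spaces or hyphens), which matches the "expected to produce exactly one token" message. A sketch of an index definition that maps `fileName` as `token` might look like this; `build_search_index_definition`, the 1536 dimensions, and cosine similarity are placeholders to adjust to the actual embeddings.

```python
from __future__ import annotations

import json
from typing import Any


def build_search_index_definition(num_dimensions: int) -> dict[str, Any]:
    """Hypothetical Atlas Search index mapping for the knnBeta query above.

    Assumption: num_dimensions equals the length of the stored embeddings,
    and cosine is the similarity the embeddings were built for.
    """
    return {
        "mappings": {
            # Keep dynamic mapping for other fields; explicit entries below
            # take precedence for the fields they name.
            "dynamic": True,
            "fields": {
                # Vector field searched by knnBeta.
                "embedding": {
                    "type": "knnVector",
                    "dimensions": num_dimensions,
                    "similarity": "cosine",
                },
                # Analyzed string, queried with the "text" operator.
                "userId": {"type": "string"},
                # "token" indexes the whole value as a single token,
                # which "in" needs when matching string values.
                "fileName": {"type": "token"},
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_index_definition(1536), indent=2))
```

The definition could be applied through the Atlas UI's JSON editor or, assuming PyMongo 4.5 or later against an Atlas cluster, with `document_collection.update_search_index("default", definition)`; the index needs to finish rebuilding before the query is retried.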
I would appreciate any help forming this correctly. Thanks!