How to apply filters on MongoDBAtlasVectorSearch similarity_search_with_score in langchain?

I am using MongoDBAtlasVectorSearch and I want to search for the most similar documents, so I use the function similarity_search_with_score.

However, it seems like I am not able to add filters in this similarity_search_with_score function.

This is my code:

vector_search = MongoDBAtlasVectorSearch(
    collection=client[os.getenv("MONGODB_DB")]["files"],
    embedding=embeddings,
    index_name=os.getenv("ATLAS_VECTOR_SEARCH_INDEX_NAME"),
)

results = vector_search.similarity_search_with_score(
    query="What are the engagements of the company",
    k=5,
    pre_filter={
        "compound": {
            "filter": [
                {"equals": {"path": "uploaded_by", "value": chat_owner}},
                {"in": {"path": "file_name", "values": file_names}},
            ]
        }
    },
)

This is my index:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1536,
        "similarity": "cosine",
        "type": "knnVector"
      },
      "file_name": {
        "normalizer": "lowercase",
        "type": "token"
      },
      "uploaded_by": {
        "normalizer": "lowercase",
        "type": "token"
      }
    }
  }
}

However, this gives me the following error:

pymongo.errors.OperationFailure: "knnBeta.filter.compound.filter[1].in.value" is required, full error: {'ok': 0.0, 'errmsg': '"knnBeta.filter.compound.filter[1].in.value" is required', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1704804627, 1), 'signature': {'hash': b'\xfa\x15s+Q\x1d\xa86]R\xb2!\x9d\xc5b-G\xce\xa6S', 'keyId': 7283272637088792583}}, 'operationTime': Timestamp(1704804627, 1)}

I also tried it like this:

        pre_filter={
            "$and": [
                {"uploaded_by": {"$eq": chat_owner}},
                {"file_name": {"$in": file_names}},
            ]
        },

But I got this error:

pymongo.errors.OperationFailure: "knnBeta.filter" one of [autocomplete, compound, embeddedDocument, equals, exists, geoShape, geoWithin, in, knnBeta, moreLikeThis, near, phrase, queryString, range, regex, search, span, term, text, wildcard] must be present, full error: {'ok': 0.0, 'errmsg': '"knnBeta.filter" one of [autocomplete, compound, embeddedDocument, equals, exists, geoShape, geoWithin, in, knnBeta, moreLikeThis, near, phrase, queryString, range, regex, search, span, term, text, wildcard] must be present', 'code': 8, 'codeName': 'UnknownError', '$clusterTime': {'clusterTime': Timestamp(1704802325, 9), 'signature': {'hash': b'`\xd27-\x81+\x16\xd0a\x14\xc7\x99\xa8\x05|Sx?\x0e:', 'keyId': 7283272637088792583}}, 'operationTime': Timestamp(1704802325, 9)}

How can I use filters in similarity_search_with_score properly?

Hello @Victor_Vilde, very sorry for the delay here.

Have you been able to get this working? I believe there is a mix of things happening here between your examples.
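
On the first example, the error message itself points at the problem: it says “knnBeta.filter.compound.filter[1].in.value” is required, and your filter passes “values” instead of “value” to the “in” operator. Below is a minimal sketch of the same compound filter with that key renamed, assuming you stay on the older knnBeta-based version. The helper name build_knnbeta_filter is hypothetical (it is not part of langchain), and I have not run this against your cluster.

```python
def build_knnbeta_filter(chat_owner, file_names):
    """Build an Atlas Search compound filter for the legacy knnBeta code path.

    Hypothetical helper for illustration; not part of langchain or pymongo.
    """
    return {
        "compound": {
            "filter": [
                {"equals": {"path": "uploaded_by", "value": chat_owner}},
                # The "in" operator expects the key "value" (a list is accepted),
                # which is what the server error in the question asks for.
                {"in": {"path": "file_name", "value": list(file_names)}},
            ]
        }
    }


if __name__ == "__main__":
    # Example values only; substitute your own chat_owner and file_names.
    print(build_knnbeta_filter("some_user", ["report.pdf", "notes.txt"]))
```

The returned dict would be passed as pre_filter to similarity_search_with_score exactly as in your first snippet.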

If you are on the most recent version of langchain, you should be able to use the pre_filter as you’ve defined it in your second example, with MQL filtering (e.g. “$and”). However, the “knnBeta” in the error suggests that you are on an older version of langchain, one that does not use $vectorSearch, which is what supports MQL filtering.
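
As a rough sketch of what that could look like after upgrading: the MQL filter from your second example is passed as pre_filter, and the filtered fields are declared in a vector search index (type “vectorSearch”) rather than the knnVector search index shown in the question. The names VECTOR_INDEX_DEFINITION, build_mql_pre_filter and filtered_search are hypothetical helpers for illustration; the index definition is my assumption of what your collection would need, based on the fields in your current index.

```python
# Assumed Atlas Vector Search index definition (index type "vectorSearch").
# Fields used in pre_filter must be indexed with type "filter".
VECTOR_INDEX_DEFINITION = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",
            "numDimensions": 1536,
            "similarity": "cosine",
        },
        {"type": "filter", "path": "file_name"},
        {"type": "filter", "path": "uploaded_by"},
    ]
}


def build_mql_pre_filter(chat_owner, file_names):
    """Build the MQL-style filter from the second example in the question."""
    return {
        "$and": [
            {"uploaded_by": {"$eq": chat_owner}},
            {"file_name": {"$in": list(file_names)}},
        ]
    }


def filtered_search(vector_search, query, chat_owner, file_names, k=5):
    """Run a filtered similarity search.

    `vector_search` is assumed to be a MongoDBAtlasVectorSearch instance from a
    langchain version that issues $vectorSearch (not knnBeta) queries.
    """
    return vector_search.similarity_search_with_score(
        query=query,
        k=k,
        pre_filter=build_mql_pre_filter(chat_owner, file_names),
    )
```

With this in place, the call from the question becomes filtered_search(vector_search, "What are the engagements of the company", chat_owner, file_names). If you still see “knnBeta” in the error after this change, the installed langchain version is most likely still the older one.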