How to apply filters to the "similaritySearchWithScore" function of LangChain?

I am using MongoDB Atlas Search. The following code works fine, but I am unable to add filters. The examples I have seen use raw aggregation, whereas the LangChain wrapper has a function that accepts filters as an argument, i.e. similaritySearchWithScore(qurey, 5, { preFilter: { name: "test_file.pptx" } }). I tried this, but it gives the following error:
error: PlanExecutor error during aggregation :: caused by :: "filter.name" must be a document
Code:

import { MongoDBAtlasVectorSearch } from "langchain/vectorstores/mongodb_atlas";
const store = new MongoDBAtlasVectorSearch(embeddings, { "collection": my_collection, "indexName": "default", "textKey": "page_content", "embeddingKey":"page_embeddings"})
return await store.similaritySearchWithScore(qurey, 5,{preFilter:{name:"test.pptx"}})

How can we pass filters here?

Thanks

Hey @Davinder_Singh3,

Welcome to the MongoDB Community forums!

As per the $vectorSearch documentation, the value given for each filtered field needs to be of document type, so you can wrap the value in the $eq operator to resolve the error, e.g. { name: { $eq: "test.pptx" } }. Further, please refer to the Atlas Vector Search Pre-Filter documentation to read more about it.
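
As a rough illustration of the "must be a document" point, the sketch below wraps bare values in `$eq` before they are used as a `preFilter`. The helper name `toOperatorFilter` and its types are hypothetical and not part of LangChain or the MongoDB driver; only the `{ field: { $eq: value } }` shape comes from the reply above.

```typescript
// Scalar kinds a filter value can take in this sketch.
type Scalar = string | number | boolean;

// A per-field operator document such as { $eq: "x" } or { $gte: 3 }.
type OperatorDoc = { [op: string]: unknown };

/**
 * Hypothetical helper: turns { name: "test.pptx" } into
 * { name: { $eq: "test.pptx" } }. Values that are already
 * operator documents are passed through unchanged.
 */
function toOperatorFilter(
  fields: Record<string, Scalar | OperatorDoc>
): Record<string, OperatorDoc> {
  const out: Record<string, OperatorDoc> = {};
  for (const [path, value] of Object.entries(fields)) {
    out[path] =
      typeof value === "object" && value !== null ? value : { $eq: value };
  }
  return out;
}

const preFilter = toOperatorFilter({ name: "test.pptx" });
console.log(JSON.stringify(preFilter)); // {"name":{"$eq":"test.pptx"}}
```

The result could then be passed as the third argument, e.g. `store.similaritySearchWithScore(qurey, 5, { preFilter })`, assuming the same `store` as in the question.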

In case you have further concerns, feel free to reach out.

Best regards,
Kushagra

@Kushagra_Kesav I now have the following two questions:

  1. Do we need to add every field that we want to filter on in the semantic search function to the following template? I filtered on one field that was not added to the index, and it threw the error given below.
  2. If yes, what "type" should be used for a collection field that holds JSON data? For example, for a string we use "type": "token", "normalizer": "lowercase".
{
    "mappings": {
        "dynamic": true,
        "fields": {
            "page_embeddings": {
                "dimensions": 1536,
                "similarity": "cosine",
                "type": "knnVector"
            }
        }
    }
}

Error:
error: PlanExecutor error during aggregation :: caused by :: Path 'name' needs to be indexed as token

Hi @Davinder_Singh3,

May I ask what you meant by ‘all fields’ here? Are you generating vector embeddings for multiple fields?

Yes, the $vectorSearch filter option matches only BSON boolean, string, and numeric values, so you must index each field you filter on as the corresponding Atlas Search field type.

And yes, for a string, index the field as the token type. Atlas Search indexes the whole string as a single token (searchable term) and stores it in a columnar storage format for efficient filtering or sorting operations. To read more about it, please refer to the Behavior of the token Type - MongoDB Docs.
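
To make the indexing requirement concrete, here is a minimal sketch that builds an index definition in the same shape as the template posted earlier, adding a `token` mapping for each string field used in a filter. The function `buildIndexDefinition` and the interfaces are hypothetical names for illustration only; the field names and values (`page_embeddings`, 1536, `cosine`, `name`) are taken from this thread.

```typescript
// Field mappings used in this thread's index template.
interface KnnVectorField {
  type: "knnVector";
  dimensions: number;
  similarity: "cosine" | "euclidean" | "dotProduct";
}

interface TokenField {
  type: "token";
  normalizer?: "lowercase" | "none";
}

interface SearchIndexDefinition {
  mappings: {
    dynamic: boolean;
    fields: Record<string, KnnVectorField | TokenField>;
  };
}

/**
 * Hypothetical helper: one knnVector field for the embeddings plus a
 * token mapping per string filter field. Handles top-level field names
 * only in this sketch.
 */
function buildIndexDefinition(
  embeddingField: string,
  dimensions: number,
  stringFilterFields: string[],
  normalizer?: "lowercase" | "none"
): SearchIndexDefinition {
  const fields: Record<string, KnnVectorField | TokenField> = {
    [embeddingField]: { type: "knnVector", dimensions, similarity: "cosine" },
  };
  for (const name of stringFilterFields) {
    fields[name] = normalizer ? { type: "token", normalizer } : { type: "token" };
  }
  return { mappings: { dynamic: true, fields } };
}

const indexDefinition = buildIndexDefinition(
  "page_embeddings",
  1536,
  ["name"],
  "lowercase"
);
console.log(JSON.stringify(indexDefinition, null, 2));
```

The printed JSON can be pasted into the index editor in place of the original template, on the assumption that the collection and index name stay the same.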

Best regards,
Kushagra

@Kushagra_Kesav Should we create a separate index for the filter fields, or should we declare them in the vector index template?

For example, I want to apply vector search only to documents where country='IN'. Should I create a separate index for country, or should I add this field to the vector index mapping?

Thanks for clarifying, @Kushagra_Kesav.

  1. No, there is only one field that has embeddings, but we have a case where we need to filter results based on other (non-embedding) fields, i.e.
await store.similaritySearchWithScore(qurey, 5, { preFilter: { $and: [{ name: { $eq: "test" }, "document_meta.Disclaimer.Label": { $eq: "Client Ready" } }] } })
  2. We have an index like this:
{
    "mappings": {
        "dynamic": true,
        "fields": {
            "page_embeddings": {
                "dimensions": 1536,
                "similarity": "cosine",
                "type": "knnVector"
            },
            "name": {
                "type": "token",
                "normalizer": "lowercase"
            },
            "document_meta.Disclaimer.Label": {
                "type": "token",
                "normalizer": "lowercase"
            }
        }
    }
}

It works well on the "name" field, but on "document_meta.Disclaimer.Label" it just returns an empty result, i.e.

return await store.similaritySearchWithScore(qurey, 5,{ preFilter: { "document_meta.Disclaimer.Label": { $eq: "Client Ready" } } })
  3. And with the "$or"/"$and" operators, the following error occurs, i.e.
await store.similaritySearchWithScore(qurey, 5, { preFilter: { $and: [{ name: { $eq: "test" }, "document_meta.Disclaimer.Label": { $eq: "Client Ready" } }] } })

It throws the following error: PlanExecutor error during aggregation :: caused by :: "filter.$and[0]" more than 1 filter

I need your input/guidance on points 2 and 3. We have a case where, along with the query, we want results filtered by multiple fields.

Thanks,
Davinder

Hey @Davinder_Singh3,

Thank you for clarifying your use case.

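Looking at one of the queries you've shared, the two conditions inside $and are wrapped in a single pair of curly braces, so the first array element contains more than one filter; each condition needs its own object. There also seems to be a misspelling of the word 'query'.

The corrected query is: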

await store.similaritySearchWithScore(query, 5, {
  preFilter: {
    $and: [
      { "name": { $eq: "test" } },
      { "document_meta.Disclaimer.Label": { $eq: "Client Ready" } }
    ]
  }
})
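
The same shape can be generated programmatically when the set of filter fields varies. The sketch below uses a hypothetical helper, `andEquals`, which is not part of LangChain; it emits one single-field object per condition, which is the structure the "more than 1 filter" error appears to be asking for.

```typescript
// Values this sketch allows in an equality condition.
type FilterValue = string | number | boolean;

// One clause of the $and array: exactly one field with an $eq operator.
type FieldClause = Record<string, { $eq: FilterValue }>;

/**
 * Hypothetical helper: converts a flat map of field -> value into
 * { $and: [ { fieldA: { $eq: a } }, { fieldB: { $eq: b } } ] },
 * so that every array element carries a single filter.
 */
function andEquals(
  conditions: Record<string, FilterValue>
): { $and: FieldClause[] } {
  const clauses = Object.entries(conditions).map(
    ([path, value]): FieldClause => ({ [path]: { $eq: value } })
  );
  return { $and: clauses };
}

const combinedFilter = andEquals({
  name: "test",
  "document_meta.Disclaimer.Label": "Client Ready",
});
console.log(JSON.stringify(combinedFilter, null, 2));
```

The result can be passed as `{ preFilter: combinedFilter }` in the call shown above.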

For further details, you can refer to the similaritySearchWithScore API documentation: https://api.js.langchain.com/classes/vectorstores_mongodb_atlas.MongoDBAtlasVectorSearch.html#similaritySearchWithScore

Additionally, for the other case where you are getting an empty result, please ensure that a document exists with "document_meta.Disclaimer.Label": "Client Ready". If no document has that exact value, the filter will match nothing, which would explain the empty result.
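
Another thing that may be worth checking for the empty result is how the embedded field is declared in the index. This is an assumption, not something confirmed in this thread: Atlas Search static mappings are documented as declaring each parent of an embedded field with the `document` type, and a dotted key such as "document_meta.Disclaimer.Label" may not be read as a path. The sketch below uses a hypothetical helper, `nestTokenField`, to expand a dotted path into that nested form.

```typescript
// Leaf mapping for a string field used in filters.
interface TokenLeaf {
  type: "token";
  normalizer?: "lowercase" | "none";
}

// Parent mapping for an embedded document.
interface DocumentNode {
  type: "document";
  fields: Record<string, TokenLeaf | DocumentNode>;
}

/**
 * Hypothetical helper: expands "a.b.c" into nested `document`
 * mappings with the token mapping at the innermost field.
 * A path without dots is returned as a single top-level field.
 */
function nestTokenField(
  dottedPath: string,
  leaf: TokenLeaf
): Record<string, TokenLeaf | DocumentNode> {
  const parts = dottedPath.split(".");
  // Start at the innermost field and wrap outwards.
  let current: Record<string, TokenLeaf | DocumentNode> = {
    [parts[parts.length - 1]]: leaf,
  };
  for (let i = parts.length - 2; i >= 0; i--) {
    current = { [parts[i]]: { type: "document", fields: current } };
  }
  return current;
}

const nestedFields = nestTokenField("document_meta.Disclaimer.Label", {
  type: "token",
  normalizer: "lowercase",
});
console.log(JSON.stringify(nestedFields, null, 2));
```

The printed object would go under "fields" in the index definition, next to "page_embeddings" and "name".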

Best regards,
Kushagra
