LlamaIndex MongoDB filters: support for multiple values for a key

The MongoDB vector store does not support filtering on multiple values for the same key. However, this seems to be a very important feature.

llama_index.vector_stores.mongodb.base.py

def _to_mongodb_filter(standard_filters: MetadataFilters) -> Dict:
    """Convert from standard dataclass to filter dict."""
    filters = {}
    for filter in standard_filters.legacy_filters():
        filters[filter.key] = filter.value  # <- here: a later value for the same key overwrites the earlier one
    return filters
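To make the overwrite concrete, here is a minimal sketch that mimics the loop above. Plain (key, value) tuples stand in for the filter objects; they are an assumption for illustration, not llama_index types.

```python
# Plain (key, value) tuples stand in for ExactMatchFilter objects (illustrative only).
pairs = [("metadata.team", "TEAM1"), ("metadata.team", "TEAM2")]

filters = {}
for key, value in pairs:
    # Same assignment pattern as in _to_mongodb_filter: a dict keeps one value per key,
    # so the second assignment replaces the first.
    filters[key] = value

print(filters)  # {'metadata.team': 'TEAM2'}
```

Only TEAM2 survives, so the resulting query would silently ignore TEAM1.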

I want to use filters like below.

filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="metadata.team", value="TEAM1"),   
        ExactMatchFilter(key="metadata.team", value="TEAM2"), # <- multiple values on key "metadata.team"
    ],
    condition=FilterCondition.OR,
)
# MongoDBAtlasVectorSearch
query_engine = index.as_query_engine(
    filters=filters,
)

I don't know if this is caused by the llama index vector store not supporting the In operator, but I'd like you to look into it.


To support multiple values for the same key in the MongoDB vector store when using filters in llama_index, you need to modify the _to_mongodb_filter function to handle cases where multiple filters are applied on the same key. This can be achieved by using MongoDB's $in operator to allow for multiple values for the same key.

Here's how you can modify the _to_mongodb_filter function to support multiple values for the same key:

from collections import defaultdict
from typing import Dict

from llama_index.core.vector_stores import MetadataFilters

def _to_mongodb_filter(standard_filters: MetadataFilters) -> Dict:
    """Convert from standard dataclass to filter dict."""
    # Collect every value seen for each key instead of overwriting.
    values_by_key = defaultdict(list)

    for metadata_filter in standard_filters.legacy_filters():
        values_by_key[metadata_filter.key].append(metadata_filter.value)

    # Use $in for keys with multiple values (this matches ANY of the values,
    # i.e. OR semantics for that key); keep a plain equality match otherwise.
    mongodb_filter = {}
    for key, values in values_by_key.items():
        if len(values) > 1:
            mongodb_filter[key] = {'$in': values}
        else:
            mongodb_filter[key] = values[0]

    return mongodb_filter

You'll need to modify MetadataFilters and ExactMatchFilter to handle multiple values. I think this should accomplish what you're after. Let me know.
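To check what such a grouping would produce without touching the library, here is a self-contained sketch of the same idea. The ExactMatch dataclass and the exact_matches_to_mql function are hypothetical stand-ins for illustration, not part of llama_index.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ExactMatch:
    """Hypothetical stand-in for llama_index's ExactMatchFilter."""
    key: str
    value: Any


def exact_matches_to_mql(matches: List[ExactMatch]) -> Dict[str, Any]:
    """Group values by key; use $in when a key has more than one value."""
    grouped: Dict[str, List[Any]] = defaultdict(list)
    for match in matches:
        grouped[match.key].append(match.value)
    return {
        key: {"$in": values} if len(values) > 1 else values[0]
        for key, values in grouped.items()
    }


result = exact_matches_to_mql(
    [
        ExactMatch("metadata.team", "TEAM1"),
        ExactMatch("metadata.team", "TEAM2"),
        ExactMatch("metadata.lang", "en"),
    ]
)
print(result)
# {'metadata.team': {'$in': ['TEAM1', 'TEAM2']}, 'metadata.lang': 'en'}
```

Note that the result reads as "team is TEAM1 or TEAM2, and lang is en": values for the same key are OR-ed through $in, while different keys are still AND-ed. That matches the use case in the question, but it would not be the right translation if someone really meant AND across values of one key.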

Thanks. For now, though, it looks like I need to modify the source code directly to change _to_mongodb_filter.
So I modified the source code.

I checked and realized that MetadataFilters itself only supports the EQ operator.
So it would be nice to be able to pass a user-defined _to_mongodb_filter function to MongoDBAtlasVectorSearch.

Hi @_zorba! Hi @Michael_Lynn! I cannot speak much about the legacy stuff, but with our recent addition of Hybrid and FullText Search features to the MongoDBAtlasVectorSearch VectorStore, we now handle this as you'd expect. You can pass something like the following directly to a VectorStoreQuery as the filters kwarg.

from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

ll_filters = MetadataFilters(
    filters=[
        MetadataFilter(key="year", operator="<=", value=2024),
        MetadataFilter(key="year", operator=FilterOperator.GT, value=2022),
    ]
)


You can also call the underlying utility in the pipelines module to see what's going on:

from llama_index.vector_stores.mongodb.pipelines import filters_to_mql

print(filters_to_mql(ll_filters))

{'$and': [{'metadata.year': {'$lte': 2024}}, {'metadata.year': {'$gt': 2022}}]}
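For anyone curious how a translation like this can work in principle, here is a rough, self-contained sketch. It is not the library's implementation: the Filter dataclass, the OPERATOR_TO_MQL table, and sketch_filters_to_mql are all hypothetical names made up for illustration, and the operator table is an assumed subset.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed mapping from comparison symbols to MQL operators (illustrative subset).
OPERATOR_TO_MQL = {
    "==": "$eq", "!=": "$ne",
    ">": "$gt", ">=": "$gte",
    "<": "$lt", "<=": "$lte",
    "in": "$in", "nin": "$nin",
}


@dataclass
class Filter:
    """Hypothetical stand-in for a metadata filter with an operator."""
    key: str
    operator: str
    value: Any


def sketch_filters_to_mql(
    filters: List[Filter], condition: str = "and", prefix: str = "metadata"
) -> Dict[str, Any]:
    """Turn each filter into one MQL clause, then join the clauses with $and / $or."""

    def full_key(key: str) -> str:
        # Avoid double-prefixing keys that already start with "metadata.".
        return key if key.startswith(prefix + ".") else f"{prefix}.{key}"

    clauses = [
        {full_key(f.key): {OPERATOR_TO_MQL[f.operator]: f.value}} for f in filters
    ]
    if len(clauses) == 1:
        return clauses[0]
    return {f"${condition}": clauses}


year_range = sketch_filters_to_mql(
    [Filter("year", "<=", 2024), Filter("year", ">", 2022)]
)
print(year_range)
# {'$and': [{'metadata.year': {'$lte': 2024}}, {'metadata.year': {'$gt': 2022}}]}

teams = sketch_filters_to_mql([Filter("metadata.team", "in", ["TEAM1", "TEAM2"])])
print(teams)
# {'metadata.team': {'$in': ['TEAM1', 'TEAM2']}}
```

Because each filter becomes its own clause inside a list, two filters on the same key no longer overwrite each other, which is the problem the original question ran into.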

I hope that this helps!