Higher number of lists leads to missed results

Nataliia_Obraztsova · March 12, 2024, 8:42am

Hello,
I have a collection of more than 50,000 documents, each with a corresponding vector. I want to optimize similarity search, but when I create an index with a higher number of lists, the search fails to return certain vectors even though they are present in the database.

Is there a way to optimize search speed while ensuring that vectors present in the database will be returned?

I create the index for the collection with the following settings:

"indexes": [
    {
        "name": "vector_index",
        "key": {"vectorContent": "cosmosSearch"},
        "cosmosSearchOptions": {
            "kind": "vector-ivf",
            "numLists": 52,
            "similarity": CosmosDBSimilarityType.COS,
            "dimensions": 1536,
        },
    }
],
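
For context, here is a minimal sketch of how the fragment above might sit inside a complete createIndexes command. The helper name build_create_index_command and the collection name "docs" are hypothetical, and the string "COS" is assumed to be the value behind CosmosDBSimilarityType.COS. The function only builds the command document; nothing is sent to a server.

```python
def build_create_index_command(collection_name, num_lists, dimensions, similarity="COS"):
    """Build a createIndexes command document for a vector-ivf index.

    Hypothetical helper for illustration: it returns a plain dict and
    does not connect to any database.
    """
    return {
        "createIndexes": collection_name,
        "indexes": [
            {
                "name": "vector_index",
                "key": {"vectorContent": "cosmosSearch"},
                "cosmosSearchOptions": {
                    "kind": "vector-ivf",
                    "numLists": num_lists,      # number of IVF lists (clusters)
                    "similarity": similarity,   # assumed string form of CosmosDBSimilarityType.COS
                    "dimensions": dimensions,   # must match the embedding size
                },
            }
        ],
    }


# "docs" is a placeholder collection name, not the real one.
command = build_create_index_command("docs", num_lists=52, dimensions=1536)
```

With pymongo, a document like this would typically be passed to db.command(command), assuming db is a Database handle for the target database.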