Query performance using $in operator maching more than 600 values

Valeria_Haro1 · June 22, 2023, 2:29pm

My query has the next pattern:


"aggregate": "collection",
    "pipeline": [
      {
        "$match": {
          "story_dt": {"$gte": {"$date": XXX},"$lt": { "$date": XXX } },
          "org_id": {
            "$in": [
     (HERE WE SENT 600 NUMBERS)234,2345,86,292,456, etc
                        ]
          },
          "topics.topic_cd": {
            "$in": [
              "Water",
              "WatPoll"
            ]
          }
        }
      }

The index got is org_id but the query is slow, how could we improve the performance?
what is the limit of values to match using $in operator?

Kushagra_Kesav · July 25, 2023, 6:14am

Hi @Valeria_Haro1,

Welcome to the MongoDB Community!

I’ve some thoughts that may help optimize this aggregation pipeline query:

First, the $in operator on org_id is currently not able to take advantage of the index on org_id since it is checking for equality against a large array of values. To use the index more efficiently, you could replace the $in with an $or condition checking for equality against each ID individually.

     "$or": [
          {"org_id": 234},
          {"org_id": 2345}, 
          {"org_id": 86},
          {"org_id": 292},
          {"org_id": 456},
          ...
        ],

However, this may not be feasible if the number of IDs is very large.

Further, you can consider adding an index on {story_dt: 1, org_id: 1, 'topics.topic_cd': 1} to support the full query criteria, not just org_id. This is a compound index spanning all fields used in the $match.
There is no hard-coded limit on the number of values in $in. But in general, very large $in lists with hundreds or thousands of items will negatively impact query performance.

Let us know if this helps or if you have any further questions!

Regards,
Kushagra

system · August 1, 2023, 11:59am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.