Order of values in an $in and $all array changes performance.

My understanding is that the order of values in an $in and $all clause shouldn’t change the performance of a query. However, that’s not the behavior I’m seeing.

Environment: 3 shard cluster. Mongo version 7.0.7. M600 instance size.
Collection: 3.4 Billion documents
Index: chainId_1_parties_1_timestamp_1

Query:

{
  chainId: {
    $in: [1, 8453, 7777777, 10],
  },
  parties: {$all: ["0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48", "0x74b78e98093f5b522a7ebdac3b994641ca7c2b20"]}
}

The above times out when querying / trying to explain in Mongo Compass. However,

{
  chainId: {
    $in: [1, 8453, 7777777, 10],
  },
  parties: {$all: ["0x74b78e98093f5b522a7ebdac3b994641ca7c2b20", "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"]}
}

Is fast.

0x74b78e98093f5b522a7ebdac3b994641ca7c2b20 has about 4,000 total docs where it exists in parties while 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 has millions.

It seems like the first query where 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 is the first element in the $all uses the index to scan all 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 docs to find matches where 0x74b78e98093f5b522a7ebdac3b994641ca7c2b20 is also in parties.

Is this expected behavior?

Hi Jordan,

Yes you are correct, $all operator works like a chained $and statements where the order matters. So if 0x74b78e98093f5b522a7ebdac3b994641ca7c2b20 has only 4000 documents then its better to pass it first in the array as then MongoDB will later operate on the scanned 4000 documents, but in case you pass 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48 it will scan millions of documents and then pass it to the next value in the array. Hence the reason why you see a huge time difference in the query execution time. You can read more about $all operator from here - https://www.mongodb.com/docs/manual/reference/operator/query/all/

1 Like