Performance of $or vs $in

Hi I am using python with mongodb and came across this question when developing.
Here are two querries and I want to know which one will do better.

Query_1:

collection.find_one_and_update(
{
    "$or": [
            {
                "field1": value,
                "field2": {"gte": constant},
                "field3": constant        
            } for value in acceptable_values_for_field1
        ]
},
{some set operations}
)

as against

Query_2:

collection.find_one_and_update(
{
    "field1":  {"$in": acceptable_values_for_field1},
    "field2": {"gte": constant},
    "field3": constant        
},
{some set operations}
)

NOTE

  1. all fields are indexed her
  2. I know the docs say that “$in” will be better here but compound index scan vs individual index scans might tip the scales in favor of “or”
  3. currently the “acceptable_values_for_field1” have only one element in it, and will probably be the case for a long long time. Do you think “$in” is still better?

Hi Bhavesh,

With a small amount of $or or $in items, you may not notice a difference in performance between the two queries

With a large mount of items in $or or $in (say 10000), and if an index with keys { field3: 1, field1: 1, field2: 1} exists for the $in query, then the $in query will likely outperform $or.

$or will use as many IXSCAN plans (you can check by using .explain()) as there are $or conditions; while the $in query will only have 1 IXSCAN plan.

Compound index scan will likely outperform multiple individual index scans.

You’ll likely not notice a difference here with only one element in “acceptable_values_for_field1”. But $in should be considered over $or when possible.

Best regards,

Raymond

3 Likes

I am doing this in production. As I scale up for more values in acceptable_values_for_field1 will short circuits be better used for for getting faster return on the search? Remember I am just looking for one document back as soon as possible for processing it. The way I have written the $or query could potentially use that. I get that with just one value it wont make much difference but the number of times I am doing this is very high in production so the minor bumps will also addup a lot for me.

Hi Bhavesh,

You can use the following code in mongosh to compare the performance difference in a test environment.
Feel free to tweak the variable step for the number of items in $in or $or

$in

(function() {
    "use strict";
 
    let n = 0;
    let step = 1000;
    let iters = 20;
 
    function setup() {
        db.c.drop();
        db.c.insertOne({_id: 0, a: 1, b: 2})
        db.c.createIndex({a: 1, b: 1, _id: 1})
    }
 
    function remove() {
        let terms = [];
        for (var i = 0; i < n; i++) {
            terms.push({_id: i})
        }
        db.c.findOne({ _id:{$in: terms}, a: 1, b: 2})
    }
 
    for (let i = 0; i < iters; i++) {
        n += step;
        setup();
        let startTime = new Date();
        remove();
        let endTime = new Date();
        let elapsedTime = endTime - startTime;
        print(n + "\t" + elapsedTime + "ms");
    }
}());

$or

(function() {
    "use strict";
 
    let n = 0;
    let step = 1000;
    let iters = 20;
 
    function setup() {
        db.c.drop();
        db.c.insertOne({_id: 0})
    }
 
    function remove() {
        let terms = [];
        for (var i = 0; i < n; i++) {
            terms.push({_id: i})
        }
        db.c.findOne({$or: terms})
    }
 
    for (let i = 0; i < iters; i++) {
        n += step;
        setup();
        let startTime = new Date();
        remove();
        let endTime = new Date();
        let elapsedTime = endTime - startTime;
        print(n + "\t" + elapsedTime + "ms");
    }
}());

Result:

$or

1000    52ms
2000    165ms
3000    305ms
4000    473ms
5000    685ms
6000    953ms
7000    1299ms
8000    1680ms
9000    2076ms
10000   2577ms
11000   3166ms
12000   3708ms
13000   4366ms
14000   5081ms
15000   6902ms
16000   7832ms
17000   8033ms
18000   9338ms
19000   10082ms
20000   11684ms

$in

1000    45ms
2000    60ms
3000    70ms
4000    62ms
5000    90ms
6000    85ms
7000    88ms
8000    115ms
9000    144ms
10000   133ms
11000   175ms
12000   171ms
13000   169ms
14000   180ms
15000   216ms
16000   198ms
17000   216ms
18000   212ms
19000   257ms
20000   236ms

Conclusion:

  • A compound index scan is faster than multiple individual single field index scans. Your original $or example will also need a compound index, and it will likely be even slower than the test above.
  • $in is faster than $or in most cases.
4 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.