I know the docs say that “$in” will be better here but compound index scan vs individual index scans might tip the scales in favor of “or”
currently the “acceptable_values_for_field1” have only one element in it, and will probably be the case for a long long time. Do you think “$in” is still better?
With a small amount of $or or $in items, you may not notice a difference in performance between the two queries
With a large mount of items in $or or $in (say 10000), and if an index with keys { field3: 1, field1: 1, field2: 1} exists for the $in query, then the $in query will likely outperform $or.
$or will use as many IXSCAN plans (you can check by using .explain()) as there are $or conditions; while the $in query will only have 1 IXSCAN plan.
Compound index scan will likely outperform multiple individual index scans.
You’ll likely not notice a difference here with only one element in “acceptable_values_for_field1”. But $in should be considered over $or when possible.
I am doing this in production. As I scale up for more values in acceptable_values_for_field1 will short circuits be better used for for getting faster return on the search? Remember I am just looking for one document back as soon as possible for processing it. The way I have written the $or query could potentially use that. I get that with just one value it wont make much difference but the number of times I am doing this is very high in production so the minor bumps will also addup a lot for me.
You can use the following code in mongosh to compare the performance difference in a test environment.
Feel free to tweak the variable step for the number of items in $in or $or
$in
(function() {
"use strict";
let n = 0;
let step = 1000;
let iters = 20;
function setup() {
db.c.drop();
db.c.insertOne({_id: 0, a: 1, b: 2})
db.c.createIndex({a: 1, b: 1, _id: 1})
}
function remove() {
let terms = [];
for (var i = 0; i < n; i++) {
terms.push({_id: i})
}
db.c.findOne({ _id:{$in: terms}, a: 1, b: 2})
}
for (let i = 0; i < iters; i++) {
n += step;
setup();
let startTime = new Date();
remove();
let endTime = new Date();
let elapsedTime = endTime - startTime;
print(n + "\t" + elapsedTime + "ms");
}
}());
$or
(function() {
"use strict";
let n = 0;
let step = 1000;
let iters = 20;
function setup() {
db.c.drop();
db.c.insertOne({_id: 0})
}
function remove() {
let terms = [];
for (var i = 0; i < n; i++) {
terms.push({_id: i})
}
db.c.findOne({$or: terms})
}
for (let i = 0; i < iters; i++) {
n += step;
setup();
let startTime = new Date();
remove();
let endTime = new Date();
let elapsedTime = endTime - startTime;
print(n + "\t" + elapsedTime + "ms");
}
}());
A compound index scan is faster than multiple individual single field index scans. Your original $or example will also need a compound index, and it will likely be even slower than the test above.