Hi @v_b - Welcome to the community.
According to the ESR rule, the index should probably be something like {filterId: -1, score: -1}, so that the $in
equality rule is satisfied first. But there would still be in memory sorts. Is there a better index available?
Note, I believe filterId
mentioned above should be Id
but correct me if I am wrong here.
In saying so, based off the sample documents and details you’ve provided I am curious if you have attempted to create an index definition of {"score": -1, "id": -1}
?
I compared both the following:
{ "id": -1, "score": -1}
{ "score": -1, "id": -1}
The query planner in my test environment which only contains the sample documents you’ve provided selects the index 2. { "score": -1, "id": -1}
to use for both of the queries you had advised.
In terms of the { "score": -1, "id": -1}
index, here are some details of the winningPlan
for the query:
db.collection. find({id: {$in: ["id 1","id 2","id 3","id 4","id 5"]}}).sort({score: -1, id: -1})
winningPlan: {
stage: 'FETCH',
filter: { id: { '$in': [ 'id 1', 'id 2', 'id 3', 'id 4', 'id 5' ] } },
inputStage: {
stage: 'IXSCAN',
keyPattern: { score: -1, id: -1 },
indexName: 'score_-1_id_-1',
isMultiKey: false,
multiKeyPaths: { score: [], id: [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { score: [ '[MaxKey, MinKey]' ], id: [ '[MaxKey, MinKey]' ] }
}
}
We can see that the index is used before a filter is done for the specified “id”
values.Additionally, here’s some further details regarding some of the testing with both indexes (from my test environment with only 5 of the sample documents you have provided):
-
find()
without $or
:
IXSCAN
→ FETCH
→ LIMIT
using {score:-1, id:-1}
index
IXSCAN
→ SORT
→ FETCH
using {id:-1, score:-1}
index
-
With $or
:
( IXSCAN
+ IXSCAN
) → SORT_MERGE
→ FETCH
→ LIMIT
all using {score:-1, id:-1}
IXSCAN
→ FETCH
(with conditions) → SORT
using {id:-1, score:-1}
Rejected plan: ( IXSCAN
+ IXSCAN
) → SORT
→ FETCH
using {id:-1, score:-1}
Note: {score:1,id:1}
would work in the same manner as {score:-1,id:-1}
(just in reverse) for this test.
Based off the 5 sample documents you’ve provided in my test environment, we can see in both instances that the index {id:-1, score:-1}
will have an in memory sort. This is due to the sort()
condition you have specified. Where as compared to the index {score:-1,id:-1}
, the documents are already sorted and just need to be fetched.
For your secondary query, the winning plan differs slightly as there is use of the $or
operator with some range conditions in which a ‘SORT_MERGE’
is performed. Although this index may work for your use case, there isn’t a single “silver bullet” index for all queries. You’ll need to consider it on a query-by-query basis. You could possibly consider {score:1, id:1}
and {id:-1, score:-1}
indexes for the query using $or
so that the query planner may decide which index to use for each branch of the $or
operator.
In saying the above, I believe your main concern is the in-memory sort (correct me if I am wrong here). The SORT
stage is also less desirable due to the 100MB limit of memory use (in which the operation will result in an error if it exceeds this amount and allowDiskUse() is not used).
It’s important to note that the ESR rule is used as a general rule of thumb and is a good starting place applicable to most use cases (i.e. not all). I would recommend going over the MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing slides which also has an example of an exception to the ESR rule. Additionally, perhaps the Sort of Multiple Fields documentation may be of use as well.
If you’re on Atlas using an M10+ tier cluster, you should have access to the performance advisor which may be able to help you optimize your indexes.
It is highly recommended you test this index first against a test environment with more data to verify it meets all your use case / requirements before creating this in production. You can verify the query execution in more detail using db.collection.explain("executionStats").find()
.
Regards,
Jason