Hey @V_N_A,
Just adding a few more points to what @steevej has rightly said.
Firstly, I assume that when you wrote the first query ie.
{
"status": { "$ne": 4 },
"queue": true,
"department": { "$in": ["banana"] }
},
{ "sort": { "ts": -1 }, "limit": 20}
you meant this:
find({"status": {"$ne": 4}, "queue": true, "department": {"$in": ["banana"]}}).sort({ts:-1}).limit(20)
because the first one is just projecting the documents.
Coming back to the post, a general rule of thumb one considers when dealing with compound indexes is that you don’t need an additional index if your query can be covered by the prefix of the existent compound index. Another thing to note is that Inequality operators
such as $ne
or $nin
are range operators, not equality operators, so moving status to the end might help speed up your first query. But in the case of your second query, since there is no department
, you will end up with memory sort since prefixes won’t work.
About the SORT
stage: although having a SORT
stage is not necessarily a bad thing if the result set is reasonably small, the best outcome is to avoid having this stage altogether, since it can cause the query to fail if it runs out of memory to do the sorting. Using proper indexing can avoid the SORT
stage.
To demonstrate out all this, I tried the same with the sample document you provided and created a sample collection with 1000 such documents using mgeneratejs, a tool to create example documents by following some patterns you define.
If we create an index like the one you suggested here:
I named the index status_queue_department_ts
and so on the first query ie.
{"status": {"$ne": 4}, "queue": true, "department": {"$in": ["banana"]}}).sort({ts:-1}
the Explain Plan gives the following results:
expRun = db.collections.explain("executionStats")
expRun.find({"status": {"$ne": 4}, "queue": true, "department": {"$in": ["banana"]}}).sort({ts:-1})
...
executionStats: {
executionSuccess: true,
nReturned: 152,
executionTimeMillis: 0,
totalKeysExamined: 169,
totalDocsExamined: 152,
executionStages: {
stage: 'FETCH',
...
inputStage: {
stage: 'SORT',
...
inputStage: {
stage: 'IXSCAN',
...
From the execution stats, it shows it examined a total of 169 keys and 152 documents and returned 152 documents, which is not a bad query targeting ratio. Note that the winning plan contains a SORT
stage.
Now for the second query ie. without the department
field, the same .explain
, gives the following results:
expRun.find({"status": {"$ne": 4}, "queue": true}).sort({ts:-1})
...
executionStats: {
executionSuccess: true,
nReturned: 477,
executionTimeMillis: 2,
totalKeysExamined: 486,
totalDocsExamined: 477,
executionStages: {
stage: 'FETCH',
...
inputStage: {
stage: 'SORT',
...
inputStage: {
stage: 'IXSCAN',
...
we can see now, that it has returned 477 documents and had to scan 486 keys. This is also not a bad query targeting ratio, but this also contains a SORT
stage.
Thus for your first index, in conclusion, it’s not the ideal index for the query since the explain output for both queries contains a SORT
stage, even though the query targeting metric is not too bad.
For your second index ie. status_queue_ts
, the first query returned:
...
executionStats: {
executionSuccess: true,
nReturned: 152,
executionTimeMillis: 1,
totalKeysExamined: 486,
totalDocsExamined: 477,
executionStages: {
stage: 'SORT',
...
inputStage: {
stage: 'FETCH',
...
inputStage: {
stage: 'IXSCAN',
...
ie. 486 keys scanned for only 152 returned documents. This is not great, since it means the server needs to examine ~3 documents for each relevant one, thus the server does a lot of unnecessary work. It would be the same if no department is mentioned as is the case of your second query. Note that this also contains a SORT
stage, which makes it even more unappealing.
Now, we if change our index order to
{
"queue":1,
"department":1,
"ts":1,
"status":1
}
naming it as queue_department_ts_status
, for the first query, .explain
returns:
...
executionStats: {
executionSuccess: true,
nReturned: 152,
executionTimeMillis: 0,
totalKeysExamined: 168,
totalDocsExamined: 152,
executionStages: {
stage: 'FETCH',
...
inputStage: {
stage: 'IXSCAN',
...
We can see it scanned 168 keys and returned 152 documents (not a bad query targeting metric), all without using a SORT
stage.
For the second query, however, since there is no department, it relies on memory sort:
...
executionStats: {
executionSuccess: true,
nReturned: 477,
executionTimeMillis: 2,
totalKeysExamined: 526,
totalDocsExamined: 477,
executionStages: {
stage: 'FETCH',
...
inputStage: {
stage: 'SORT',
...
inputStage: {
stage: 'IXSCAN',
...
The number of keys scanned increases to 526 with the same 477 documents returned, but all 477 examined documents are returned, so the server does a little unnecessary work. However the presence of the SORT
stage negates this.
As we can see, the selection of the order of indexes would highly depend on what query you are going to use the most as well as the operators you will use in them. Eg. When $in
is used alone, it is an equality operator that does a series of equality matches, but acts like a range operator when it is used with .sort()
. Additionally, the use case you posted might require two different indexes if you want to avoid having a SORT
stage. One with department
, and the other without.
I would highly suggest you read the following to cement your knowledge about indexes.
Please let us know if there’s any confusion in this. Feel free to reach out for anything else as well.
Regards,
Satyam Gupta