Indexing with the rule of equality/sort/range

Hi,

Regarding the indexing rule of equality/sort/range, I am using this tutorial as knowledge base: Chapter_4_CRUD_Optimization

I need help understanding why the rule of equality/sort/range is not applying to my example (which is a simplified reproducer of my real issue):

// ----- INIT DATA -----
use test;
db.test.drop();
db.test.insertMany([ // 10 items
    { a: 1, b: 101, c:0, d:1 },
    { a: 1, b: 102, c:0, d:1 },
    { a: 1, b: 103, c:0, d:1 },
    { a: 1, b: 104, c:0, d:1 },
    { a: 1, b: 105, c:0, d:1 },
    { a: 2, b: 106, c:0, d:1 },
    { a: 2, b: 107, c:0, d:1 },
    { a: 2, b: 108, c:0, d:1 },
    { a: 2, b: 109, c:0, d:1 },
    { a: 2, b: 110, c:0, d:1 },
]);
// ----- INIT INDEX -----
db.test.dropIndexes();
db.test.createIndex({b : 1, a : 1}); // index b (sort) + a (range)
// ----- QUERY -----
db.test.find({a:{$lt:2}}, {a:1, _id:0}).sort({b:1}).explain("executionStats");

I am trying to use the compound index to sort data by “b” using index key “b : 1” and then to filter data by “a < 2” using index key “a : 1”.
==> It doesn’t work. It sorts data using the index but it doesn’t filter using the index. It filters using FETCH with the filter “a < 2”.

{
        "explainVersion" : "1",
        "queryPlanner" : {
                "namespace" : "test.test",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "a" : {
                                "$lt" : 2
                        }
                },
                "maxIndexedOrSolutionsReached" : false,
                "maxIndexedAndSolutionsReached" : false,
                "maxScansToExplodeReached" : false,
                "winningPlan" : {
                        "stage" : "PROJECTION_SIMPLE",
                        "transformBy" : {
                                "a" : 1,
                                "_id" : 0
                        },
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "a" : {
                                                "$lt" : 2
                                        }
                                },
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                                "b" : 1,
                                                "a" : 1
                                        },
                                        "indexName" : "b_1_a_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                                "b" : [ ],
                                                "a" : [ ]
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "b" : [
                                                        "[MinKey, MaxKey]"
                                                ],
                                                "a" : [
                                                        "[MinKey, MaxKey]"
                                                ]
                                        }
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 5,
                "executionTimeMillis" : 1,
                "totalKeysExamined" : 10,
                "totalDocsExamined" : 10,
                "executionStages" : {
                        "stage" : "PROJECTION_SIMPLE",
                        "nReturned" : 5,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 11,
                        "advanced" : 5,
                        "needTime" : 5,
                        "needYield" : 0,
                        "saveState" : 0,
                        "restoreState" : 0,
                        "isEOF" : 1,
                        "transformBy" : {
                                "a" : 1,
                                "_id" : 0
                        },
                        "inputStage" : {
                                "stage" : "FETCH",
                                "filter" : {
                                        "a" : {
                                                "$lt" : 2
                                        }
                                },
                                "nReturned" : 5,
                                "executionTimeMillisEstimate" : 0,
                                "works" : 11,
                                "advanced" : 5,
                                "needTime" : 5,
                                "needYield" : 0,
                                "saveState" : 0,
                                "restoreState" : 0,
                                "isEOF" : 1,
                                "docsExamined" : 10,
                                "alreadyHasObj" : 0,
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "nReturned" : 10,
                                        "executionTimeMillisEstimate" : 0,
                                        "works" : 11,
                                        "advanced" : 10,
                                        "needTime" : 0,
                                        "needYield" : 0,
                                        "saveState" : 0,
                                        "restoreState" : 0,
                                        "isEOF" : 1,
                                        "keyPattern" : {
                                                "b" : 1,
                                                "a" : 1
                                        },
                                        "indexName" : "b_1_a_1",
                                        "isMultiKey" : false,
                                        "multiKeyPaths" : {
                                                "b" : [ ],
                                                "a" : [ ]
                                        },
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 2,
                                        "direction" : "forward",
                                        "indexBounds" : {
                                                "b" : [
                                                        "[MinKey, MaxKey]"
                                                ],
                                                "a" : [
                                                        "[MinKey, MaxKey]"
                                                ]
                                        },
                                        "keysExamined" : 10,
                                        "seeks" : 1,
                                        "dupsTested" : 0,
                                        "dupsDropped" : 0
                                }
                        }
                }
        },
        "command" : {
                "find" : "test",
                "filter" : {
                        "a" : {
                                "$lt" : 2
                        }
                },
                "sort" : {
                        "b" : 1
                },
                "projection" : {
                        "a" : 1,
                        "_id" : 0
                },
                "$db" : "test"
        },
        "serverInfo" : {
                "host" : "LAPTOP-8552N07T",
                "port" : 27017,
                "version" : "5.0.1",
                "gitVersion" : "318fd9cabc59dc9651f3189b622af6e06ab6cd33"
        },
        "serverParameters" : {
                "internalQueryFacetBufferSizeBytes" : 104857600,
                "internalQueryFacetMaxOutputDocSizeBytes" : 104857600,
                "internalLookupStageIntermediateDocumentMaxSizeBytes" : 104857600,
                "internalDocumentSourceGroupMaxMemoryBytes" : 104857600,
                "internalQueryMaxBlockingSortMemoryUsageBytes" : 104857600,
                "internalQueryProhibitBlockingMergeOnMongoS" : 0,
                "internalQueryMaxAddToSetBytes" : 104857600,
                "internalDocumentSourceSetWindowFieldsMaxMemoryBytes" : 104857600
        },
        "ok" : 1
}

Hi @Joel_J

I believe the Equality → Sort → Range rule of thumb is applicable when you have all three in a single query.

However if your query is only Sort → Range, then the index can’t be fully used, as you have observed.

The way I understand it, the term {a:{$lt:2}} is not very selective. It can easily contain >90% of the documents in the collection, 100%, or none. An index is best used to eliminate as many non-matching documents as possible. MongoDB also tries to use an index to eliminate any in-memory SORT stage.

In your example case, the “best case” is to have the query to be as selective as possible, like:

db.test.find({a:2}, {a:1, _id:0}).sort({b:1})

or for multiple values of a:

db.test.find({a:{$in:[2,3,4]}}, {a:1, _id:0}).sort({b:1})

While the best index for both queries is

db.test.createIndex({a:1, b:1})

which is Equality → Sort.

For an extensive (albeit a bit outdated) explanation on this, please see Optimizing MongoDB Compound Indexes

Best regards
Kevin

Thanks for you answer.

This is just a reproducer with dummy values and I have many filters, dynamically generated in a search formular, applying to values which are randomly seizured by front-end users. There is no use-case where I can have a list of values to use $in.

I understand you’re telling me that I can’t use the index for a range filter after using it for a sort. It is indeed what i have noticed in my test.

But then my question is: Why does the rule of equality/sort/range have the “range” in it if it can’t be used? Why creating the keys in the index for range filter if it can’t be used by mongodb engine. Because in the tutorial they add these keys for range filtering after the sort stage. Do they do it in vain? Is it a mistake in the tutorial? There is something I’m still missing to understand here, and I need to understand it to take an appropriate decision in my app design.

Hi @Joel_J

Unfortunately I can’t access the link you posted since I’m not registered for the course.

However, the basic idea of Equality → Sort → Range is to have the order of your fields to be optimal when creating an index to support your query.

So the priority is:

  • Equality only
  • Equality → Sort
  • Equality → Sort → Range

The example you posted is just Sort → Range without the Equality part.

Note that depending on the documents in your collection, having a FETCH stage is not necessarily a bad thing, as long as your query targeting is good (the ratio of scanned documents vs. returned documents). You want this ratio as close to 1:1 as possible. Something like 100:1 would be terrible since the server has to examine 100 documents to return just 1, so a lot of work was for nothing.

A possible alternative method is to use Partial Index to index only certain documents fulfilling some criteria. For example, you might want to only index documents having a < 2, then combine it with hint() to force MongoDB to use that particular index. Since the a < 2 criteria is already described by the partial index, your query then could be:

db.test.createIndex({b:1, a:1}, {partialFilterExpression:{a:{$lt:2}}})
db.test.find({},{a:1, _id:0}).sort({b:1}).hint('b_1_a_1')

*Note: Please use hint() in combination with a partial index very carefully, since it’s possible for MongoDB to return an incomplete result set in this situation.

Best regards
Kevin

This is the direct link to the video.

Thanks but what you explain me, I already know, I’m already using it in other parts of my app, and yes it woks well, but it’s not what I’m asking.

What I need to do is sort, then range, using the same index. It doesn’t change anything if I put 0 or more equality filters before the sorting.

And I believe it’s possible since it’s done in the tutorial. Unless the tutorial is false and then I could use this information. However if the tutorial is correct, and if an index can be used to sort and range-filter, if there are conditions to enable it, I could change my schema and adapt my code to whatever conditions are necessary.

Regards.

Hi @Joel_J

Thanks for the video. I don’t think the video is incorrect, as they always talk about having Equality, Sort, Range parts in the query, and not Sort Range without the Equality part.

The best outcome I can get is this:

db.test.createIndex({c:1, b:1, a:1})
db.test.find({a:{$lt:2}, c:0}, {a:1, _id:0}).sort({b:1})

By putting the equality c:0, the explain output shows only IXSCAN and PROJECTION_COVERED, but without the c:0 equality, I always have the FETCH stage.

Best regards
Kevin

Interesting lead. For some reason, it does seem to make a difference to add a trivial equality filter. However it’s not using the range index on a and it’s only scanning the whole data contained in the index doc by doc.

"executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 5,
        "executionTimeMillis" : 0,
        "totalKeysExamined" : 10,
        "totalDocsExamined" : 0,
        "executionStages" : {
                "stage" : "PROJECTION_COVERED",
                "nReturned" : 5,
                "executionTimeMillisEstimate" : 0,
                "works" : 11,
                "advanced" : 5,
                "needTime" : 5,
                "needYield" : 0,
                "saveState" : 0,
                "restoreState" : 0,
                "isEOF" : 1,
                "transformBy" : {
                        "a" : 1,
                        "_id" : 0
                },
                "inputStage" : {
                        "stage" : "IXSCAN",
                        "nReturned" : 5,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 11,
                        "advanced" : 5,
                        "needTime" : 5,
                        "needYield" : 0,
                        "saveState" : 0,
                        "restoreState" : 0,
                        "isEOF" : 1,
                        "keyPattern" : {
                                "c" : 1,
                                "b" : 1,
                                "a" : 1
                        },
                        "indexName" : "c_1_b_1_a_1",
                        "isMultiKey" : false,
                        "multiKeyPaths" : {
                                "c" : [ ],
                                "b" : [ ],
                                "a" : [ ]
                        },
                        "isUnique" : false,
                        "isSparse" : false,
                        "isPartial" : false,
                        "indexVersion" : 2,
                        "direction" : "forward",
                        "indexBounds" : {
                                "c" : [
                                        "[0.0, 0.0]"
                                ],
                                "b" : [
                                        "[MinKey, MaxKey]"
                                ],
                                "a" : [
                                        "[-inf.0, 2.0)"
                                ]
                        },
                        "keysExamined" : 10,
                        "seeks" : 6,
                        "dupsTested" : 0,
                        "dupsDropped" : 0
                }
        }
},

Also, I tried with your solution with 1.000.000 entries and it goes back to fetching again, and the query plan is totally different. The query optimizer seems quite unpredictable…

// ----- INIT DATA -----
use test;
db.test.drop();
for (let i = 0; i < 100000; i++) {
    db.test.insertMany([ // 10 items
        { a: 1, b: 101, c:0, d:1 },
        { a: 1, b: 102, c:0, d:1 },
        { a: 1, b: 103, c:0, d:1 },
        { a: 1, b: 104, c:0, d:1 },
        { a: 1, b: 105, c:0, d:1 },
        { a: 2, b: 106, c:0, d:1 },
        { a: 2, b: 107, c:0, d:1 },
        { a: 2, b: 108, c:0, d:1 },
        { a: 2, b: 109, c:0, d:1 },
        { a: 2, b: 110, c:0, d:1 },
    ]);
}
// ----- INIT INDEX -----
db.test.dropIndexes();
db.test.createIndex({c:1, b:1, a:1}); // equality/sort/range
// ----- QUERY -----
db.test.find({a:{$lt:2}, c:0}, {a:1, _id:1}).sort({b:1}).explain("executionStats");

Gives:

"executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 500000,
        "executionTimeMillis" : 721,
        "totalKeysExamined" : 500005,
        "totalDocsExamined" : 500000,
        "executionStages" : {
                "stage" : "PROJECTION_SIMPLE",
                "nReturned" : 500000,
                "executionTimeMillisEstimate" : 51,
                "works" : 500006,
                "advanced" : 500000,
                "needTime" : 5,
                "needYield" : 0,
                "saveState" : 500,
                "restoreState" : 500,
                "isEOF" : 1,
                "transformBy" : {
                        "a" : 1,
                        "_id" : 1
                },
                "inputStage" : {
                        "stage" : "FETCH",
                        "nReturned" : 500000,
                        "executionTimeMillisEstimate" : 42,
                        "works" : 500006,
                        "advanced" : 500000,
                        "needTime" : 5,
                        "needYield" : 0,
                        "saveState" : 500,
                        "restoreState" : 500,
                        "isEOF" : 1,
                        "docsExamined" : 500000,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 500000,
                                "executionTimeMillisEstimate" : 25,
                                "works" : 500006,
                                "advanced" : 500000,
                                "needTime" : 5,
                                "needYield" : 0,
                                "saveState" : 500,
                                "restoreState" : 500,
                                "isEOF" : 1,
                                "keyPattern" : {
                                        "c" : 1,
                                        "b" : 1,
                                        "a" : 1
                                },
                                "indexName" : "c_1_b_1_a_1",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "c" : [ ],
                                        "b" : [ ],
                                        "a" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "c" : [
                                                "[0.0, 0.0]"
                                        ],
                                        "b" : [
                                                "[MinKey, MaxKey]"
                                        ],
                                        "a" : [
                                                "[-inf.0, 2.0)"
                                        ]
                                },
                                "keysExamined" : 500005,
                                "seeks" : 6,
                                "dupsTested" : 0,
                                "dupsDropped" : 0
                        }
                }
        }
},

Hi @Joel_J,

Just chiming in here with a few items in addition to the information @kevinadi shared.

The general observation that a FETCH was happening for the trailing range filter on field a despite its presence in the index is tracked in SERVER-13197. We would encourage you to vote for the ticket if that is something which you would like to see implemented.

A workaround would be to add a “no-op” filter against the sort field in the query predicate itself. It should be a “no-op” with regards to not logically changing the results. So in the sample query originally posted here, it could be something like b: { $gte: MinKey() }. This should allow the filter to move out of the FETCH stage and be satisfied directly via the tighter index bounds.

With respects to the latest comment regarding the query planner and fetching behavior, there are a couple of other things worth noting. Plan selection is currently influenced by data in the associated collection. Plan generation, however, mostly does not. The change in the structure of the plan is usually indicative of a change in the query shape itself. Indeed, we can see that there is a difference in the transformBy field (the projection) in the two outputs:

"transformBy" : {
    "a" : 1, 
    "_id" : 0
}

versus

"transformBy" : {
    "a" : 1,
     "_id" : 1
}

The latter query requested the _id field in the output. Since that field is not in the index the database was forced to FETCH the document to retrieve it. This comes from the sample code which requested a projection of {a:1, _id:1}. Turning off the _id field should allow for the operation to be covered once again regardless of how much data is in the collection.

Hope this helps!

Best,
Chris

1 Like