Query optimizer

When does the query optimizer rearrange the predicate and when it doesn’t?

I always get confused, and how to know for sure what indexes cover what query and sort it?

Hi @Abdullah_Alfadhel

We don’t explicit cover that level of detail in either the course or indeed in our documentation, however you can see examples of what is done for Aggregation pipelines in this docs page - https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/

In terms of outlining how best to cover a query this docs page https://docs.mongodb.com/manual/core/query-optimization/#covered-query has the details on what considerations you will need to take into account. This docs page has https://docs.mongodb.com/manual/tutorial/sort-results-with-indexes/

In order to cover and sort, all the fields in the query should be part of the index as should the field to sort on. The easiest way is to use the prefix of the index to sort on, however, you can use non-prefix fields with equality conditions but I’d suggest it’s easier to design with just the prefix if possible in a scenario.

Kindest regards,
Eoin

Thank you for your time @Eoin_Brazil .

But the docs doesn’t explain how does the server see this query with this index:

db.collection.createIndex(“a”:1,“b”:1,“c”:1)

db.collection.find(“a”:10,“b”:“joe”,“c”:2019)

the same as this query:

db.collection.find(“b”:“joe”,“a”:10,“c”:2019)

the server rearrange the predicate for some reason and I don’t know how to think like it.

Hi @Abdullah_Alfadhel

Ideally, for a compound index’s first few fields, they should be chosen to have a low cardinality. This will then allow you to avail of a more effective prefix compression which means your index will be smaller. In the case of a low cardinality field such as ‘b’ or user’s first name, having that first will limit the required piece of the index to a subset of the index which can stay resident in the cache. The result of only having that smaller subset in cache means the majority of the rest of the index isn’t in the cache (it is cold / on disk) and space that might have been used by it, can instead be used for data/documents.

In terms of working this for specific scenarios beyond the advice above, I’d suggest posting a more concrete example in the Working with Data forum or looking to engage our Professional Services team if this relates to a production system.

Kindest regards,
Eoin

1 Like

It rearranges the predicate because it can because the logical expression A AND B is logically equivalent to B AND A. So if you specify A AND B and an index A:1,B:1 exists it will be used. If none exists it will check if an index B:1,A:1 exists and will use it. It does not necessarily do that in this order because depending on many factors it might not be optimal. For example, if both index B,A and A,B exists, it is more efficient to use the one that is already in RAM, so a query on B,A might use the index A,B because it ‘more’ available. The exact details are not important.

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.