I have recently modified a complex multi-field query to use $and instead of simply assigning the fields as direct properties of the query. The idea being to sequence things so it fails fast, instead of applying every condition, even if we know it should not be, if the previous condition is already false.
For example, if we have a collection of people with a schema looking as follows:
Name
Address
Street
City
CountryCode
Favourite Colour
Then a condition such as:
people.address.country: ‘AU’
people.address.city: ‘Melbourne’
people.name: ‘john’
My understanding, based on documentation, is that without the $and all the conditions will be evaluated, even if the previous conditions have already evaluated to false.
I am now being challenged to show that there is a performance improvement in doing this, so I have started writing a test case for this, but can anyone confirm my understanding is correct and whether there is any existing documentation or test cases showing the performance gain?
I think that the difference between the $and and just comma separated values os just syntactic. Don’t believe there is any difference from execution planning.
When the document is inspected for filtering the entire document is read therefore failing fast is really negligible…
Anyway, to improve the described query you should build compound indexes for all equility fields this is what will definitely improve performance.
In general, you should index fields based on the following rule: Equity, Sort , Range (ESR).
I ended up writing the test scenario and did find indexed fields coupled with and $and as part of the query did speed things up. Sometimes we were talking a 20ms improvement, going from 22ms to 2ms. For a site with a lot of data and traffic this can add up. Also, since it is such as simple optimisation I’ll take it.
The test scenario I wrote:
Feel free to comment on it. I have made it public so it can be improved on.
I didn’t mean that the short circuit does not exist but when occuring on the same object its ngelegible as inspecting two additional comparison in a document we have to read is very fast. To speed this up we need to use index compound on all fields to not read the document at all if not all conditions are met.
You have an index only seperately on countryCode or on town instead of creating
Both queries shows identical parsed query explain output, so they’re the same query. In fact, the comma separated query was internally translated to use $and. I’m not sure why your test shows a difference between the two cases, but they should not differ.
One way to check is currently in the test you posted, you use the comma-separated case first, then the $and case second. Try reversing their order to eliminate the possibility of cache warm up interfering with the timing.
One reason that might explain this is the cache behavior. Let assume you start with a cold server with the working set out of RAM. If you perform the implicit and first, you spend a couple of extra CPU cycle to bring everything in RAM. By running the explicit $and after on the now hot server with the with the working set in RAM you will definitively obtain better numbers.
Not enough information was supplied to be know if this was the case but I think it is a possibility.