I’ve looked around the forums at similar topics and still don’t fully understand how this works. I thought I grasped it and then Lab 2.2 threw me for a loop. After some investigation I think the piece I was missing is that RegExp() interacts with indexes differently than I had thought? Summary below:
At 2:40 of the “When you can sort with Indexes” lecture, Kirby says that “an index can be used to both filter and sort documents if the query includes equality conditions on all of the prefix keys that precede the sort keys.” For Lab 2.2, this led me to believe that the index { “address.state”: 1, “last_name”: 1, “job”: 1 } would have an in-memory sort for this query:
> db.people.find({
"job": /^P/,
"first_name": /^C/,
"address.state": "Indiana"
}).sort({ "last_name": 1 })
My thought was that, since there is no { “address.state”: 1, “job”: 1 } prefix, it would scan in-memory.
However, looking at the explain plan, it does not:
In contrast, an index that follows the order of the query { “address.state”: 1, “job”: 1, “last_name”: 1 } does do an in-memory sort:
If we run a query without RegExp() on the jobs field, however, we get our more efficient, no scan in memory with this second index:
So, what’s going on here? If the index is sorted by state, and then last name, and then job, doesn’t it need to filter the documents in that order first?
I’m trying to think of a good metaphor; if a book shelf is organized by color, author, and then title, and I take out all the blue books by Vonnegut, the titles should already be in order. But this doesn’t apply to “authors whose names start with V”, and it does apply to “books with titles that start with ‘The’”…?
Having a hard time wrapping my head around it. Any guidance would be appreciated.