I have a fairly straightforward aggregation: a $match on fieldA, with $gte 0 on fieldB (note that fieldB cannot be negative), then a $sort on fieldB.
The indexes are a compound index on {fieldA, fieldB} and a single-field index on fieldB. Field A has high cardinality (almost unique) and there are about 1 million documents in the collection. For some reason Mongo chooses the index on fieldB and does a complete index scan, with a fetch of each document to check the value of fieldA.
Any ideas why this may be? It does work fine after removing the index on fieldB, and the performance is literally thousands of times better.
Is Mongo aware of statistics such as volumes, cardinality, etc., as would be the case (for example) in DB2?
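For concreteness, the aggregation and indexes described above might look like the following. This is a pymongo-style sketch; the literal value matched on fieldA and the create_index calls are placeholders for illustration, not taken from the actual application.

```python
# Hypothetical reconstruction of the aggregation described above:
# equality match on fieldA, a $gte 0 range on fieldB, sorted on fieldB.
pipeline = [
    {"$match": {"fieldA": "some-value", "fieldB": {"$gte": 0}}},
    {"$sort": {"fieldB": 1}},
]

# The two indexes on the collection (key patterns only):
index_ab = {"fieldA": 1, "fieldB": 1}  # compound index on A then B
index_b = {"fieldB": 1}                # single-field index on B

# With a live pymongo connection the indexes would be created with, e.g.:
#   collection.create_index([("fieldA", 1), ("fieldB", 1)])
#   collection.create_index([("fieldB", 1)])
```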
You haven’t given us enough information to understand where this may come from. Can you share your aggregation pipeline with us? And also all of your indexes, please.
As a temporary solution (maybe we won’t need to look any further), you can use hint to force your query to use a specific index.
It’s not elegant, but it works.
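A sketch of what the hint might look like. The index name fieldA_1_fieldB_1 is an assumption following MongoDB's default naming convention; adjust it to your actual index.

```python
# A hint can be given either as the index name or as the key pattern.
hint_by_name = "fieldA_1_fieldB_1"            # default name of the compound index
hint_by_keys = [("fieldA", 1), ("fieldB", 1)]  # equivalent key-pattern form

pipeline = [
    {"$match": {"fieldA": "some-value", "fieldB": {"$gte": 0}}},
    {"$sort": {"fieldB": 1}},
]

# With pymongo the hint is passed as a keyword argument:
#   collection.aggregate(pipeline, hint=hint_by_name)
# The mongosh equivalent is an aggregate option:
#   db.collection.aggregate(pipeline, { hint: "fieldA_1_fieldB_1" })
```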
Hi Gaetan,
Thanks very much for coming back to me on this, and I hope that my reply is not too verbose.
At the bottom is the log text for the COMMAND in question. While the issue has now been solved (see below), we went through some stages:
There were no indexes (except _id) on the collection. When run, the command (obviously) caused a COLLSCAN which took around 1,100 ms.
The user then added two indexes, one on rootOperation.path and one on timestamp. This is what resulted in the log entry below. Mongo obviously chose the index on timestamp (probably due to the sort) which caused a whole index scan followed by a fetch for each document to check the value of rootOperation.path.
We then added timestamp to the rootOperation.path index, however this made no difference and Mongo continued to choose the timestamp index.
Finally, we deleted the timestamp index (so the only remaining index was the compound rootOperation.path + timestamp index), and this fixed the issue to the extent that the COMMAND is no longer reported on (less than 100 ms).
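The three stages above can be sketched as index layouts. The key patterns below follow the field names in the description; the mongosh commands in the comments are illustrative.

```python
# Stage 1: no indexes (except _id) -> COLLSCAN, ~1,100 ms.

# Stage 2: two single-field indexes; the planner picked the timestamp
# index (it satisfies the sort) and fetched every document to test
# rootOperation.path.
stage2_indexes = [{"rootOperation.path": 1}, {"timestamp": 1}]

# Stage 3 (the fix): only the compound index remains. The equality
# match on rootOperation.path narrows the scan, and the index order
# within that prefix already satisfies the sort on timestamp -> <100 ms.
stage3_indexes = [{"rootOperation.path": 1, "timestamp": 1}]

# Equivalent mongosh commands for reaching stage 3:
#   db.collection.createIndex({ "rootOperation.path": 1, "timestamp": 1 })
#   db.collection.dropIndex({ "timestamp": 1 })
```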
In my opinion, given that the cardinality of rootOperation.path is very high (in this case nreturned was 0), as soon as there is an index with it as a leading selection key, that index should be chosen. What I have also learned (hopefully always learning) is that Mongo does not use statistics in the same manner as DB2 etc.
I take your point about hints; however, I can only make recommendations there, and to be honest I am a bit of a DBMS purist when it comes to hints, so I do not use them on Oracle, for example.
In any case, any ideas highly appreciated.
An index can support sort operations on a non-prefix subset of the index key pattern. To do so, the query must include equality conditions on all the prefix keys that precede the sort keys.
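That rule can be illustrated with the thread's own fields. This is a sketch; the path value is a placeholder.

```python
# Compound index key pattern: equality key first, sort key second.
index_keys = {"rootOperation.path": 1, "timestamp": 1}

# This query can use the index for the sort, because every key that
# precedes the sort key ("rootOperation.path") has an equality condition:
supported = {
    "filter": {"rootOperation.path": "/some/path"},  # equality on the prefix
    "sort": {"timestamp": 1},                        # non-prefix sort key
}

# This one cannot: the prefix key has a range, not an equality, so
# timestamps are no longer in index order across the scanned range
# and an in-memory sort would be required.
unsupported = {
    "filter": {"rootOperation.path": {"$gte": "/a"}},
    "sort": {"timestamp": 1},
}
```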
It does work fine after removing the index on fieldB…
I am not sure about this.
But, I just tried the aggregation query on a small dataset with similar data and indexes (both of them). The query optimizer always chose the compound index on a+b, and used it for both the match and sort operations.
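One way to verify which plan wins on such a test dataset is explain. A sketch of the command shape; the collection name is a placeholder.

```python
# In mongosh the winning plan for an aggregation can be inspected with:
#   db.collection.explain("executionStats").aggregate(pipeline)
# With pymongo, explain for an aggregation goes through the raw command:
explain_command = {
    "explain": {
        "aggregate": "collection",
        "pipeline": [
            {"$match": {"fieldA": "some-value", "fieldB": {"$gte": 0}}},
            {"$sort": {"fieldB": 1}},
        ],
        "cursor": {},
    },
    "verbosity": "executionStats",
}
# db.command(explain_command)
# Then check queryPlanner.winningPlan for an IXSCAN stage and its indexName.
```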
I once encountered a similar issue, albeit on 3.4. I thought it should have evaluated to using a different index, but it had not. Using db.collection.getPlanCache().clear() helped in that situation. An approach with more finesse would have been planCacheClear with the query shape.
It may not help in your scenario, but I thought I would mention it.
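A sketch of the more surgical option: clearing only the cached plans for one query shape via the planCacheClear command, rather than the whole cache. The collection name and query values are placeholders.

```python
# Clear the entire plan cache for a collection (mongosh):
#   db.collection.getPlanCache().clear()

# More targeted: clear only the plans cached for one query shape.
# The shape is identified by the query, sort, and projection documents.
plan_cache_clear = {
    "planCacheClear": "collection",
    "query": {"rootOperation.path": "/some/path", "timestamp": {"$gte": 0}},
    "sort": {"timestamp": 1},
}
# db.command(plan_cache_clear)
```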
Thank you Prasad. Indeed there is an equality predicate (rootOperation.path) before the SORT field, which is timestamp. Given the high cardinality of the former field, though, I would have expected it to use the compound key even if it had to do a small sort after fetching.
As a matter of interest though, I had a similar situation today where we needed a new index. The only difference was that there were three fields in the index, with the search using equality predicates and sorting on the third field. In this case it worked fine.
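The three-field case described above might look like this; the field names are invented purely for illustration.

```python
# Hypothetical three-field compound index: equality keys first,
# sort key last.
index_keys = {"region": 1, "status": 1, "timestamp": 1}

# Equality conditions on both prefix keys mean the index entries
# matching the filter are already ordered by the third key, so the
# index can also satisfy the sort with no in-memory SORT stage.
query = {
    "filter": {"region": "EU", "status": "open"},
    "sort": {"timestamp": 1},
}
```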