Unwind and Sort - Decrease Query Time

We are facing an issue in our query where using “sort” significantly increases execution time.

The query is this:

db.collections.aggregate([
    {$match: {$and: [{"products.productInfo.productId": {"$gte": 93143, "$lte": 93643}}]}},
    {$unwind: "$products"},
    {$match: {$and: [{"products.productInfo.productId": {"$gte": 93143, "$lte": 93643}}]}},  
    {$sort: {"products.productInfo.productId": 1}},    
    {$limit: 50}
])

We use unwind because our document has a list of products and filters from a range of ids. The productId is an index.

ExecutionStats without the “sort” clause (average time below 1s):
“nReturned” : 163
“executionTimeMillis” : 16
“totalKeysExamined” : 351
“totalDocsExamined” : 297

ExecutionStats with the “sort” clause (average time 14s):
“nReturned” : 1919
“executionTimeMillis” : 17504
“totalKeysExamined” : 847869
“totalDocsExamined” : 451385

I read this in a post:

This happens because of $unwind stage.Your query performance is slow because query is not considering index after the $unwind stage.Check that with explain.you will get to know.This happens because after the $unwind whole documents will change and it becomes different that is stored in RAM for indexing purpose.

Is that why sort degrades the query?

Long story short, yes.

This doc explains when an aggregation pipeline can use the collection indexes.

Basically, $match and $sort at the beginning of a pipeline can use an index from the collection. Under certain conditions, $group can also use an index. Same for $geoNear but that’s it.

After a $unwind, the docs are completely different from the ones in the collection so indexes are useless after that stage.

I think the only thing you can remove in your query to “improve” is to remove the $and that only contain a single filter. But this won’t affect the speed much.

As the $sort happens in memory, more RAM (if the RAM is constantly full) or faster RAM (DDR5 For the Win) could help.

Another potential improvement would be to use $filter instead of the $match + $unwind + $match combo. This would help reduce the size of the pipeline in RAM and this would also avoid the second $match that’s also not backed up by an index anymore.

Cheers,
Maxime.

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.