Unwind and Sort - Decrease Query Time

Andre_Luiz · May 24, 2022, 7:34pm

We are facing an issue in our query where using “sort” significantly increases execution time.

The query is this:

db.collections.aggregate([
    {$match: {$and: [{"products.productInfo.productId": {"$gte": 93143, "$lte": 93643}}]}},
    {$unwind: "$products"},
    {$match: {$and: [{"products.productInfo.productId": {"$gte": 93143, "$lte": 93643}}]}},  
    {$sort: {"products.productInfo.productId": 1}},    
    {$limit: 50}
])

We use unwind because our document has a list of products and filters from a range of ids. The productId is an index.

ExecutionStats without the “sort” clause (average time below 1s):
“nReturned” : 163
“executionTimeMillis” : 16
“totalKeysExamined” : 351
“totalDocsExamined” : 297

ExecutionStats with the “sort” clause (average time 14s):
“nReturned” : 1919
“executionTimeMillis” : 17504
“totalKeysExamined” : 847869
“totalDocsExamined” : 451385

I read this in a post:

This happens because of $unwind stage.Your query performance is slow because query is not considering index after the $unwind stage.Check that with explain.you will get to know.This happens because after the $unwind whole documents will change and it becomes different that is stored in RAM for indexing purpose.

Is that why sort degrades the query?

MaBeuLux88_xxx · May 25, 2022, 4:28pm

Long story short, yes.

This doc explains when an aggregation pipeline can use the collection indexes.

Basically, $match and $sort at the beginning of a pipeline can use an index from the collection. Under certain conditions, $group can also use an index. Same for $geoNear but that’s it.

After a $unwind, the docs are completely different from the ones in the collection so indexes are useless after that stage.

I think the only thing you can remove in your query to “improve” is to remove the $and that only contain a single filter. But this won’t affect the speed much.

As the $sort happens in memory, more RAM (if the RAM is constantly full) or faster RAM (DDR5 For the Win) could help.

Another potential improvement would be to use $filter instead of the $match + $unwind + $match combo. This would help reduce the size of the pipeline in RAM and this would also avoid the second $match that’s also not backed up by an index anymore.

Cheers,
Maxime.

system · May 30, 2022, 4:29pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.