Hey there, we’re using M60 cluster ( 8.0.11 ), and we have one query that needs to run on all orders from a specific “tenant” ( you basically have multiple orders per tenant ).
Now the collection size is 50M documents, and the aggregation that I’m performing is:
now most of the query execution time is the second pipeline stage, the $group.
I know that $group doesn’t use indexes ( even though uuid: 1, status: 1 index exists ).
Adding $sort before that will not help as I don’t use any accumulators (like $first / $last ) inside the $group.
The avg execution time for this query is about 10-20 min ( depending on which tenant id we matching at first, if one tenant has alot of orders that it will take more time ).
I would love to get some opinions here of how I can improve this result?
Does MongoDB is even being used for such operations?
Hi @yoni_egorov, so I took some time to understand what’s going on here, I did some research and yes you are correct, the $group operator does not use indexes. However, I think there’s a better approach to this.
In the $group stage, I can see that you are still filtering, which is slowing things down, I think you should move this extra condition to the first $match stage so this makes the query faster.
Plus, because the $match stage is basically a .find(), your indexes will work in the $match stage.
So the aggregation pipeline should look like this -
Lastly, you subtract the first aggregation results from the collection.distinct() result to give you that difference you are looking for, I hope this helps.