Group aggregations on chunks of a list

I have to perform an aggregation over a very large dataset.

The pipeline looks like this:

```js
[
  { "$match": { "field": { "$in": [ /* one_k_list, around 1k+ elements */ ] } } },
  { "$group": { /* ... */ } },
  { "$group": { /* ... */ } }
]
```

This is taking a lot of time, but if I divide the list into small chunks of, say, 50 elements, then I can make around 20+ async DB calls and get my result. My question is: will combining those 20+ DB call results give me the same output as the single call with the 1k+ list?
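For illustration, here is roughly what the chunked approach looks like with the Node.js driver. This is a sketch only: the "mydb"/"mycoll" names and the aggregateInChunks function are placeholders, and the two $group stages are elided just as in the pipeline above.

```typescript
import { MongoClient, Document } from "mongodb";

// Placeholder names: adjust "mydb", "mycoll", and the pipeline to your setup.
async function aggregateInChunks(
  client: MongoClient,
  oneKList: unknown[],
  chunkSize = 50
): Promise<Document[][]> {
  const coll = client.db("mydb").collection("mycoll");

  // Split the 1k+ element list into chunks of `chunkSize`.
  const chunks: unknown[][] = [];
  for (let i = 0; i < oneKList.length; i += chunkSize) {
    chunks.push(oneKList.slice(i, i + chunkSize));
  }

  // Run one aggregation per chunk concurrently and collect all results.
  return Promise.all(
    chunks.map((chunk) =>
      coll
        .aggregate([
          { $match: { field: { $in: chunk } } },          // per-chunk $in
          // ... the two $group stages from the original pipeline ...
        ])
        .toArray()
    )
  );
}
```

Note that each call runs its own $group stages over only its chunk's documents, so the per-chunk results are partial and still have to be merged client-side.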

Hi @Saujanya_Tiwari, welcome to the community!

Apologies for the delay, but have you found a solution for your use case yet?

I tend to think that the double $group is the main issue here. In an aggregation pipeline, certain stages, such as $group, or $sort without a supporting index, are termed “blocking” stages. That is, such a stage needs all of its input documents to be present before it can do its work, rather than operating on a document-by-document basis. In other words, it “blocks” the whole pipeline.
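If you want to see how much time those blocking stages account for, you can ask the server for the execution plan instead of the results. A minimal sketch, reusing the same placeholder names as above:

```typescript
import { MongoClient } from "mongodb";

// Sketch: have the server explain the pipeline rather than execute it.
async function explainPipeline(client: MongoClient): Promise<void> {
  const coll = client.db("mydb").collection("mycoll");
  const plan = await coll
    .aggregate([
      { $match: { field: { $in: [/* one_k_list */] } } },
      // ... the two $group stages ...
    ])
    .explain("executionStats");
  console.log(JSON.stringify(plan, null, 2));
}
```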

will combining those 20+ DB call results give me the same output as the single call with the 1k+ list?

That depends on what the $group operations compute. If the accumulators combine associatively across chunks (e.g. you’re adding or multiplying numbers), then yes. If they don’t (e.g. averages, or some statistical functions concerning populations), then you’ll need to carry extra intermediate values to make the merged result mathematically equal.
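A concrete example of the second case is $avg: you cannot simply average the per-chunk averages, because groups of different sizes would be weighted incorrectly. Instead, have each chunk’s $group emit a sum and a count, then combine those client-side. A sketch, assuming a ChunkResult shape that your $group stage would need to emit (it is not part of the original pipeline):

```typescript
// Assumed shape of one group's partial result from a single chunk,
// e.g. from: { $group: { _id: "$key", sum: { $sum: "$value" }, count: { $sum: 1 } } }
interface ChunkResult {
  _id: string;
  sum: number;
  count: number;
}

// Merge per-chunk partial results into final averages. This works
// because sums and counts combine associatively across chunks,
// whereas averages do not.
function mergeAverages(chunkResults: ChunkResult[][]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const chunk of chunkResults) {
    for (const { _id, sum, count } of chunk) {
      const t = totals.get(_id) ?? { sum: 0, count: 0 };
      t.sum += sum;
      t.count += count;
      totals.set(_id, t);
    }
  }
  // Divide only once, after all partial sums and counts are combined.
  return new Map(
    [...totals].map(([key, { sum, count }]) => [key, sum / count])
  );
}
```

The same pattern applies to other non-associative accumulators: carry whatever intermediate values let the final answer be computed exactly once, after the merge.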

Best regards
Kevin

