Group aggregations on chunks of a list

I have to perform an aggregation over a very large dataset.

The pipeline looks like this:

```js
[
  { "$match": { "field": { "$in": [ /* one_k_list, around 1k+ elements */ ] } } },
  { "$group": { /* ... */ } },
  { "$group": { /* ... */ } }
]
```

This is taking a lot of time, but if I divide the list into small chunks of, say, 50 elements, then I can make around 20+ async DB calls and get my result. My question is: will combining those 20+ DB call results give me the same output as the single call with the 1k+ list?
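For illustration, here is roughly what the chunked approach looks like with the Node.js driver. This is a sketch only: the "mydb"/"mycoll" names and the aggregateInChunks function are placeholders, and the two $group stages are elided just as in the pipeline above.

```typescript
import { MongoClient, Document } from "mongodb";

// Placeholder names: adjust "mydb", "mycoll", and the pipeline to your setup.
async function aggregateInChunks(
  client: MongoClient,
  oneKList: unknown[],
  chunkSize = 50
): Promise<Document[][]> {
  const coll = client.db("mydb").collection("mycoll");

  // Split the 1k+ element list into chunks of `chunkSize`.
  const chunks: unknown[][] = [];
  for (let i = 0; i < oneKList.length; i += chunkSize) {
    chunks.push(oneKList.slice(i, i + chunkSize));
  }

  // Run one aggregation per chunk concurrently and collect all results.
  return Promise.all(
    chunks.map((chunk) =>
      coll
        .aggregate([
          { $match: { field: { $in: chunk } } },          // per-chunk $in
          // ... the two $group stages from the original pipeline ...
        ])
        .toArray()
    )
  );
}
```

Note that each call runs its own $group stages over only its chunk's documents, so the per-chunk results are partial and still have to be merged client-side.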

Hi @Saujanya_Tiwari, welcome to the community!

Apologies for the delay, but have you found a solution for your use case yet?

I tend to think that the double $group is the main issue here. In an aggregation pipeline, certain stages, such as $group, or $sort without a supporting index, are termed “blocking” stages. That is, such a stage needs all of its input documents to be present before it can do its work, rather than operating on a document-by-document basis. In other words, it “blocks” the whole pipeline.
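If you want to see how much time those blocking stages account for, you can ask the server for the execution plan instead of the results. A minimal sketch, reusing the same placeholder names as above:

```typescript
import { MongoClient } from "mongodb";

// Sketch: have the server explain the pipeline rather than execute it.
async function explainPipeline(client: MongoClient): Promise<void> {
  const coll = client.db("mydb").collection("mycoll");
  const plan = await coll
    .aggregate([
      { $match: { field: { $in: [/* one_k_list */] } } },
      // ... the two $group stages ...
    ])
    .explain("executionStats");
  console.log(JSON.stringify(plan, null, 2));
}
```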

will combining those 20+ DB call results give me the same output as the single call with the 1k+ list?

That depends on what the $group operations compute. If the accumulators combine associatively across chunks (e.g. you’re adding or multiplying numbers), then yes. If they don’t (e.g. averages, or some statistical functions concerning populations), then you’ll need to carry extra intermediate values to make the merged result mathematically equal.
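A concrete example of the second case is $avg: you cannot simply average the per-chunk averages, because groups of different sizes would be weighted incorrectly. Instead, have each chunk’s $group emit a sum and a count, then combine those client-side. A sketch, assuming a ChunkResult shape that your $group stage would need to emit (it is not part of the original pipeline):

```typescript
// Assumed shape of one group's partial result from a single chunk,
// e.g. from: { $group: { _id: "$key", sum: { $sum: "$value" }, count: { $sum: 1 } } }
interface ChunkResult {
  _id: string;
  sum: number;
  count: number;
}

// Merge per-chunk partial results into final averages. This works
// because sums and counts combine associatively across chunks,
// whereas averages do not.
function mergeAverages(chunkResults: ChunkResult[][]): Map<string, number> {
  const totals = new Map<string, { sum: number; count: number }>();
  for (const chunk of chunkResults) {
    for (const { _id, sum, count } of chunk) {
      const t = totals.get(_id) ?? { sum: 0, count: 0 };
      t.sum += sum;
      t.count += count;
      totals.set(_id, t);
    }
  }
  // Divide only once, after all partial sums and counts are combined.
  return new Map(
    [...totals].map(([key, { sum, count }]) => [key, sum / count])
  );
}
```

The same pattern applies to other non-associative accumulators: carry whatever intermediate values let the final answer be computed exactly once, after the merge.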

Best regards
Kevin

