I wish our docs were clearer on this - there is an explanation in one of my aggregation talks, but it can be hard to dig out. Note that the docs do not say the pipeline is limited to 100MB; it's a single stage that's limited.
Think of the pipeline as a stream of documents - each stage takes in documents and then outputs documents.
Some stages cannot output any documents until they have accepted all incoming documents. The most obvious example is $group. (1)
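To make the streaming-vs-blocking distinction concrete, here is a toy Python sketch (this is just the shape of the data flow, not how the server is actually implemented): a $match-like stage can emit each document as soon as it sees it, while a $group-like stage has to consume its entire input before it can emit anything - and what it holds in memory is one accumulator per distinct key, not the incoming documents themselves.

```python
def match_stage(docs, predicate):
    # "Streaming" stage: yields each matching document immediately,
    # holding at most one document at a time.
    for doc in docs:
        if predicate(doc):
            yield doc

def group_stage(docs, key, field):
    # "Blocking" stage: must consume ALL input before emitting anything.
    # Memory used is proportional to the number of distinct keys,
    # not to the number (or size) of the incoming documents.
    totals = {}
    for doc in docs:
        totals[doc[key]] = totals.get(doc[key], 0) + doc[field]
    for k, total in totals.items():
        yield {"_id": {"group_id": k}, "quantity": total}

# Hypothetical input documents, just for illustration:
orders = [
    {"product": "apple", "qty": 2},
    {"product": "pear", "qty": 1},
    {"product": "apple", "qty": 3},
]
result = list(group_stage(match_stage(orders, lambda d: d["qty"] > 0),
                          "product", "qty"))
```

Three input documents go in, but only two come out of the group stage - one per distinct product.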
That first $group you have needs to see all the documents coming in from the collection, but it only needs to output N documents, where N is the number of distinct product values in the collection. The size of the original documents does not matter; the only thing that matters is the size of the "grouped" documents (the ones coming out of this stage), and each of them is just:
```
{ "_id": { "group_id": "XXXXX" }, "quantity": N }
```
That’s maybe 53-60 bytes, depending on how long your product values are. So to exceed 100MB you would need approximately 1.7 million distinct products - more if you remove the sub-object from the _id.
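As a back-of-the-envelope check (taking roughly 60 bytes per grouped document; the exact BSON size depends on the length of the product string):

```python
# The per-stage limit is 100MB, i.e. 100 * 2^20 bytes.
limit_bytes = 100 * 1024 * 1024

# Rough estimate for one grouped output document like
# { "_id": { "group_id": "XXXXX" }, "quantity": N }
bytes_per_group = 60

distinct_products_needed = limit_bytes // bytes_per_group
print(distinct_products_needed)  # roughly 1.7 million
```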
I hope you can see that your aggregation will not fail due to the 100MB limit here. All the other stages in your pipeline are “streaming” stages, meaning batches of documents stream through them without having to be accumulated all at once.
Asya
(1) The other example is $sort when it’s not supported by an index and hence has to sort all the documents in memory (that’s the case where the allowDiskUse: true option lets the stage spill to disk).