I’m using MongoDB 4.2 for now, with a Node.js application and Mongoose 6.0.14.
I’ve created an aggregate (see below) to retrieve some data, and it behaves strangely. With the base aggregate, too many documents pass the filter and I receive error 4031700 (document constructed by $facet is 104857724 bytes, which exceeds the limit of 104857600 bytes) at the $facet stage.
But if I add an empty $addFields stage before my $lookup, the error goes away. In fact, I can then retrieve about 10 times more documents with the same filter in the $match stage.
The behavior is the same when I replace the $addFields with a $sort, or with a $project that includes all of the documents’ fields.
So my question is: do aggregation pipelines treat $addFields, $sort, or $project specially with regard to how data is stored for later stages?
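To make the shape of the pipeline concrete (the real one is redacted, so every collection and field name below is a made-up placeholder), here is a minimal sketch of what I mean by “adding an empty $addFields before the $lookup”:

```javascript
// Hypothetical pipeline shape — collection and field names are placeholders,
// not the real (redacted) schema. Without the empty $addFields stage, the
// $facet stage hits the 16 MB document limit; with it, the same $match
// filter succeeds.
const pipeline = [
  { $match: { status: "active" } },        // placeholder filter
  { $unwind: "$items" },                   // placeholder array field
  { $addFields: {} },                      // the "empty" stage that avoids the error
  {
    $lookup: {
      from: "details",                     // placeholder collection
      localField: "items.detailId",
      foreignField: "_id",
      as: "itemDetails",
    },
  },
  { $facet: { results: [{ $skip: 0 }, { $limit: 1000 }] } },
];

console.log(pipeline.length);
```

Removing the `{ $addFields: {} }` line (and changing nothing else) is enough to reproduce the error on my data.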
Your pipeline seems to be highly redacted, so it is hard to make a real assessment of what is going on.
It is really hard to imagine that adding a stage would reduce the amount of data reaching the $facet. Maybe some detrimental optimization is performed without the empty $addFields. A $project that eliminates data could help, yes. A $sort, maybe, as some data might be piped to the next stage faster.
Do you get the error if you only $match and $unwind? That is, without any other stages in the $facet pipeline.
The $unwind inside the $facet might be useless, and it could be the culprit, since it multiplies the data. You do not need to $unwind even if localField refers to an array; $lookup is smart enough to match against all the elements of the array.
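As a hypothetical illustration (the collection and field names are invented, not from your redacted pipeline): when localField is an array, $lookup matches each element of the array against foreignField, so the preceding $unwind can simply be dropped:

```javascript
// Hypothetical example: "orders" documents have an array field "productIds".
// $lookup matches every element of that array against "products._id", so
// there is no need to $unwind "productIds" first.
const lookupStage = {
  $lookup: {
    from: "products",           // placeholder foreign collection
    localField: "productIds",   // an array field: each element is matched
    foreignField: "_id",
    as: "products",
  },
};

const pipeline = [
  { $match: { status: "shipped" } }, // placeholder filter
  lookupStage,                       // no $unwind before the $lookup
];

console.log(pipeline.length);
```

Skipping the $unwind keeps each source document as a single document through the $lookup, instead of one document per array element, which is exactly what inflates the data inside a $facet.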
A $facet with only one field is useless. You might have more, but again, the redacted pipeline you shared prevents a real assessment.
I understand that not having many details about the aggregate makes it hard to understand, but I’m using it in a professional context, so I cannot share sensitive information, including the schemas. However, I’ve just noticed that I forgot a stage in the aggregate above (another $unwind), and it should look like:
With only the $match and the first $unwind, I have no issue. I think it is the second $unwind that creates a lot of data (all the resulting documents are unique). The fact is, even if I $project all existing fields (so theoretically no data is eliminated), or use $sort or $addFields, I no longer have any problem. Maybe MongoDB is doing some operation to transform the data that changes how it is stored in memory in one case; what do you think?