Stageless aggregation variables

Maximilian_N_A · February 6, 2023, 11:09pm

Hello there, community.
I am a student and have now worked on multiple projects with MongoDB. I have focused on aggregations as my queries became more and more complicated.

But there is one roadblock that causes issues over and over again.
I want to make as few database calls as possible, which means I want a single aggregation to return all the data I need.
We can combine multiple aggregations with $facet, but since its output has a hard byte limit (each $facet result is a single document, so it is bound by the 16 MB BSON document size limit), it doesn't help in the following case:

I want to retrieve data, but also statistics and counts over lookups and so on.
This leads me to $group, get the $size of an array, then $unwind multiple times to flatten some nested arrays, and finally use $facet to get another count and the data.
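As an illustration, the chain of stages described above might look roughly like the pipeline below, written as a PyMongo-style list of Python dicts. The `orders` collection and its `customer` and `items` fields are hypothetical, chosen only for this sketch. Note how `orderCount`, computed right after `$group`, has to ride along on every document once the arrays are unwound.

```python
from typing import Any


def build_facet_pipeline(page_size: int) -> list[dict[str, Any]]:
    """Hypothetical pipeline: $group, count with $size, $unwind twice, then $facet."""
    return [
        # Collapse orders per customer (hypothetical fields `customer`, `items`).
        {"$group": {"_id": "$customer", "orders": {"$push": "$$ROOT"}}},
        # Count the grouped orders before flattening them.
        {"$addFields": {"orderCount": {"$size": "$orders"}}},
        # Flatten: one document per order, then one per item.
        # From here on, `orderCount` is copied onto every resulting document.
        {"$unwind": "$orders"},
        {"$unwind": "$orders.items"},
        # Return a page of data plus an overall count in a single result document,
        # which is subject to the 16 MB BSON document size limit.
        {"$facet": {
            "data": [{"$limit": page_size}],
            "itemTotal": [{"$count": "count"}],
        }},
    ]


pipeline = build_facet_pipeline(page_size=20)
```

With PyMongo this list would be passed to `Collection.aggregate`, e.g. `db.orders.aggregate(pipeline)`, assuming a `db` handle to a database that has such a collection.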

What I am trying to say is that some aggregations would be much easier if, instead of using a $facet, I could store a variable parallel to the pipeline, one that the stages ignore and that I only use when I need it, for example at the end.

In short, the issue is: I perform operations like $group, compute a variable or count, $unwind everything, and then run many more stages on an amount of data that may exceed the $facet byte limit. This forces me to carry the variable on every document through every stage.

I would love an alternative to $facet, even one with a smaller byte limit: a stage that runs a sub-aggregation without changing the documents coming from the previous stage, but adds a parallel value that I can retrieve later, for example in a $project.

Does something like this already exist or could this be implemented in the future?

Sumanta_Mukhopadhyay · February 7, 2023, 12:19am

As far as I am aware, there is no alternative to the $facet stage in MongoDB that lets you store intermediate values parallel to the pipeline stages and retrieve them later in the pipeline. This means you have to work around the byte limit and design your aggregation pipeline so that it stays within it.

You can try to reduce the size of the intermediate data by using $group to compute the counts and statistics before you $unwind and flatten the nested arrays. This way, you limit the amount of data that gets unwound, which shrinks the intermediate results and helps you stay within the byte limit.
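A minimal sketch of that ordering, again as a PyMongo-style pipeline over a hypothetical `orders` collection with `customer` and `items` fields (each item assumed to have a `sku`): the statistics are computed in `$group` while the arrays are still nested, and only the small `skus` arrays are unwound afterwards.

```python
from typing import Any


def build_group_first_pipeline() -> list[dict[str, Any]]:
    """Hypothetical pipeline that computes statistics before any $unwind."""
    return [
        # Compute counts per customer while the arrays are still nested.
        # Note: $size fails if `items` is missing, so this assumes every
        # order document has an `items` array.
        {"$group": {
            "_id": "$customer",
            "orderCount": {"$sum": 1},
            "itemCount": {"$sum": {"$size": "$items"}},
            # Keep only the SKUs, not whole order documents.
            "skus": {"$push": "$items.sku"},
        }},
        # `skus` is an array of arrays, so unwind twice to flatten it.
        {"$unwind": "$skus"},
        {"$unwind": "$skus"},
    ]
```

Because the counts already live on the grouped documents, the later `$unwind` stages only multiply a short SKU value instead of entire orders.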

Alternatively, you can use a separate collection to store the intermediate values and a join ($lookup) to retrieve them in your final pipeline stage.
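One way this could look, sketched under assumptions: a first pipeline writes the statistics into a hypothetical `order_stats` collection with `$merge`, and the main pipeline attaches them once at the end with an uncorrelated `$lookup`. The collection name, the fixed `_id` of `"totals"`, and the field names are all illustrative.

```python
from typing import Any

STATS_COLLECTION = "order_stats"  # hypothetical collection name


def build_stats_pipeline() -> list[dict[str, Any]]:
    """Run first: store global statistics in a separate collection."""
    return [
        # Group every document under one constant key to get a global count.
        {"$group": {"_id": "totals", "orderCount": {"$sum": 1}}},
        # Upsert the single stats document into STATS_COLLECTION.
        {"$merge": {"into": STATS_COLLECTION,
                    "whenMatched": "replace",
                    "whenNotMatched": "insert"}},
    ]


def build_main_pipeline(page_size: int) -> list[dict[str, Any]]:
    """Run second: return a page of data and join the stored statistics once."""
    return [
        {"$facet": {"data": [{"$limit": page_size}]}},
        # Uncorrelated $lookup: fetch the stats document by its fixed _id.
        {"$lookup": {"from": STATS_COLLECTION,
                     "pipeline": [{"$match": {"_id": "totals"}}],
                     "as": "stats"}},
        # $lookup yields an array; keep its first (and only) element.
        {"$project": {"data": 1, "stats": {"$arrayElemAt": ["$stats", 0]}}},
    ]
```

This trades an extra call (or a periodically refreshed stats collection) for keeping the main pipeline free of the carried-along value.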

If you feel that the byte limit is a limitation for your use case, you can consider opening a feature request on the MongoDB issue tracker. The MongoDB development team will review and consider the request for a future release.

steevej · February 7, 2023, 3:20am

I am not sure I fully understand your use-case but I think that $unionWith could help.
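A rough sketch of how $unionWith (MongoDB 4.4+) might apply here, which is one reading of the suggestion rather than necessarily what was meant: return a page of documents, then append a single statistics document computed by a second pipeline over the same collection. The `orders` collection name and the `"stats"` `_id` are hypothetical; the appended document can be recognized by that `_id`.

```python
from typing import Any


def build_union_pipeline(coll: str, page_size: int) -> list[dict[str, Any]]:
    """Hypothetical: a page of documents followed by one appended stats document."""
    return [
        {"$limit": page_size},
        # $unionWith appends the output of a second pipeline, here run over
        # the same collection, to the result stream without touching the
        # documents that came from the previous stage.
        {"$unionWith": {
            "coll": coll,
            "pipeline": [{"$group": {"_id": "stats", "orderCount": {"$sum": 1}}}],
        }},
    ]


pipeline = build_union_pipeline("orders", page_size=20)
```

Unlike $facet, the results stay separate documents, so no single document has to hold both the data and the statistics.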

system · February 12, 2023, 3:20am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.