Maximum number of stages in a pipeline, pipeline design

Hey guys,

I may have a pipeline with 2 or 3 $facet stages, each containing $match, $group, $unwind, and $addFields stages, altogether easily around 20 or 30 stages.
Alternatively, I could split this one large pipeline into, let’s say, 3-5 smaller pipelines
(Pipeline A, Pipeline B, …, Pipeline E) with a common output collection.

My questions:

  1. What is a best-practice maximum number of stages in a pipeline, and what other considerations are there?
  2. If I run all 5 smaller pipelines separately on a schedule (in Atlas Triggers), can they all add their output to the same output collection without issues, using $out?
    I mean Pipeline A adds key-value pairs A1, A2, …, An to that collection,
    Pipeline B adds key-value pairs B1, B2, …, Bm to that collection,
    so will the output collection end up containing all the key-value pairs A1, A2, …, B1, B2, …, Ey, Ez?
    Any arguments for or against these options, or any other useful option, are appreciated.
    Thank you!

Hi @Vane_T,

There are limits to the aggregation pipeline:

Moreover, with $facet, whether the result set is returned or written out somewhere, the arrays it returns (in a single document) cannot exceed 16 MB.

Therefore, if you are on a 4.4 cluster or above, you may consider using $unionWith rather than $facet to create your documents:

This stage emits each result as a separate document, so the 16 MB limit applies to each document individually and not to the entire result set.
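
To make the difference concrete, here is a minimal sketch of the same two sub-pipelines written once with $facet and once with $unionWith. The collection names ("events", "summary"), the field names ("type", "status"), and the countBy helper are all made up for illustration, so adapt them to your own schema. The final $merge stage is an assumption about what you want for a shared output collection: $out replaces the target collection on every run, while $merge writes into an existing one.

```javascript
// Hypothetical names throughout: "events", "summary", "type", "status".
const SOURCE = "events";

// Illustrative helper: each sub-pipeline tags its _id with a metric name so
// results from different sub-pipelines cannot collide in the output collection.
const countBy = (metric, field) => [
  { $match: { [field]: { $exists: true } } },
  { $group: { _id: { metric, key: `$${field}` }, count: { $sum: 1 } } },
];

// $facet form: every sub-pipeline result becomes an array inside ONE
// output document, so the whole result has to fit in 16 MB.
const facetPipeline = [
  {
    $facet: {
      byType: countBy("byType", "type"),
      byStatus: countBy("byStatus", "status"),
    },
  },
];

// $unionWith form (MongoDB 4.4+): results are emitted as separate documents,
// so the 16 MB limit applies to each document individually.
const unionPipeline = [
  ...countBy("byType", "type"),
  { $unionWith: { coll: SOURCE, pipeline: countBy("byStatus", "status") } },
  // $merge adds to or updates an existing collection; $out would replace it.
  {
    $merge: {
      into: "summary",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];

// In mongosh this would be run as: db.events.aggregate(unionPipeline)
```

With this shape, several scheduled pipelines can probably write to the same collection, as long as each one produces _id values that the others do not.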

Thanks
Pavel

Hi @Pavel_Duchovny

Thank you for your response and links, but they are too broad and theoretical to help with my particular decision :slightly_frowning_face:

I plan to execute the pipeline(s) via scheduled trigger functions, running them every 5-15 minutes. So, I think, with proper index usage, a few hundred new documents in the source collection should not be an issue for the aggregation, should they?

BTW, the linked doc doesn’t provide info about a recommended or maximum number of stages in a pipeline.
Thank you!

Hi @Vane_T,

There is no maximum number of stages per se, as long as the entire command document does not exceed 16 MB.

For shared and free-tier Atlas clusters, the limit is 50 stages in a pipeline.
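
If you want a quick sanity check before scheduling a pipeline, something like the sketch below may help. It is only a rough approximation: checkPipeline is a made-up helper, it counts top-level stages only (whether stages nested inside $facet count toward the 50-stage limit is not something I am asserting here), and it measures the JSON text size rather than the real BSON size of the command.

```javascript
// Limits as quoted in this thread.
const MAX_COMMAND_BYTES = 16 * 1024 * 1024; // 16 MB command document
const SHARED_TIER_MAX_STAGES = 50; // shared / free-tier Atlas clusters

// Hypothetical helper: rough pre-flight check of a pipeline definition.
function checkPipeline(pipeline) {
  // Assumption: only top-level stages are counted.
  const stages = pipeline.length;
  // Assumption: JSON size is used as a stand-in for BSON size, so treat the
  // number as an estimate, not an exact measurement.
  const approxBytes = Buffer.byteLength(JSON.stringify(pipeline), "utf8");
  return {
    stages,
    approxBytes,
    withinStageLimit: stages <= SHARED_TIER_MAX_STAGES,
    withinSizeLimit: approxBytes < MAX_COMMAND_BYTES,
  };
}

// Example with a small, made-up pipeline.
const report = checkPipeline([
  { $match: { status: "new" } },
  { $group: { _id: "$type", count: { $sum: 1 } } },
]);
console.log(report);
```

A 20 or 30 stage pipeline is far below both numbers, so in practice the execution stats matter more than the stage count.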

Your question is also very broad. If you want my specific opinion, you should provide specific queries with their execution stats/plans.

Thanks
Pavel