Is it possible for several aggregation operators to be used within the same stage?

Every stage I have seen so far has the same pattern:

{ $aggrop: {...parameters...}}

That is, a stage is a JSON object with a single field. However, a JSON object can have many fields, so I wonder if it’s possible to use multiple aggregation operators within the same stage.
E.g.

{ $aggrop0: {...parameters...}, $aggrop1: {...parameters...}, $aggrop2: {...parameters...}, ... }

Of course, only in cases when there is no ambiguity.

I.e. write one stage like this:

{
  $sort: {...},
  $skip: S,
  $limit: N
}

instead of three separate stages:

{ $sort: {...} },
{ $skip: S },
{ $limit: N }

This would reduce the number of stages in the pipeline. Moreover, some aggregation operators seem to make sense with (or are typically used with) others.

Thanks,
Dmytro

Hi @Dmytro_Sheyko

Stages are applied to documents and operators to fields. So, in my view, the answer is no.

We can’t merge stages, or create new custom stages, but it would be interesting.

But some stages already implement what you suggest, like $groupByAndCount (I can’t remember the exact name, but it is similar).

The reverse is possible though (i.e., splitting a pipeline) using $facet.
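
The server expects each stage object to contain exactly one field, but a similar shorthand can be approximated on the client side. The following is only a minimal sketch: `expandStages` is a hypothetical helper (not part of MongoDB or its drivers), and the `score` field is an assumption made for illustration. It relies on JavaScript preserving the insertion order of string keys such as `$sort`.

```javascript
// Hypothetical client-side helper, not a MongoDB or driver API.
// It turns an object holding several stage operators into one
// single-field stage per operator, in the order the keys were written.
function expandStages(combined) {
  return Object.entries(combined).map(([operator, spec]) => ({ [operator]: spec }));
}

// "score" is an assumed field name used only for this example.
const pipeline = expandStages({
  $sort: { score: -1 },
  $skip: 2,
  $limit: 3,
});

console.log(JSON.stringify(pipeline));
// [{"$sort":{"score":-1}},{"$skip":2},{"$limit":3}]
```

The resulting array is an ordinary pipeline and could be passed to `aggregate()` as usual. The server still sees three separate stages, so this only changes how the pipeline is written, not how it runs.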

Thank you @santimir,
Perhaps you meant $sortByCount, which is $group + $sort.
As for user defined custom stages, you anticipated my next question. Too bad that the answer is “no”.
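
To make the $sortByCount remark concrete, here is a small sketch of the shorthand next to the two stages it stands for, following the equivalence given in the MongoDB documentation. The `$department` field path is an assumption chosen for illustration.

```javascript
// "$department" is an assumed field path used only for illustration.
const shorthand = [{ $sortByCount: "$department" }];

// Per the MongoDB documentation, the stage above behaves like this pair:
// a $group that counts documents per value, then a descending $sort.
const expanded = [
  { $group: { _id: "$department", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// Either array can be passed to aggregate(); each stage object
// still has exactly one field.
console.log(shorthand.length, expanded.length); // 1 2
```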

Hi @Dmytro_Sheyko ,

Interesting question!!

An aggregation can be thought of as a pipeline through which documents flow. Each stage works on the set and shape of the documents returned by the previous stage.

Consider the scenario where you have the following documents:

{ _id: 1, user_name: "John", department: "Biology", score: 87},
{ _id: 2, user_name: "Harry", department: "Physics", score: 60},
{ _id: 3, user_name: "Roger", department: "Biology", score: 44},
{ _id: 5, user_name: "Jenny", department: "History", score: 82},
{ _id: 6, user_name: "Srivi", department: "Biology", score: 78}, 
{ _id: 7, user_name: "Tom", department: "History", score: 80}

Now, if we want to get the average score of the users who passed in each department (a passing score is anything above 75), we will run the following aggregation query:

db.user_data.aggregate([
  { $match: { score: { $gt: 75 } } },
  { $group: { _id: "$department", average_passing_mark: { $avg: "$score" } } }
])

In this aggregation pipeline, all the documents enter the $match stage, which returns only the documents with _id 1, 5, 6, and 7. These documents are then passed to the next stage, $group.
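
The flow described above can be imitated in plain JavaScript to see what each stage hands to the next. This is only an illustrative sketch that mimics the two stages with array operations; it is not how the server evaluates a pipeline, and the variable names are made up for the example.

```javascript
const userData = [
  { _id: 1, user_name: "John", department: "Biology", score: 87 },
  { _id: 2, user_name: "Harry", department: "Physics", score: 60 },
  { _id: 3, user_name: "Roger", department: "Biology", score: 44 },
  { _id: 5, user_name: "Jenny", department: "History", score: 82 },
  { _id: 6, user_name: "Srivi", department: "Biology", score: 78 },
  { _id: 7, user_name: "Tom", department: "History", score: 80 },
];

// Stage 1, imitating { $match: { score: { $gt: 75 } } }
const matched = userData.filter((doc) => doc.score > 75);

// Stage 2, imitating $group on "$department" with $avg of "$score".
// It only ever sees the output of stage 1.
const sums = new Map();
for (const doc of matched) {
  const entry = sums.get(doc.department) ?? { total: 0, count: 0 };
  entry.total += doc.score;
  entry.count += 1;
  sums.set(doc.department, entry);
}
const grouped = [...sums].map(([department, { total, count }]) => ({
  _id: department,
  average_passing_mark: total / count,
}));

console.log(matched.map((doc) => doc._id)); // [ 1, 5, 6, 7 ]
console.log(JSON.stringify(grouped));
// [{"_id":"Biology","average_passing_mark":82.5},{"_id":"History","average_passing_mark":81}]
```

The sketch returns the groups in first-seen order, whereas $group on the server does not guarantee any output order unless a $sort follows it.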

The idea of having multiple stages is to isolate the operations that we are going to perform on the documents consecutively. Within each stage, we can use aggregation operators to construct expressions.

I hope it helps!

Please feel free to reach out if you have any questions.

Kind Regards,
Sonali

I would not worry too much about the sheer number of stages. Some optimizations are applied to the pipeline before it is evaluated. In your case, it’s likely that https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/#-sort----limit-coalescence will kick in.
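
As I read the linked page, for a $sort, $skip: S, $limit: N sequence the optimizer can move the limit ahead of the skip by raising it to S + N, and then fold that limit into the sort. The sketch below does not show the server's implementation; it only checks, on a plain array with made-up values for S and N, that the reordering gives the same result.

```javascript
const scores = [87, 60, 44, 82, 78, 80];
const S = 1; // assumed skip value
const N = 3; // assumed limit value

// Descending sort on a copy, standing in for { $sort: { score: -1 } }
const sortDesc = (values) => [...values].sort((a, b) => b - a);

// As written: $sort, then $skip: S, then $limit: N
const asWritten = sortDesc(scores).slice(S).slice(0, N);

// Reordered: $sort, then $limit: S + N, then $skip: S
const reordered = sortDesc(scores).slice(0, S + N).slice(S);

console.log(asWritten, reordered); // [ 82, 80, 78 ] [ 82, 80, 78 ]
```

With the limit next to the sort, only the top S + N items need to be kept while sorting, which is why three separate stages need not cost more than a hypothetical combined one.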
