$group stage - a wasted opportunity

Vladimir · April 24, 2023, 4:27pm

Every single time I need to use $group, I find myself perplexed by how limited and wasteful this stage is in an otherwise really powerful pipeline system.

Why MUST $group destroy whatever was built up in previous stages? Why couldn't it be designed to work more like $unwind, i.e. why can't $group store its results in a new field without discarding everything aggregated so far? That way users could preserve their fields and, most importantly, easily chain $group stages. Anyone who wanted the output shape $group produces today would only need to add a $project stage at the end.

How should one aggregate data that needs several levels of $group? Suppose I have delivery status data:

```
[
  {courier: "John Brown", productType: "Package", status: "DELIVERED"},
  {courier: "John Brown", productType: "Insured Parcel", status: "DELIVERED"},
  {courier: "John Brown", productType: "Package", status: "DELIVERY_RESCHEDULED"},
  {courier: "Eve White", productType: "Bubble Mailer", status: "DELIVERY_FAILED"}
]
```

How do I use $group to obtain the following?

Group by courier, and then by product type and delivery status, with a count for each.

For instance:

```
{
  data: [
    {
      courier: "John Brown",
      products: [
        {
          productType: "Package",
          status: "DELIVERED",
          count: 45
        },
        {
          productType: "Package",
          status: "DELIVERY_RESCHEDULED",
          count: 2
        },
        {
          productType: "Insured Parcel",
          status: "DELIVERED",
          count: 21
        }
      ]
    },
    {
      courier: "Eve White",
      products: [
        <...this courier's listings follow the same data structure...>
      ]
    }
  ]
}
```

steevej · April 24, 2023, 9:49pm

It must, because it merges multiple documents into one document per group.

Because no single design would fit every use case. You can, however, keep the documents you want by using accumulators such as $first and $last together with $$ROOT.
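A minimal sketch of keeping whole documents through a $group with $$ROOT, assuming a hypothetical collection named deliveries that holds documents like the sample in the question. $first and $last keep one whole document per group; $push with $$ROOT (an accumulator the reply does not mention) keeps all of them. The output field names (firstDelivery, lastDelivery, deliveries, total) are illustrative choices. Each grouped output document must still fit within MongoDB's 16 MB BSON document size limit, so pushing $$ROOT suits small groups best.

```javascript
// Sketch: carry whole input documents through $group via $$ROOT.
// Assumption: hypothetical "deliveries" collection; field names from the question.
const keepDocsPipeline = [
  {
    $group: {
      _id: "$courier",
      // Without a preceding $sort stage, which document counts as
      // "first" or "last" within a group is not defined.
      firstDelivery: { $first: "$$ROOT" }, // one whole document per courier
      lastDelivery: { $last: "$$ROOT" },   // another whole document per courier
      deliveries: { $push: "$$ROOT" },     // every document of the courier
      total: { $sum: 1 }                   // number of documents in the group
    }
  }
];

// Usage (hypothetical collection name): db.deliveries.aggregate(keepDocsPipeline)
```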

Your first $group _id would be {"courier": "$courier", "productType": "$productType", "status": "$status"}. You would then $group those groups with _id: {"courier": "$_id.courier", "productType": "$_id.productType"}, and finally $group on _id: {"courier": "$_id.courier"}. The result is not exactly the shape you showed, but it carries a little more information, because you also get a count per productType.
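A minimal sketch of these three chained $group stages as a mongosh pipeline. Assumptions: a hypothetical collection named deliveries holding documents like the sample in the question, and output field names (count, statuses, products) chosen here for illustration. The first stage needs an accumulator such as count: { $sum: 1 }, since grouping on _id alone only yields the distinct combinations without any counts. A flat variant closer to the shape requested in the question is included as well.

```javascript
// Sketch of the three chained $group stages described above.
// Assumption: run in mongosh against a hypothetical "deliveries" collection.
const pipeline = [
  // Stage 1: one group per (courier, productType, status) with a document count.
  {
    $group: {
      _id: { courier: "$courier", productType: "$productType", status: "$status" },
      count: { $sum: 1 }
    }
  },
  // Stage 2: regroup per (courier, productType), keeping each status's count
  // and adding them up into a per-productType total.
  {
    $group: {
      _id: { courier: "$_id.courier", productType: "$_id.productType" },
      statuses: { $push: { status: "$_id.status", count: "$count" } },
      count: { $sum: "$count" }
    }
  },
  // Stage 3: regroup per courier, collecting the per-productType entries.
  {
    $group: {
      _id: { courier: "$_id.courier" },
      products: {
        $push: { productType: "$_id.productType", count: "$count", statuses: "$statuses" }
      }
    }
  }
];

// Variant closer to the shape in the question: skip the middle stage, push flat
// { productType, status, count } entries per courier, then rename _id to courier.
const flatPipeline = [
  pipeline[0],
  {
    $group: {
      _id: "$_id.courier",
      products: {
        $push: { productType: "$_id.productType", status: "$_id.status", count: "$count" }
      }
    }
  },
  { $project: { _id: 0, courier: "$_id", products: 1 } }
];

// Usage (hypothetical collection name):
// db.deliveries.aggregate(pipeline)
// db.deliveries.aggregate(flatPipeline)
```

For the four sample documents, pipeline would return John Brown with a Package entry of count 2 (one DELIVERED, one DELIVERY_RESCHEDULED) and an Insured Parcel entry of count 1, and Eve White with a single Bubble Mailer entry of count 1. Both pipelines return one document per courier rather than a single document with a data array, and the order of the pushed array entries is not guaranteed.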

system · April 29, 2023, 9:50pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.