$group stage - a wasted opportunity

Every single time I need to use $group I find myself perplexed by how limited and wasteful this stage is in an otherwise really powerful pipeline system.

Why is it that $group MUST destroy whatever was garnered in previous stages? Why was it not possible to design it to be akin to how $unwind works, i.e. why can’t $group store its results onto a new field without ruining whatever was aggregated in previous stages? This way a user would have a way to preserve their fields, and most importantly - easily perform chained $group stages, and if they desired to obtain the shape that $group does today, all they would need to do is throw in a $project stage at the end.

What should one do to aggregate data with fields that have multiple $group'ings on them? Suppose I have delivery status data:

[
{courier: "John Brown", productType: "Package", status: "DELIVERED"},
{courier: "John Brown", productType: "Insured Parcel", status: "DELIVERED"},
{courier: "John Brown", productType: "Package", status: "DELIVERY_RESCHEDULED"},
{courier: "Eve White", productType: "Bubble Mailer", status: "DELIVERY_FAILED"}
]

How do I run $group in order to obtain the following:

Group by courier and then group by delivery status:

For instance:

{
  data: [
    {
      courier: "John Brown",
      products: [
        {
          productType: "Package",
          status: "DELIVERED",
          count: 45
        },
        {
          productType: "Package",
          status: "DELIVERY_RESCHEDULED",
          count: 2
        },
        {
          productType: "Insured Parcel",
          status: "DELIVERED",
          count: 21
        }
      ]
    },
    {
      courier: "Eve White",
      products: [
        <...this courier's listings follow the same data structure...>
      ]
    }
  ]
}

It must, because it combines multiple documents into a single document per group.

Because no single way will fit all the use cases. But you can keep the documents you want by using accumulators such as $first or $last together with $$ROOT.
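As a rough illustration of that suggestion, here is a minimal sketch of a $group stage that keeps whole input documents alongside the grouped result. The collection name `deliveries` and the output field names `firstDoc`, `lastDoc` and `docs` are assumptions made for the example, and `$push` is an additional accumulator not named in the reply. Without a preceding $sort, which document counts as first or last is not guaranteed.

```javascript
// Sketch only: "deliveries", "firstDoc", "lastDoc" and "docs" are hypothetical names.
const keepDocsPipeline = [
  {
    $group: {
      _id: "$courier",                 // one output document per courier
      firstDoc: { $first: "$$ROOT" },  // whole first input document in the group
      lastDoc: { $last: "$$ROOT" },    // whole last input document in the group
      docs: { $push: "$$ROOT" },       // every input document in the group
      count: { $sum: 1 },              // number of documents in the group
    },
  },
];

// In mongosh, assuming such a collection exists:
// db.deliveries.aggregate(keepDocsPipeline);
```

Keeping every document with `$push` can make the grouped documents large, so it is probably best reserved for small groups.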

Your first $group _id will be { "courier": "$courier", "productType": "$productType", "status": "$status" }. You would then $group on those groups with _id: { "courier": "$_id.courier", "productType": "$_id.productType" }, and finally you $group on _id: { "courier": "$_id.courier" }. This is not exactly what you showed, but it carries a little more information, because you also get a count per productType.
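The three stages described above could be sketched roughly as follows. The _id expressions are the ones given in the reply; the accumulator fields (`count`, `statuses`, `products`) and the collection name `deliveries` are assumptions chosen to resemble the shape in the question, not something the reply spelled out.

```javascript
// Sketch only: "deliveries" and the field name "statuses" are hypothetical.
const nestedCountsPipeline = [
  // Stage 1: one document per (courier, productType, status) with its count.
  {
    $group: {
      _id: { courier: "$courier", productType: "$productType", status: "$status" },
      count: { $sum: 1 },
    },
  },
  // Stage 2: regroup per (courier, productType), collecting the per-status
  // counts and summing them into a total per productType.
  {
    $group: {
      _id: { courier: "$_id.courier", productType: "$_id.productType" },
      statuses: { $push: { status: "$_id.status", count: "$count" } },
      count: { $sum: "$count" },
    },
  },
  // Stage 3: one document per courier holding all of its product types.
  {
    $group: {
      _id: { courier: "$_id.courier" },
      products: {
        $push: { productType: "$_id.productType", count: "$count", statuses: "$statuses" },
      },
    },
  },
];

// In mongosh, assuming such a collection exists:
// db.deliveries.aggregate(nestedCountsPipeline);
```

Each later stage reads the previous stage's key through `$_id.<field>`, which is how the values survive from one $group to the next.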
