How to - complex (for me) aggregation pipeline

Hey guys,
I have ecommerce orders and I’d like to aggregate data by
customer_id,
storing aggregated data by customer_id AND by order_status,
and in different calculated fields the count of that specific status of

  1. orders
  2. unique orders
    so a result document would look like:
    {
    "customer_id": “Someone”,
    "countAllOrders": 34,
    "countAllUniqueOrders": 28,
    "countCompletedOrders": 12,
    "countPendingOrders": 22,
    }

I use $group for grouping orders by customer_id, I know how to calculate “countAllOrders”:
countAllOrders: { $sum: 1 },
but I was not able to figure out how can I calculate:
“countAllUniqueOrders” and “countCompletedOrders”.
In my case the expression for
“countAllUniqueOrders” is:
filter by the given customer_id and count the distnict values of order_ids related,
in case of
“countCompletedOrders” is:
filter by the given customer_id and where order.status = “completed” and count them.

I was trying $group, $bucket and $facet, but the latest 2 is not appropriate: I can’t set the number of buckets in advance and I want to output every filtered documents, sub-stages ( $facet doesn’t allow it AFAIK ) to separate collections.

Basically how should I modify
{ $sum: 1 }
in my case?
I tried like this:
countCompletedOrders: { $sum: { $cond: { if: $eq: [ "$order.status", "completed" ], then: 1, else: 0 } } },
but it drops an “Unknown error” like others
Thank you!

The following

and

indicate that you want to start with

{
    $group :
    {
        _id :
        {
            "customer" : "$customer_id" ,
            "status" :  "$order.status" 
        } ,
        "count_by_customer_and_status" : { "$sum" : 1 }
    }
}

You will not get exactly what you want but it should then be possible to group all documents of a customer into a single one with $push.

2 Likes

Hi @steevej,

Thank you for your response! :slight_smile:

I’m sorry, I’m a noob, I don’t get your concept. I tried it but it doesn’t provide what I want.
Also, the stage output preview documents only show the unique status values but not the unique customer_id values, so it looks it is NOT grouped by customer_id ( I checked it )
EDIT:
I modified line:
"customer" : "$customer_id" ,
to:
"customer_x" : "$customer_id" ,
and it now groups by both!
Pls. note I DON’T have such a field named “customer” !

But thank you again! :slight_smile:

It is always better to share a sample of the input documents so that we can test what we propose as solution. Otherwise, we simply give pointers because creating the input documents is time consuming and time is a limited resource.

1 Like

It gives the unique combination of the 2.

You can then group all the status counts of one customer with another group such as:

{
		"$group" : {
			"_id" : "$_id.customer_x",
			"counts" : {
				"$push" : {
					"status" : "$_id.status",
					"count" : "$count"
				}
			}
		}
	}
1 Like

@steevej

thank you, it works :slight_smile:

For others who also need such help, the correct code in the second $group is:

{
    "$group" : {
    	"_id" : "$_id.customer_x",
    	"counts" : {
    		"$push" : {
    			"status" : "$_id.status",
    			"count" : "$count_by_customer_and_status"
    		}
    	}
    }
}

Yes. I made a typo. It is

vs

As a side note, put your code and sample document between 2 lines of triple back ticks. This forum will keep your formatting, highlight the syntax, and more importantly, will produce the correct quotes, double quotes and you will not lose some dollar signs in the rendered page.

In the previous post _id.customer_x and _id.status really needs the dollar sign but we do not see them in the rendered page. So the real final (hopefully) pipeline is:

group_by_customer_and_status =
{
    $group :
    {
        _id :
        {
            "customer_x" : "$customer_id" ,
            "status" :  "$order.status" 
        } ,
        "count_by_customer_and_status" : { "$sum" : 1 }
    }
} ;
group_by_customer =
{
    "$group" :
    {
        _id" : "$_id.customer_x",
        "counts" :
        {
            "$push" :
            {
                "status" : "$_id.status",
                "count" : "$count_by_customer_and_status"
            }
        }
    }
} ;
pipeline = [ group_by_customer_and_status , group_by_customer ] ;
1 Like

That works like a charm!
I added a $unwind and a $addFields stages at the end of that and the aggregations are now in distinct documents, with some further aggregations

Hi @steevej

tried to tweak your suggestion and I have 2 problems,

  1. is totally beginner: in your two $group stage in some cases you use “” around “_id”, and “$group”, in some cases you don’t. Why is the difference?
  2. I try to use $facet stage after a $group stage. I can not export the pipeline in Compass properly, the $facet stage has errors and that is not shown in the exported version
    This $group stage is basically an extended version of your first $gropup stage:
    $group: {
        _id: {
            "email": "$order.billing.email",
            "globalOrderId": "$global.order_id",
            "status": "$order.status"
        },
        "countEach": {
            "$sum": 1
        }
    }

My source document example is:

{
  "_id": {
    "$oid": "5ffb36c87f9b06d538fa78a3"
  },
  "order": {
    "billing": {
      "email": "buyer_3@example.com",
      "phone": "+420 296 631 111"
    },
    "id": {
      "$numberInt": "1465"
    },
    "status": "pending",
    "currency": "EUR",
    "date_created": {
      "$date": {
        "$numberLong": "1610298684000"
      }
    },
    "date_modified": {
      "$date": {
        "$numberLong": "1610299059000"
      }
    },
    "total": {
      "$numberDouble": "140.94"
    },
    "customer_id": {
      "$numberInt": "7"
    },
    "customer_ip_address": null,
    "customer_user_agent": null,
    "customer_note": "VA customer note",
    "payment_method": "other",
    "date_paid": null,
    "date_completed": null
  },
  "numVerify": {
    "valid": true,
    "international_format": "+420296631111",
    "country_prefix": "+420",
    "country_code": "CZ",
    "location": "Praha"
  },
  "businessMeta": {
    "client_id": "0001",
    "client_name": "Acme Co",
    "webshop_id": "00001",
    "webshop_name": "exampleshop.com"
  },
  "global": {
    "order_id": "00001-0000-1465",
    "customer_id": "00001-00001-7"
  }
}

since I added one more items to { _id } in first $group stage, I want to use $facet instead of 2nd $group, when the different sub-pipelines contains the different $groups ( similarly to you 2nd $group stage).

My issue is I got a syntax error and I can not figure out what is wrong.
The content in the 2nd stage, in the $ facet block is:

/**
 * outputFieldN: The first output field.
 * stageN: The first aggregation stage.
 */
{
  "groupedByEmail":
  [
    { $match: { "_id.email": { $exists: 1 } } },
    { $group: {
        _id: "$_id.email",
        "counts_ByEmail":
        {
          $push:
          {
            "globalOrderId": "$_id.globalOrderId",
            "status": "$_id.status",
            "count_ByEmail": "$countEach"
          }
        }
    }
    }
  ],
  "groupedByGOrderId":
  [
    { $match: { "$_id.globalOrderId": { $exists: 1 } } },
    { $group: {
        _id: "$_id.globalOrderId",
        "counts_ByGOrderId":
        {
          $push:
          {
            "email": "$_id.email",
            "status": "$_id.status",
            "count_ByGOrderId": "$countEach"
          }
        }
    }
    }
  ],
  "groupedByStatus":
  [
    { $match: { "$_id.email": { $exists: 1 } } },
    { $group: {
        _id: "$_id.status",
        "counts_ByStatus":
        {
          $push:
          {
            "globalOrderId": "$_id.globalOrderId",
            "status": "$_id.status",
            "count_ByStatus": "$countEach"
          }
        }
    }
    }
  ]
}

EDIT: The latest error message is “unknown top level operator: $_id.globalOrderId” -EDIT END
EDIT 2: When I commented out all 3 $match stages, it works properly, so there must be the problem! -EDIT 2 END
So what is wrong with it and you might noticed that I extended your solution from grouping by 2 items to by 3, so how would you do it instead of mine?
Thanks a lot!

Cut-n-paste error from my part. I always use double quotes around literals.

In the $match, you should not use the dollar sign for the field names _id.email and _id.globalOrderId.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.