Fastest way to get total count for pagination, for an aggregation pipeline

KW_N_A · August 19, 2024, 11:18am

Hello community,

I have 2 collections with the following schema

Customers

_id
…

Orders

_id
customer_id
…

The data is modeled as two collections because the number of orders that a customer can have is unbounded, hence I am unable to store orders as an embedded array under Customers.

In the admin dashboard which I am creating, I would like to show a paginated result for the Customers.

For each customer row, I would like to display the number of orders which the customer has.
I want to display the number of pages which the admin can scroll through. This requires me to know the total number of customers

To get the number of orders for each customer, and also the total number of customers, I would need to perform an aggregation lookup and count.

        {
            "$lookup": {
                "from": "orders",
                "localField": "_id",
                "foreignField": "customer_id",
                "as": "orders",
            }
        },
        {
            "$count": "total_count"
        }

I prepared a toy dataset with 1 mil customers, where each customer has 1 order (hence 1 mil orders as well).

I am unable to complete the query above as the query experiences timeout.

These are the options I have considered but none of them seems ideal:

The option explained above
Use facets to perform the aggregation and get the count. But the community is advising against this, as the performance is bad as well
Avoid using aggregation by storing the order count in the Customers record. But this would mean that if the customer has a new order, I would need to update the Customers record as well
Do not show the total number of customers on the dashboard. But this could result in a bad experience for the admin
Do not display the total order count for each Customer row. I could explore this, but the admin might need this information.

Just wondering if anyone has a better idea? Thanks!

Vivekanandan_Sakthivelu · August 19, 2024, 12:43pm

@KW_N_A ,

Option 3 would simplify the problem .
Better option is to de-normalize the data and store total order in the customer collection , as when order is created .

emmanuel_bernard · August 20, 2024, 7:39am

Hi @KW_N_A ,

You can use $group

db.Orders.aggregate([
{
$group: {
_id: “$customer_id”,
total_count: {$sum:1}
}
}
])

https://mongoplayground.net/p/fDd26mnAHDL

If you need to retrieve information from the Customers collection, you can add a $lookup

steevej · August 20, 2024, 2:50pm

The computed pattern may be of interest for such slowing counts as

About the following

This is essentially the computed pattern. Yes you need to update the computed field. But you have to ask yourself how often a customer has a new invoice compared to how often you need the number of invoices and the time and code complexity of updating the 2 collections vs dynamically computing the number every time you want to display it.

You may even used an hybrid where you keep old invoices (previous years or months) and dynamically compute for recent ones.

KW_N_A · August 21, 2024, 9:56pm

Hmm with a lookup, it will face the same performance issue as option #1 (which I have illustrated).

KW_N_A · August 21, 2024, 9:56pm

Thank you, I will consider this option

KW_N_A · August 24, 2024, 6:36am

Ok, thanks Steeve! Understand that it is a valid pattern. Will consider this option too.

emmanuel_bernard · August 25, 2024, 9:52am

Hi @KW_N_A,

My approach starts with using the $group stage in an aggregation pipeline on the Orders collection to calculate the total order count for each customer_id.
It will reduce the number of documents.

If you need additional informations about each customer, such as their first name or last name, you can add a $lookup stage to join with the previous stage.

This solution differs from option #1.