Hi everyone,
We’re using a MongoDB M60 cluster (version 8.0.11), and we have a performance concern with one of our aggregation queries.
We have an orders collection with ~50 million documents, each of which carries a tenantId. Our goal is to find, for a specific tenant, every uuid (a uuid groups related orders) where none of the orders in that group has a status we consider "completed".
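For context, the relevant shape of an order document is roughly the following (values are made up, and fields other than tenantId, uuid, and status are omitted):

{
  _id: ObjectId("..."),
  tenantId: "MY_TENANT_ID",
  uuid: "ORDER_GROUP_UUID",   // shared by all orders in the same group
  status: "APPROVED"          // e.g. WAITING_FOR_APPROVAL, APPROVED, SENT, ...
}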
Here is the aggregation pipeline we use:
db.orders.aggregate([
  {
    $match: {
      tenantId: "MY_TENANT_ID",
      status: { $exists: true }
    }
  },
  {
    $group: {
      _id: "$uuid",
      condition_count_completed: {
        $sum: {
          $cond: [
            { $in: ["$status", ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"]] },
            1,
            0
          ]
        }
      }
    }
  },
  {
    $match: {
      condition_count_completed: { $eq: 0 }
    }
  }
])
We also have a compound index on:
{ uuid: 1, status: 1 }
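For reference, it was created with the plain createIndex form:

db.orders.createIndex({ uuid: 1, status: 1 })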
Unfortunately, the $group stage is taking the most time, sometimes up to 10–20 minutes, especially for tenants with a large number of orders.
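For reference, this is how the plan can be inspected in mongosh (a sketch; pipeline holds the three stages shown above):

// Same three stages as above, kept in a variable so we can explain them.
const pipeline = [ /* $match, $group, $match from above */ ];
db.orders.explain("executionStats").aggregate(pipeline);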
From what I understand, $group can't use an index directly (and the $sort + $group optimization that can produce a DISTINCT_SCAN won't help here either, since we're not using $first or $last).
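To be clear, the optimization I mean is the one where a $sort followed by a $group that only uses $first (or $last) can be answered by a DISTINCT_SCAN over the index; something like the sketch below, which doesn't fit our counting use case:

db.orders.aggregate([
  { $sort: { uuid: 1, status: 1 } },
  { $group: { _id: "$uuid", firstStatus: { $first: "$status" } } }
])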
Main Questions:
- Is there any way to optimize this query to reduce the time spent on $group?
- Is MongoDB suited for this kind of workload, i.e. aggregating across millions of documents for analytics-style queries?
- Should we consider a different data model, or maybe offload this to a data warehouse? (A rough sketch of one idea we've been weighing follows below.)
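To make the data-model question concrete, the idea we've been weighing is a per-uuid rollup maintained at write time (the "computed" pattern). The order_groups collection name and completedCount field are hypothetical, and order stands for the incoming order document:

// On every order write, bump the group's count of "completed" orders.
const COMPLETED = ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"];
db.order_groups.updateOne(
  { _id: order.uuid, tenantId: order.tenantId },
  { $inc: { completedCount: COMPLETED.includes(order.status) ? 1 : 0 } },
  { upsert: true }
);

// With an index on { tenantId: 1, completedCount: 1 }, the expensive
// aggregation collapses to an indexed find:
db.order_groups.find({ tenantId: "MY_TENANT_ID", completedCount: 0 });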
Any tips, indexing strategies, or restructuring ideas would be greatly appreciated!