Performance Issue when using $group Pipeline

yoni_egorov · July 23, 2025, 9:22am

Hey there, we’re using M60 cluster ( 8.0.11 ), and we have one query that needs to run on all orders from a specific “tenant” ( you basically have multiple orders per tenant ).

Now the collection size is 50M documents, and the aggregation that I’m performing is:

$match:

{
  tenantId: "MY_TENANT_ID",
  status: {$exists: 1}
}

$group

{
  _id: "$uuid",
  condition_count_complated: {
    $sum: {
      $cond: [
        {
          $in: [
            "$status",
            [
              "WAITING_FOR_APPROVAL",
              "APPROVED",
              "SENT"
            ]
          ]
        },
        1,
        0
      ]
    }
  }
}

$match

{
  condition_count_complated: {
    $eq: 0
  }
}

now most of the query execution time is the second pipeline stage, the $group.

I know that $group doesn’t use indexes ( even though uuid: 1, status: 1 index exists ).
Adding $sort before that will not help as I don’t use any accumulators (like $first / $last ) inside the $group.

The avg execution time for this query is about 10-20 min ( depending on which tenant id we matching at first, if one tenant has alot of orders that it will take more time ).

I would love to get some opinions here of how I can improve this result?
Does MongoDB is even being used for such operations?

Michael_Nwuju · July 23, 2025, 12:49pm

Hi @yoni_egorov, so I took some time to understand what’s going on here, I did some research and yes you are correct, the $group operator does not use indexes. However, I think there’s a better approach to this.

In the $group stage, I can see that you are still filtering, which is slowing things down, I think you should move this extra condition to the first $match stage so this makes the query faster.

Plus, because the $match stage is basically a .find(), your indexes will work in the $match stage.

So the aggregation pipeline should look like this -

$match

{
    tenantId: "MY_TENANT_ID",
    status: {
      $in: ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"]
}

$group

{
    _id: "$uuid"
}

This gets you the uuid of the documents you are really trying to find, then you can get all uuids by the tenant like this -

db.collection.distinct("uuid", { tenantId: "MY_TENANT_ID" })

Lastly, you subtract the first aggregation results from the collection.distinct() result to give you that difference you are looking for, I hope this helps.

aneroid · July 24, 2025, 4:27am

Additional to Michael’s answer above, since you have your last step has:

{
  condition_count_complated: {
    $eq: 0
  }
}

The 0 is for status NOT IN ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"]. You can use $nin operator for that.

$nin selects the documents where:

the specified field value is not in the specified array or

the specified field does not exist.

So the first $match stage could directly be:

{
    tenantId: "MY_TENANT_ID",
    status: {
      $nin: ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"]
}

Mongo Playground