Performance Issue when using $group Pipeline

Hi @yoni_egorov, so I took some time to understand what’s going on here, I did some research and yes you are correct, the $group operator does not use indexes. However, I think there’s a better approach to this.

In the $group stage, I can see that you are still filtering, which is slowing things down, I think you should move this extra condition to the first $match stage so this makes the query faster.

Plus, because the $match stage is basically a .find(), your indexes will work in the $match stage.

So the aggregation pipeline should look like this -

  1. $match
{
    tenantId: "MY_TENANT_ID",
    status: {
      $in: ["WAITING_FOR_APPROVAL", "APPROVED", "SENT"]
}
  1. $group
{
    _id: "$uuid"
}

This gets you the uuid of the documents you are really trying to find, then you can get all uuids by the tenant like this -

db.collection.distinct("uuid", { tenantId: "MY_TENANT_ID" })

Lastly, you subtract the first aggregation results from the collection.distinct() result to give you that difference you are looking for, I hope this helps.

1 Like