Help with very slow $group aggregation, on very small dataset

Alex_Bjorlig · January 28, 2023, 1:48pm

Our team is struggling to understand this very slow query, please help.

The data looks like this:

[
  {groupId: "ABC", riskModelComplianceType: "1", costEstimate: 200},
  {groupId: "ABC", riskModelComplianceType: "1", costEstimate: 200},
  {groupId: "ABC", riskModelComplianceType: "2", costEstimate: 100},
  {groupId: "ABC", riskModelComplianceType: "3": costEstimate: 400},
]

Our aggregation looks like this:

[
  {
    $match: {
      groupId: "5cb5cba6d5815e780d12c13f",
    },
  },
  {
    $group: {
      _id: "$riskModelComplianceType",
      costEstimate: {
        $sum: "$costEstimate",
      },
    },
  },
]

The pipeline “works” but the performance is terrible. We only have 17K docs, but the above query takes around 4 seconds

What is going on here? Do we need a certain index or why is it so slow?

I have attached the explain output from Mongodb.
explain-from-mongo.json (58.3 KB)

Satyam · February 2, 2023, 3:50am

Hey @Alex_Bjorlig,

Looking at your explain output, the ordering of the index being used by the query planner is:

{
              "groupId": -1,
              "factoryId": -1,
              "riskModelCategoryLevel": -1,
              "eddyLabelIds": -1,
              "status": -1,
              "responsible": -1
            }

which indicates that you have more fields in your document than what the sample document you provided is showing. Would you be able to provide the full document structure along with any other information so that we can reproduce it better on our end as I have attempted to use 17K sample documents & the same pipeline mentioned in this post which resulted in a 8ms execution. It would also be great if you can share your db.collection.stats() output along with details of your hardware: RAM, CPU, Disk specs, etc for us to be better able to understand and help you.

Also, from reading your explain output, there is a FETCH stage too in the output. It is usually recommended to avoid this as much as possible and filter as much as possible using IXSCAN. You can try adding a $sort stage before the $group so the documents in the input of the $group state are already sorted which could help improve the performance. Please see: $group optimization. Also see covered query

Regards,
Satyam

system · May 22, 2023, 4:14am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.