Aggregation performance , talking long time in aggregation query

Akarsh_Jain · October 21, 2021, 6:55pm

Trying to do some aggregation on mongo for around 700K records but it is taking around 11 sec to execute and process.

My query looks like

db.getCollection('XYZ').aggregate([
   {$match:{zmUserId:"ABC"}},
   {$group:{_id:{rmuser:"$rmUserId"}, uniqueValues:{$addToSet:"$assignmentId"}} },
   {$project:{_id:1,count:{$size:"$uniqueValues"}}}
])

my collection

{
    "_id" : ObjectId("614b4305805d3a52e89db2ae"),
    "status" : "CALL_ANSWERED",
    "setupDuration" : 2,
    "ringDuration" : 28,
    "talkDuration" : 19,
    "wrapDuration" : 8,
    "callRecordingUrl" : "",
    "isFirst" : false,
    "counsellorId" : "9ac29fa9-896b",
    "leadId" : "d25163cc-9380",
    "userId" : "4.10256E+15",
    "designation" : "CENTRE_HEAD",
    "chUserId" : "nitin.tomar",
    "rmUserId" : "dhaval.gor",
    "zmUserId" : "yog.raj",
    "leadPhone" : "7318184",
    "assignedDate" : NumberLong(1632190287872),
    "activityDate" : NumberLong(1632322293656),
    "assignmentId" : "61493f500c649325d7167626",
    "metadataId" : "61284ade338ba0a636ad5562",
    "lastUpdatedDate" : ISODate("2021-09-22T14:51:48.459Z"),
    "creationTime" : NumberLong(1632322308459),
    "createdBy" : "LeadCallDetails",
    "lastUpdated" : NumberLong(1632322308459),
    "lastUpdatedBy" : "LeadCallDetails",
    "entityState" : "ACTIVE"
}

It is indexed properly.

Can someone please help and tell me what I am doing wrong.

MaBeuLux88_xxx · October 21, 2021, 7:53pm

Hi @Akarsh_Jain and welcome in the MongoDB Community !

The best possible aggregation here is this one I think:

[
  {
    '$match': {
      'zmUserId': 'yog.raj'
    }
  }, {
    '$group': {
      '_id': '$rmUserId', 
      'u': {
        '$addToSet': '$assignmentId'
      }
    }
  }, {
    '$set': {
      'count': {
        '$size': '$u'
      }
    }
  }
]

Which give me this optimal explain plan with the appropriate index: { zmUserId: 1, rmUserId: 1, assignmentId: 1 }:

queryPlanner: {
          namespace: 'test.c',
          indexFilterSet: false,
          parsedQuery: { zmUserId: { '$eq': 'yog.raj' } },
          queryHash: 'E701AD8B',
          planCacheKey: 'DA369CFE',
          maxIndexedOrSolutionsReached: false,
          maxIndexedAndSolutionsReached: false,
          maxScansToExplodeReached: false,
          winningPlan: {
            stage: 'PROJECTION_COVERED',
            transformBy: { assignmentId: 1, rmUserId: 1, _id: 0 },
            inputStage: {
              stage: 'IXSCAN',
              keyPattern: { zmUserId: 1, rmUserId: 1, assignmentId: 1 },
              indexName: 'id1',
              isMultiKey: false,
              multiKeyPaths: { zmUserId: [], rmUserId: [], assignmentId: [] },
              isUnique: false,
              isSparse: false,
              isPartial: false,
              indexVersion: 2,
              direction: 'forward',
              indexBounds: {
                zmUserId: [ '["yog.raj", "yog.raj"]' ],
                rmUserId: [ '[MinKey, MaxKey]' ],
                assignmentId: [ '[MinKey, MaxKey]' ]
              }
            }
          },
          rejectedPlans: []
        }

So if you don’t have the index { zmUserId: 1, rmUserId: 1, assignmentId: 1 }, I would argue that you aren’t indexed correctly.

Unless your $match isn’t very selective (like it’s only removing half of the 700k docs), your perf should be way better than 11sec. Unless the hardware isn’t following…

Cheers,
Maxime.

Akarsh_Jain · October 21, 2021, 8:06pm

Hi @MaBeuLux88_xxx,
Thanks for the reply. I don’t have compound index on the given field, but have individual index of the mentioned fields. Do I need to have the compound index only?
Also currently I am using M20 cluster with 3 nodes, is the hardware fine or you would recommend updating it?

Akarsh_Jain · October 21, 2021, 8:41pm

Hi,
Please also help with optimisation for query without any match.

For example

db.getCollection('LeadCallDetails').aggregate([
   {$group:{_id:{zmUserId:"$zmUserId"}, uniqueValues:{$addToSet:"$assignmentId"}} },
   {$project:{_id:1,count:{$size:"$uniqueValues"}}}
])

MaBeuLux88_xxx · October 21, 2021, 10:50pm

Yes because a compound index is way more efficient than individual indexes and MongoDB isn’t very efficient at doing index intersections because compound indexes are always better so…

No idea as I have no information at all about the use case, all the sizes, the requirements, etc… My best advice: check the metrics. If you see that your RAM, CPU or IOPS are struggling, then it’s probably time to consider an upgrade or consider the cluster autoscaling.

MaBeuLux88_xxx · October 21, 2021, 10:53pm

Try the same query with the explain at the end:

db.getCollection(‘LeadCallDetails’).aggregate([
{$group:{_id:{zmUserId:"$zmUserId"}, uniqueValues:{$addToSet:"$assignmentId"}} },
{$project:{_id:1,count:{$size:"$uniqueValues"}}}
], {explain: true})

Try to create an index so you avoid the COLL_SCAN or FETCH stages that cost a lot of IOPS and are slower than IDXSCAN for example. .

Cheers,
Maxime.