Aggregation performance: taking a long time in aggregation query

Trying to do some aggregation in MongoDB on around 700K records, but it is taking around 11 seconds to execute and process.

My query looks like

   {$group:{_id:{rmuser:"$rmUserId"}, uniqueValues:{$addToSet:"$assignmentId"}} },
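For context, here is a sketch in plain JavaScript of what this `$group` + `$addToSet` stage computes (the `docs` array and function name are hypothetical stand-ins, not part of the actual query):

```javascript
// Sketch: the $group/$addToSet stage, simulated in memory.
// `docs` is a hypothetical in-memory stand-in for the collection.
function groupUniqueAssignments(docs) {
  const byUser = new Map();
  for (const doc of docs) {
    const key = doc.rmUserId;
    if (!byUser.has(key)) byUser.set(key, new Set());
    byUser.get(key).add(doc.assignmentId); // $addToSet keeps only unique values
  }
  // Shape the result like the aggregation output
  return [...byUser].map(([rmuser, set]) => ({
    _id: { rmuser },
    uniqueValues: [...set],
  }));
}
```

The server does this over every document in the collection, which is why the stage alone cannot use an index unless the pipeline is shaped so the data can be read from one.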

My collection looks like:

    {
        "_id" : ObjectId("614b4305805d3a52e89db2ae"),
        "status" : "CALL_ANSWERED",
        "setupDuration" : 2,
        "ringDuration" : 28,
        "talkDuration" : 19,
        "wrapDuration" : 8,
        "callRecordingUrl" : "",
        "isFirst" : false,
        "counsellorId" : "9ac29fa9-896b",
        "leadId" : "d25163cc-9380",
        "userId" : "4.10256E+15",
        "designation" : "CENTRE_HEAD",
        "chUserId" : "nitin.tomar",
        "rmUserId" : "dhaval.gor",
        "zmUserId" : "yog.raj",
        "leadPhone" : "7318184",
        "assignedDate" : NumberLong(1632190287872),
        "activityDate" : NumberLong(1632322293656),
        "assignmentId" : "61493f500c649325d7167626",
        "metadataId" : "61284ade338ba0a636ad5562",
        "lastUpdatedDate" : ISODate("2021-09-22T14:51:48.459Z"),
        "creationTime" : NumberLong(1632322308459),
        "createdBy" : "LeadCallDetails",
        "lastUpdated" : NumberLong(1632322308459),
        "lastUpdatedBy" : "LeadCallDetails",
        "entityState" : "ACTIVE"
    }

It is indexed properly.

Can someone please help and tell me what I am doing wrong?

Hi @Akarsh_Jain and welcome in the MongoDB Community :muscle: !

I think the best possible aggregation here is this one:

    db.collection.aggregate([
      {
        '$match': {
          'zmUserId': 'yog.raj'
        }
      }, {
        '$group': {
          '_id': '$rmUserId',
          'u': {
            '$addToSet': '$assignmentId'
          }
        }
      }, {
        '$set': {
          'count': {
            '$size': '$u'
          }
        }
      }
    ])
Which gives me this optimal explain plan with the appropriate index `{ zmUserId: 1, rmUserId: 1, assignmentId: 1 }`:

    queryPlanner: {
      namespace: 'test.c',
      indexFilterSet: false,
      parsedQuery: { zmUserId: { '$eq': 'yog.raj' } },
      queryHash: 'E701AD8B',
      planCacheKey: 'DA369CFE',
      maxIndexedOrSolutionsReached: false,
      maxIndexedAndSolutionsReached: false,
      maxScansToExplodeReached: false,
      winningPlan: {
        stage: 'PROJECTION_COVERED',
        transformBy: { assignmentId: 1, rmUserId: 1, _id: 0 },
        inputStage: {
          stage: 'IXSCAN',
          keyPattern: { zmUserId: 1, rmUserId: 1, assignmentId: 1 },
          indexName: 'id1',
          isMultiKey: false,
          multiKeyPaths: { zmUserId: [], rmUserId: [], assignmentId: [] },
          isUnique: false,
          isSparse: false,
          isPartial: false,
          indexVersion: 2,
          direction: 'forward',
          indexBounds: {
            zmUserId: [ '["yog.raj", "yog.raj"]' ],
            rmUserId: [ '[MinKey, MaxKey]' ],
            assignmentId: [ '[MinKey, MaxKey]' ]
          }
        }
      },
      rejectedPlans: []
    }

So if you don’t have the index { zmUserId: 1, rmUserId: 1, assignmentId: 1 }, I would argue that you aren’t indexed correctly.
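If you don't have it yet, the compound index can be created from mongosh like this (a sketch; `leadCallDetails` is a hypothetical collection name, substitute your own):

```javascript
// Hypothetical collection name; adjust to yours.
// Field order matters: equality field ($match) first, then the grouped
// and accumulated fields so the query can be covered by the index.
db.leadCallDetails.createIndex({ zmUserId: 1, rmUserId: 1, assignmentId: 1 })
```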

Unless your $match isn't very selective (e.g. it only removes half of the 700k docs), your performance should be way better than 11 seconds. Unless the hardware can't keep up…


Hi @MaBeuLux88,
Thanks for the reply. I don't have a compound index on those fields, but I do have an individual index on each of them. Do I specifically need the compound index?
Also, I am currently using an M20 cluster with 3 nodes. Is the hardware fine, or would you recommend upgrading it?

Please also help with optimising a query without any $match stage.

For example

   {$group:{_id:{zmUserId:"$zmUserId"}, uniqueValues:{$addToSet:"$assignmentId"}} },

Yes. A compound index is much more efficient than several single-field indexes here: MongoDB is not very good at index intersection and rarely chooses it, because a compound index matching the query shape is almost always the better plan.

No idea, as I have no information about the use case, the data sizes, the requirements, etc. My best advice: check the metrics. If you see that your RAM, CPU or IOPS are struggling, then it's probably time to consider an upgrade, or to enable cluster auto-scaling.

Try the same query with the explain at the end:

    db.collection.aggregate([
      {$group:{_id:{zmUserId:"$zmUserId"}, uniqueValues:{$addToSet:"$assignmentId"}}}
    ], {explain: true})

Try to create an index so that you avoid the COLLSCAN and FETCH stages, which cost a lot of IOPS and are much slower than a covered IXSCAN, for example. :slight_smile:
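Since this second pipeline has no $match, the only way an index can help is if it covers the grouped and accumulated fields. A sketch of what that could look like (again, `leadCallDetails` is a hypothetical collection name):

```javascript
// Covering index for the no-$match group query: the grouped field first,
// then the accumulated field, so the whole pipeline can be answered
// from the index without fetching documents.
db.leadCallDetails.createIndex({ zmUserId: 1, assignmentId: 1 })

// Re-run with explain and inspect the winning plan: you want to see
// IXSCAN (ideally with PROJECTION_COVERED), not COLLSCAN or FETCH.
db.leadCallDetails.aggregate(
  [{ $group: { _id: { zmUserId: "$zmUserId" }, uniqueValues: { $addToSet: "$assignmentId" } } }],
  { explain: true }
)
```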