Optimisation of a MongoDB aggregation: FETCH is taking too long

I have an M10 cluster.
It has a collection that currently holds 1.6M records.
FETCH is really slow, making my aggregation execution time more than 20-30 seconds.

[
  {
    $project:
      /**
       * specifications: The fields to
       *   include or exclude.
       */
      {
        organizationId: 1,
        startTime: 1,
        timeSpent: 1,
        userRole: 1,
        isGuest: 1,
        userId: 1,
      },
  },
  {
    $match:
      /**
       * query: The query in MQL.
       */
      {
        organizationId: 3277,
        userRole: "student",
        startTime: {
          $gt: ISODate("2024-04-23"),
          $lt: ISODate("2024-05-23"),
        },
      },
  },
  {
    $group:
      /**
       * _id: The id of the group.
       * fieldN: The first field name.
       */
      {
        _id: "$userId",
        totalVal: {
          $sum: "$timeSpent",
        },
        sessions: {
          $sum: 1,
        },
      },
  },
  {
    $sort:
      /**
       * Provide any number of field/order pairs.
       */
      {
        totalVal: -1,
      },
  },
  {
    $limit:
      /**
       * Provide the number of documents to limit.
       */
      10,
  },
]

Above is the aggregation pipeline I run.

Explain gives data like this for the IXSCAN stage:

{
 "stage": "IXSCAN",
 "planNodeId": 1,
 "nReturned": 503081,
 "executionTimeMillisEstimate": 1624,
 "opens": 1,
 "closes": 1,
 "saveState": 1279,
 "restoreState": 1279,
 "isEOF": 1,
 "numTested": 1,
 "thenBranchOpens": 1,
 "thenBranchCloses": 1,
 "elseBranchOpens": 0,
 "elseBranchCloses": 0,
 "filter": "s25 ",
 "thenSlots": [
  5,
  7,
  8,
  9,
  10
 ],
 "elseSlots": [
  11,
  17,
  18,
  19,
  10
 ],
 "outputSlots": [
  20,
  21,
  22,
  23,
  24
 ],
 "keyPattern": {
  "organizationId": 1,
  "startTime": 1,
  "userRole": 1
 },
 "indexName": "organizationId_1_startTime_1_userRole_1",
 "isMultiKey": false,
 "multiKeyPaths": {
  "organizationId": [],
  "startTime": [],
  "userRole": []
 },
 "isUnique": false,
 "isSparse": false,
 "isPartial": false,
 "indexVersion": 2,
 "direction": "forward",
 "indexBounds": {
  "organizationId": [
   "[3277, 3277]"
  ],
  "startTime": [
   "(new Date(1713830400000), new Date(1716422400000))"
  ],
  "userRole": [
   "[\"student\", \"student\"]"
  ]
 },
 "docsExamined": 0,
 "keysExamined": 0
}

And for the FETCH stage:

{
 "stage": "FETCH",
 "planNodeId": 2,
 "nReturned": 503081,
 "executionTimeMillisEstimate": 20426,
 "opens": 1,
 "closes": 1,
 "saveState": 1279,
 "restoreState": 1279,
 "isEOF": 1,
 "totalDocsExamined": 503081,
 "totalKeysExamined": 610016,
 "collectionScans": 0,
 "collectionSeeks": 503081,
 "indexScans": 0,
 "indexSeeks": 1,
 "indexesUsed": [
  "organizationId_1_startTime_1_userRole_1",
  "organizationId_1_startTime_1_userRole_1"
 ],
 "innerOpens": 503081,
 "innerCloses": 1,
 "outerProjects": [],
 "outerCorrelated": [
  20,
  21,
  22,
  23,
  24
 ],
 "docsExamined": 503081,
 "keysExamined": 0
}

Do I need to go to M30, or is there something else I can do?
My other concern is that it is examining around 503081 documents. Is there a better implementation that would reduce this so that fewer records are examined?

The $match should be first.

The $project is useless and potentially harmful, especially since it comes before $match. The logic is that you potentially $project (and thus fetch) all the documents before filtering them. I wrote potentially because Mongo might optimize this out.
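
As a rough sketch, assuming the collection is called sessions (the real name is not in the post), the pipeline with $match first and the $project dropped would look like this:

db.sessions.aggregate([
  {
    // Filter first so the index can be used and later stages
    // only see the matching documents.
    $match: {
      organizationId: 3277,
      userRole: "student",
      startTime: {
        $gt: ISODate("2024-04-23"),
        $lt: ISODate("2024-05-23"),
      },
    },
  },
  {
    // Aggregate time spent and session count per user.
    $group: {
      _id: "$userId",
      totalVal: { $sum: "$timeSpent" },
      sessions: { $sum: 1 },
    },
  },
  { $sort: { totalVal: -1 } },
  { $limit: 10 },
])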

You have an IXSCAN, but your index organizationId_1_startTime_1_userRole_1 is not optimal for this query. Since you are doing an equality match on userRole and a range query on startTime, a better index would be organizationId_1_userRole_1_startTime_1.
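
For example, using the same placeholder collection name, the suggested index could be created like this:

db.sessions.createIndex(
  // Equality fields first (organizationId, userRole),
  // the range field (startTime) last, per the equality-sort-range guideline.
  { organizationId: 1, userRole: 1, startTime: 1 }
)

With that order the equality predicates define a contiguous run of index keys, instead of the userRole condition being checked across the whole startTime range, which is likely why totalKeysExamined (610016) is higher than nReturned (503081) in your explain.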

Hey Steevej,

My main query has $match as the first stage; I was just experimenting with keeping $project first to see if it helps.

Regarding the index: okay, changing the index might make the IXSCAN faster, but the issue here is FETCH. Even if I make the IXSCAN faster it will only save about 0.5 sec at most, so how do I make FETCH faster?

Hi Gouri,

Have you checked the metrics in your cluster? You can change the time window of the view to match your query time; if it is outside the range of the log retention, just run your queries again. I am not sure which provider you use. I use Azure with M10s as my dev/test environment and I get 120 IOPS on the disks. With cloud infrastructure you need to determine how many IOPS (and also Read/Write Units) you are consuming and then adjust your quotas from there. Overutilisation can lead to slow performance that may not be caused by the actual queries or data involved.

Hope that helps

Craig.

Please do not make us read, analyze, and write about a piece of code you already know is not correct.

Have you tried it? Please try it before disregarding the suggestion.

You could try $project: { _id: 0, userId: 1, timeSpent: 1 } after adding userId and timeSpent as the last fields of your index.
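
A minimal sketch of that idea, again with sessions as a placeholder collection name: extend the reordered index from above with userId and timeSpent, then project only those fields right after the $match. If every field the rest of the pipeline needs is present in the index and _id is excluded, the plan can be covered by the index and the expensive FETCH stage should disappear.

db.sessions.createIndex({
  organizationId: 1,
  userRole: 1,
  startTime: 1,
  userId: 1,
  timeSpent: 1,
})

db.sessions.aggregate([
  {
    $match: {
      organizationId: 3277,
      userRole: "student",
      startTime: {
        $gt: ISODate("2024-04-23"),
        $lt: ISODate("2024-05-23"),
      },
    },
  },
  // Keep only index-resident fields so no document fetch is needed.
  { $project: { _id: 0, userId: 1, timeSpent: 1 } },
  {
    $group: {
      _id: "$userId",
      totalVal: { $sum: "$timeSpent" },
      sessions: { $sum: 1 },
    },
  },
  { $sort: { totalVal: -1 } },
  { $limit: 10 },
])

Check the new explain output: if it worked, the FETCH stage (and totalDocsExamined) should no longer appear.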

@Gouri_Sankar, out of courtesy to the people who replied to your post, please do not let it die without a final solution or follow-up.