$count vs $project -> $group w/ $sum

Given an aggregation pipeline that is intended to count the resulting number of records, is it more efficient to use a $count stage or a $project into a $group using a $sum aggregator?

I see in the v6.0 docs for $count under “Behavior”, there is a note that $count is the same as doing this:

{ $group: { _id: null, count: { $sum: 1 } } },
{ $project: { _id: 0 } }

but I’m curious if preceding either a $count or the interchangeable [ $group w/ $sum$project ] with

{ $project: { _id: 1 } }

would be more efficient.

I don’t have a good way to test this since I don’t have a sufficiently large data set available to see a meaningful difference, but if anyone has a way to test this or knows already I’d love to find out which is better.

Hi :wave: @Lucas_Burns,

Welcome to the MongoDB Community forums :sparkles:

I have performed 4 separate queries on a sample collection containing 1,600,000 documents, and here are the results of the execution:

[
  {
    $project: {
      _id: 1,
    },
  },
  {
    $group: {
      _id: null,
      count: {
        $sum: 1,
      },
    },
  },
  {
    $project: {
      _id: 0,
    },
  },
]

it returned:

1st Case

{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 1600000,
          "executionTimeMillis": 1677,
          "totalKeysExamined": 0,
          "totalDocsExamined": 1600000,
          "executionStages": {
            "stage": "PROJECTION_SIMPLE",
            "nReturned": 1600000,
            "executionTimeMillisEstimate": 111,
            "works": 1600002,
            "advanced": 1600000,
            "needTime": 1,
            "needYield": 0,
            "saveState": 1663,
            "restoreState": 1663,
            "isEOF": 1,
            "transformBy": {
              "_id": true
            },
            "inputStage": {
              "stage": "COLLSCAN",
              "nReturned": 1600000,
              "executionTimeMillisEstimate": 71,
              "works": 1600002,
              "advanced": 1600000,
              "needTime": 1,
              "needYield": 0,
              "saveState": 1663,
              "restoreState": 1663,
              "isEOF": 1,
              "direction": "forward",
              "docsExamined": 1600000
            }
          },
          "allPlansExecution": []
        }
      },
      "nReturned": 1600000,
      "executionTimeMillisEstimate": 1391
    },
  ],
  "command": {
    "pipeline": [
      {
        "$project": {
          "_id": 1
        }
      },
      {
        "$group": {
          "_id": null,
          "count": {
            "$sum": 1
          }
        }
      },
      {
        "$project": {
          "_id": 0
        }
      }
    ],
  },
  "ok": 1,
}

And similarly, without $project: {_id: 1} as the first stage, it returned the following execution time:

[
  {
    $group: {
      _id: null,
      count: {
        $sum: 1,
      },
    },
  },
  {
    $project: {
      _id: 0,
    },
  },
]

2nd Case

{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 1600000,
          "executionTimeMillis": 940,
          "totalKeysExamined": 0,
          "totalDocsExamined": 1600000,
          "executionStages": {
            "stage": "COLLSCAN",
            "nReturned": 1600000,
            "executionTimeMillisEstimate": 63,
            "works": 1600002,
            "advanced": 1600000,
            "needTime": 1,
            "needYield": 0,
            "saveState": 1601,
            "restoreState": 1601,
            "isEOF": 1,
            "direction": "forward",
            "docsExamined": 1600000
          },
          "allPlansExecution": []
        }
      },
      "nReturned": 1600000,
      "executionTimeMillisEstimate": 736
    },
  ],
  "command": {
    "pipeline": [
      {
        "$group": {
          "_id": null,
          "count": {
            "$sum": 1
          }
        }
      },
      {
        "$project": {
          "_id": 0
        }
      }
    ],
  },
  "ok": 1,
}

And the following query:

[
  {
    $project: { _id: 1 },
  },
  {
    $count: "mycount",
  },
]

it returned:

3rd Case

{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 1600000,
          "executionTimeMillis": 1717,
          "totalKeysExamined": 0,
          "totalDocsExamined": 1600000,
          "executionStages": {
            "stage": "PROJECTION_SIMPLE",
            "nReturned": 1600000,
            "executionTimeMillisEstimate": 134,
            "works": 1600002,
            "advanced": 1600000,
            "needTime": 1,
            "needYield": 0,
            "saveState": 1665,
            "restoreState": 1665,
            "isEOF": 1,
            "transformBy": {
              "_id": true
            },
            "inputStage": {
              "stage": "COLLSCAN",
              "nReturned": 1600000,
              "executionTimeMillisEstimate": 93,
              "works": 1600002,
              "advanced": 1600000,
              "needTime": 1,
              "needYield": 0,
              "saveState": 1665,
              "restoreState": 1665,
              "isEOF": 1,
              "direction": "forward",
              "docsExamined": 1600000
            }
          },
          "allPlansExecution": []
        }
      },
      "nReturned": 1600000,
      "executionTimeMillisEstimate": 1408
    },
  ],
  "command": {
    "pipeline": [
      {
        "$project": {
          "_id": 1
        }
      },
      {
        "$count": "mycount"
      }
    ],
  },
  "ok": 1,
}

And similarly, without $project: {_id: 1} as the first stage, it returned the following execution time:

[
  {
    $count: "mycount",
  },
]

4th Case

{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "executionStats": {
          "executionSuccess": true,
          "nReturned": 1600000,
          "executionTimeMillis": 924,
          "totalKeysExamined": 0,
          "totalDocsExamined": 1600000,
          "executionStages": {
            "stage": "COLLSCAN",
            "nReturned": 1600000,
            "executionTimeMillisEstimate": 63,
            "works": 1600002,
            "advanced": 1600000,
            "needTime": 1,
            "needYield": 0,
            "saveState": 1601,
            "restoreState": 1601,
            "isEOF": 1,
            "direction": "forward",
            "docsExamined": 1600000
          },
          "allPlansExecution": []
        }
      },
      "nReturned": 1600000,
      "executionTimeMillisEstimate": 754
  ],
  "command": {
    "pipeline": [
      {
        "$count": "mycount"
      }
    ],
  },
  "ok": 1,
}

The above operation has been done on MongoDB Atlas M0 version: "5.0.15"

Notice the executionTimeMillis and executionTimeMillisEstimate for all 4 cases:

Case executionTimeMillis executionTimeMillisEstimate $project: {_id: 1}
1st Case 1677 ms 111 ms Yes
2nd Case 940 ms 63 ms No
3rd Case 1717 ms 134 ms Yes
4th Case 924 ms 63 ms No

I hope this makes it clear that there is a difference in the efficiency of the query without $project: {_id: 1}. The query runs faster without $project: {_id: 1}.

I would suggest you experiment with different collection scenarios. You can use mgeneratejs to create sample documents quickly in any number, so the different aggregation pipelines can be tested easily.

Best,
Kushagra

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.