Do aggregations impact examined:returned ratio? Should they?

Kevin_Rathgeber · January 20, 2022, 4:39pm

For example, if I write an aggregation to query some collection, and then average a specific field, it would technically only return 1 document, but could look at N number of objects. Would that examined:returned ratio be N all the time then?

Paulo_David · April 28, 2022, 3:04pm

I have the similar case, where I have and aggregation and use group to get the values of the month of a date range. This is more unpredictable, but it’s very similar to your situation.
We need something more specific to evaluate the performance of aggregations, the current one isn’t very accurate for this case.

Jason_Tran · April 29, 2022, 3:58am

Hi @Kevin_Rathgeber and @Paulo_David,

Would that examined:returned ratio be N all the time then?

5 sample documents in my test environment:

numberdb> db.numbercoll.find()
[
  { _id: ObjectId("626b4cbb5d078c11bef0fc23"), a: 1 },
  { _id: ObjectId("626b4cbd5d078c11bef0fc24"), a: 2 },
  { _id: ObjectId("626b4cbe5d078c11bef0fc25"), a: 3 },
  { _id: ObjectId("626b4cc15d078c11bef0fc26"), a: 4 },
  { _id: ObjectId("626b4cc25d078c11bef0fc27"), a: 5 }
]

Performing an aggregation to get the average value of a for all documents:

numberdb> db.numbercoll.aggregate({$group:{_id:null,aAverage:{"$avg":"$a"}}})
[ { _id: null, aAverage: 3 } ]

Running a db.collection.explain("executionStats").aggregate(...) on the above aggregation. The execution stats output is as follows:

        executionStats: {
          executionSuccess: true,
          nReturned: 5,
          executionTimeMillis: 0,
          totalKeysExamined: 0,
          totalDocsExamined: 5,
          executionStages: {
            stage: 'PROJECTION_SIMPLE',
            nReturned: 5,
            executionTimeMillisEstimate: 0,
            works: 7,
            advanced: 5,
            needTime: 1,
            needYield: 0,
            saveState: 1,
            restoreState: 1,
            isEOF: 1,
            transformBy: { a: 1, _id: 0 },
            inputStage: {
              stage: 'COLLSCAN',
              nReturned: 5,
              executionTimeMillisEstimate: 0,
              works: 7,
              advanced: 5,
              needTime: 1,
              needYield: 0,
              saveState: 1,
              restoreState: 1,
              isEOF: 1,
              direction: 'forward',
              docsExamined: 5
            }
          }
        }

As you can see from the above data, only a singular document is returned in the shell output when running the aggregation to display the average a value as 3. However, based off the execution stats output, we can see that multiple documents were needed to be examined and processed to determine this average. More specifically:
nReturned: 5 (Number of documents that match the query condition)
totalDocsExamined: 5 (Number of documents examined during query execution)

Would that examined:returned ratio be N all the time then?

I believe it may depend on what you mean by “examined” and “returned” here. As you can see from the above, totalDocsExamined is 5 and nReturned is 5 (ratio of 5:5) but the output of the aggregation is a single document. If you are referring to the totalDocsExamined to the single output document of the aggregation, then it would be N for this aggregation example.

We need something more specific to evaluate the performance of aggregations, the current one isn’t very accurate for this case.

Does the above example provide any help or insight for your use case? If not, it may be best to create a new topic with more information such as:

The aggregation being run
The use case
The issue with performance you’re encountering
MongoDB version

Depending on your pipeline, you may be able to create indexes for a covered query to avoid document fetches.

Regards,
Jason