Aggreation project and group

db.icecream_data.aggregate([ {$group: {_id:null, max_high: {$max:"$trends.avg_high_tmp"}}}])

returns a list of result

 _id: null,
  max_high: [ 42, 44, 53, 64, 75, 83, 87, 84, 78, 67, 55, 45 ] }

db.icecream_data.aggregate([ {$project: {_id:0, max_high: {$max:"$trends.avg_high_tmp"}}}])

returns a single result

{ _id: null, max_high: 87 }

There is only 1 document in the ice-cream, trends is an array containing 12 sub-documents

I understood why project returns only 1 value as it checks the max in the array. Why do the group query returns a list of values

1 Like

Hey @churamani_prasad,

Welcome to the MongoDB Community Forums! :leaves:

$group is used to group input documents by the specified _id expression and for each distinct grouping, outputs a document. If you look at the dataset using the following command:

db.icecream_data.findOne()

we get:

{
  _id: ObjectId("59bff494f70ff89cacc36f90"),
  trends: [
    {
      month: 'January',
      avg_high_tmp: 42,
      avg_low_tmp: 27,
      icecream_cpi: 238.8,
      icecream_sales_in_millions: 115
    },
    {
      month: 'February',
      avg_high_tmp: 44,
      avg_low_tmp: 28,
      icecream_cpi: 225.5,
      icecream_sales_in_millions: 118
    },
    {
      month: 'March',
      avg_high_tmp: 53,
      avg_low_tmp: 35,
      icecream_cpi: 221.9,
      icecream_sales_in_millions: 121
    },
    {
      month: 'April',
      avg_high_tmp: 64,
      avg_low_tmp: 44,
      icecream_cpi: 222.6,
      icecream_sales_in_millions: 125
    },
    {
      month: 'May',
      avg_high_tmp: 75,
      avg_low_tmp: 54,
      icecream_cpi: 216.7,
      icecream_sales_in_millions: 140
    },
    {
      month: 'June',
      avg_high_tmp: 83,
      avg_low_tmp: 63,
      icecream_cpi: 216.6,
      icecream_sales_in_millions: 155
    },
    {
      month: 'July',
      avg_high_tmp: 87,
      avg_low_tmp: 68,
      icecream_cpi: 213.2,
      icecream_sales_in_millions: 163
    },
    {
      month: 'August',
      avg_high_tmp: 84,
      avg_low_tmp: 66,
      icecream_cpi: 215.9,
      icecream_sales_in_millions: 157
    },
    {
      month: 'September',
      avg_high_tmp: 78,
      avg_low_tmp: 59,
      icecream_cpi: 217.4,
      icecream_sales_in_millions: 140
    },
    {
      month: 'October',
      avg_high_tmp: 67,
      avg_low_tmp: 48,
      icecream_cpi: 218.7,
      icecream_sales_in_millions: 128
    },
    {
      month: 'November',
      avg_high_tmp: 55,
      avg_low_tmp: 38,
      icecream_cpi: 220.3,
      icecream_sales_in_millions: 122
    },
    {
      month: 'December',
      avg_high_tmp: 45,
      avg_low_tmp: 29,
      icecream_cpi: 227.7,
      icecream_sales_in_millions: 117
    }
  ]
}

As we can see and correctly pointed out by you, trends contain multiple subdocuments. Now, all these subdocuments contain only one value of avg_high_tmp, which is why when you are going for $group, it returns a list of values from each subdocument. Do check out more examples.

Let me know if the doubt still persists. Feel free to reach out for anything else as well.

Regards,
Satyam

1 Like

Actually, I couldn’t understand. In the query I never said to group by month rather I put _id:null. I expected it to give me the max_high across all months and not for individual months.

All the examples in the link are where group works across documents and there are no array of sub-documents. This ice-cream example is the only one where grouping is done across sub-documents. I’m unable to wrap my head on what would be the corresponding SQL query in RDBMS for this use case. I inserted another document with Jan month max temp 43 and ran the same query and I get the similar result ```

{ _id: null,
  max_high: [ 43, 44, 53, 64, 75, 83, 87, 84, 78, 67, 55, 45 ] }

If I want grouping at month level, shouldn’t I run the below query and get output as shown below

db.icecream_data.aggregate([ {$group: {_id:"$trends.month", max_high: {$max:"$trends.avg_high_tmp"}}}])
{ _id: 
   [ 'January',
     'February',
     'March',
     'April',
     'May',
     'June',
     'July',
     'August',
     'September',
     'October',
     'November',
     'December' ],
  max_high: [ **43**, 44, 53, 64, 75, 83, 87, 84, 78, 67, 55, 45 ] }

Hey @churamani_prasad,

This would have happened if trends did not contain multiple subdocuments. Since there’s only one max_high in each subdocument, it’s just grouping that for each subdocument and returning the values. The other query that you wrote is also doing the same thing.

I’m also linking another documentation for you that should help: SQL to Aggregation Mapping Chart. It contains more examples as well on $group stage.

Let us know if any doubts still persist.

Regards,
Satyam