How can I group a sum group in an aggregation?

Here is my toy example of fruits and vegetables which mimics a side project I’m using to learn MongoDB. I’m using pymongo, so please forgive me if my aggregation is over-quoted.

I need to perform an aggregation where I count number of records for a given date, grouped by another field. In my toy example I can count by date, but I can’t figure out how to split this into the two sub-groups of ‘fruits’ and ‘vegetables’.

I have this:

index _id count
0 2022-02-13 3
1 2022-02-15 3
2 2022-02-17 1

And I need something like this:

index _id count
fruits 2022-02-13 1
fruits 2022-02-15 2
vegetables 2022-02-13 2
vegetables 2022-02-15 1
vegetables 2022-02-17 1
data = [{
    "_id" : "001",
    "Location" : "NY",
    "Time" : "2022-02-13T16:01:00Z",
    "Type" : "fruits"
},
{
    "_id" : "002",
    "Location" : "NY",
    "Time" : "2022-02-13T16:02:00Z",
    "Type" : "vegetables"
},
{
    "_id" : "003",
    "Location" : "NY",
    "Time" : "2022-02-13T16:03:00Z",
    "Type" : "vegetables"
},
{
    "_id" : "004",
    "Location" : "NY",
    "Time" : "2022-02-15T16:04:00Z",
    "Type" : "fruits"
},
{
    "_id" : "005",
    "Location" : "NY",
    "Time" : "2022-02-15T16:05:00Z",
    "Type" : "fruits"
},
{
    "_id" : "006",
    "Location" : "NJ",
    "Time" : "2022-02-15T16:06:00Z",
    "Type" : "fruits"
}]

result = db.COLL_NAME.aggregate([
            {
            '$match': {
                'Location': 'NY'
                }
            },
            {
                '$project': {
                    'MyTime': {
                        '$dateToString' : {
                            'format': '%Y-%m-%d',
                            'date': {
                                 '$convert': {
                                    'input' : '$Time',
                                    'to' : 'date'
                                }
                            }
                        }
                    },
                }
            },
            { 
                '$group': { 
                    '_id': '$MyTime', 
                    'count': {'$sum': 1 }
                } 
            }

])

Hello @xtian_simon, since you need to group by Type field also, you will need to project that field also. So, include the Type field in the $project stage (you can project multiple fields, to exclude or include). Then try, grouping by the two fields - as you want . You can group by more than one field, and in this case the Type and the derived MyTime fields.

1 Like

Thank you. That was helpful.

Here is what I ended up with,

result = db.COLL_NAME.aggregate([
            {
            '$match': {
                'Location': 'NY'
                }
            },
            {
                '$project': {
                    "Type" : 1,
                    'MyTime': {
                        '$dateToString' : {
                            'format': '%Y-%m-%d',
                            'date': {
                                 '$convert': {
                                    'input' : '$Time',
                                    'to' : 'date'
                                }
                            }
                        }
                    },
                }
            },
            { 
                '$group': { 
                    '_id': {'Type' : '$Type', 'MyTime':'$MyTime'}, 
                    'count': {'$sum': 1 }
                },
            }
        ])
1 Like

You don’t need the $project stage at all. Why not just $group after you do the $match? You can do the appropriate calculation for MyTime right there inside $group.

Asya

"You don’t need the $project stage at all. "Why not just $group after you do the $match ? "

Ok. I was using project, because my real example (not this toy example) has many fields and I only need a few.

“You can do the appropriate calculation for MyTime right there inside $group .”

Is that not what’s happening with count: {sum…}? Please be more specific if you’re referring to a different pattern.

You don’t need $project before $group pretty much ever. Since $group defines which fields it needs you do not need to “tell” the pipeline anything else - it will already figure out which fields are necessary.