Grouping documents into documents + subdocuments

Rami_C · December 21, 2023, 10:35am

I have millions of documents with the following structure:

        timestamp: 2023-06-21T05:38:10.384+00:00
        score: 3
        name: "John Smith" 
        _id: 656f7c44d804487c00618f1c
        provider: "MCD"

        timestamp: 2023-06-21T05:38:10.384+00:00
        score: 10
        name: "John D" 
        _id: 656f7c44d804487c11618f1c
        provider: "DDK"

There are many score, name and provider values for each timestamp (~100 distinct time stamps, ~10 distinct providers, ~1000 distinct names). I would like to do an aggregation in a way that will provide me one document per timestamp and provider, in the following structure:

        timestamp: 2023-06-21T05:38:10.384+00:00
        provider: "MCD"                
        name_score: <name, score>, <name, score>, ...

        timestamp: 2023-06-21T05:38:10.384+00:00
        provider: "DDK"                
        name_score: <name, score>, <name, score>, ...

That is, the number of documents would be (number of timestamps) * (number of providers).

I was able to group by timestamp using the following pipeline:

pipeline = [
            {
                "$group": {
                    "_id": {"timestamp": "$timestamp"},
                    "my_col": {
                            "$push": {
                            "name": "$name",
                            "score": "$score",
                            "provider": "$provider",
                            }}
                }
            },
            ]

but I was not able to get the second part. I am working with PyMongo. Any help will be appreciated.

steevej · December 21, 2023, 1:52pm

The secret about $group is the _id. When we take your problem statement:

it directly maps to

"_id" : { "timestamp" : "$timestamp" , "provider": "$provider } ,

Sometimes it is more easy than we think.

system · December 26, 2023, 1:53pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.