I have millions of documents with the following structure:
timestamp: 2023-06-21T05:38:10.384+00:00
score: 3
name: "John Smith"
_id: 656f7c44d804487c00618f1c
provider: "MCD"
timestamp: 2023-06-21T05:38:10.384+00:00
score: 10
name: "John D"
_id: 656f7c44d804487c11618f1c
provider: "DDK"
There are many score, name and provider values for each timestamp (~100 distinct time stamps, ~10 distinct providers, ~1000 distinct names). I would like to do an aggregation in a way that will provide me one document per timestamp and provider, in the following structure:
timestamp: 2023-06-21T05:38:10.384+00:00
provider: "MCD"
name_score: <name, score>, <name, score>, ...
timestamp: 2023-06-21T05:38:10.384+00:00
provider: "DDK"
name_score: <name, score>, <name, score>, ...
That is, the number of documents would be (number of timestamps) * (number of providers).
I was able to group by timestamp using the following pipeline:
pipeline = [
{
"$group": {
"_id": {"timestamp": "$timestamp"},
"my_col": {
"$push": {
"name": "$name",
"score": "$score",
"provider": "$provider",
}}
}
},
]
but I was not able to get the second part. I am working with PyMongo. Any help will be appreciated.