I’ve got a data model where two collections are related as parent/child objects. Each parent maintaining a count of its number of children, and a timestamp of the last time that value changed.
The collections look like this:
/* parents */
{ _id: 1, childCount: 3, modified: ISODate }
{ _id: 2, childCount: 1, modified: ISODate }
{ _id: 3, childCount: 0, modified: ISODate }
/* children */
{ _id: 1, parentId: 1 }
{ _id: 2, parentId: 1 }
{ _id: 3, parentId: 1 }
{ _id: 4, parentId: 2 }
The parent documents must have their counters reconstructed at any time. I’m doing this with an aggregate pipeline on the children collection with a $merge
stage, as follows:
[
{ "$group": {
"_id": "$parentId",
"sum": {"$sum":1}
} },
{ "$merge": {
"on": "_id",
"into": "parents",
"let": {"now":ISODate("..."), "sum":"$sum"},
"whenMatched": [ {
"$set":{
"childCount": "$$sum"
"$modified": "$$now"
}
} ]
} }
]
My first problem is that this won’t update parents that have no children. To get around this, I’ve tried zeroing the whole parent collection first, as follows:
{ "$set": { "childCount":0, "$currentDate":{"modified":true} } }
and THEN I run the merge. It works but it feels like there must be a better way.
My main problem is that this updates all timestamps in the parent’s modified
field. Whereas I only want to update this when the counts have actually changed. I can’t compare the values in the merge
pipeline, because I’ve already zeroed them.
What is a better way to achieve all of this?
Please note that I’m executing all of this from PHP, so I don’t want to pull large arrays of IDs into memory and use them in subsequent queries. I’d been doing this previously.
Also, I’m not asking for a better data model. I’m stuck with this one for now and need to code around the failings.