How to update a parent collection with counts of related documents?

I’ve got a data model where two collections are related as parent/child objects. Each parent maintaining a count of its number of children, and a timestamp of the last time that value changed.

The collections look like this:

/* parents */
{ _id: 1, childCount: 3, modified: ISODate }
{ _id: 2, childCount: 1, modified: ISODate }
{ _id: 3, childCount: 0, modified: ISODate }

/* children */
{ _id: 1, parentId: 1 }
{ _id: 2, parentId: 1 }
{ _id: 3, parentId: 1 }
{ _id: 4, parentId: 2 }

The parent documents must have their counters reconstructed at any time. I’m doing this with an aggregate pipeline on the children collection with a $merge stage, as follows:

[ 
	{ "$group": {
		"_id": "$parentId",
		"sum": {"$sum":1}
	} },
	{ "$merge": {
		"on": "_id",
		"into": "parents",
		"let": {"now":ISODate("..."), "sum":"$sum"},
		"whenMatched": [ {
			"$set":{
				"childCount": "$$sum"
				"$modified": "$$now"
			}
		} ]
	} }
]

My first problem is that this won’t update parents that have no children. To get around this, I’ve tried zeroing the whole parent collection first, as follows:

{ "$set": { "childCount":0, "$currentDate":{"modified":true} } }

and THEN I run the merge. It works but it feels like there must be a better way.

My main problem is that this updates all timestamps in the parent’s modified field. Whereas I only want to update this when the counts have actually changed. I can’t compare the values in the merge pipeline, because I’ve already zeroed them.

What is a better way to achieve all of this?


Please note that I’m executing all of this from PHP, so I don’t want to pull large arrays of IDs into memory and use them in subsequent queries. I’d been doing this previously.

Also, I’m not asking for a better data model. I’m stuck with this one for now and need to code around the failings.

Hi @timw and welcome to MongoDB community forums!!

Based on the above posts, if I understand correctly, you wish to update the parents collection by setting up new values for childCount and modified, if there is an entry being made to the children collection ?

Considering that the data model could not be changed, it would be great to call the update function on every insert into the child collection and set the value for the required field. using the forEach in PHP.
Once the merge collection is created,

db.parents.find().forEach(function(parent) {
  var childCount = db.children.count({ parentId: parent._id });
  if (parent.childCount !== childCount) {
    db.parents.update(
      { _id: parent._id },
      {
        $set: {
          childCount: childCount,
          modified: new Date()
        }
      }
    );
  }
});

to update the required field values.

If the above aggregation query works for you, the recommendation would be to evaluate on all performance factor that you would like to consider for the application.

Please feel free to reach out in case of further questions.

Regards
Aasawari

Thanks for the reply. The reason I’m not doing it this way is that there could be tens of thousands of updates when rebuilding counters for the whole data set.

Often I can reduce the number of parents to be rebuilt, but when the initial query for that reduction would be a large array of ObjectIds, I opt for a full rebuild instead. I figured the $merge approach would be more performant than this kind of iteration, but it’s fair to say that I haven’t compared performance between the two methods. I will look at doing so, but really my post here was hoping for a best practice technique that I didn’t know about.