$filter and $set in aggregation to update matching array sub-document

Is there any way I can update only the matching element of the array sub-document with the data from fields in the root document?

So, this is the schema I am working with:

{
    "history":  [{ fieldA: Date, fieldB: Number, fieldC: ObjectId }],
    "current": {
        "fieldD": { type: ObjectId },
        "fieldE": { type: Number },
    }
}

I am trying to push a new entry to history such that the new entry is

new_entry = { fieldA: new Date(), fieldB: current.fieldE, fieldC: current.fieldD }

and if before the document was

{
    "history":   [
      { fieldA: ISODate("2024-02-18"), fieldB: 6, fieldC: ObjectID("old_id") }, 
      { fieldA: ISODate("2024-02-25"), fieldB: 6, fieldC: ObjectID("old_id_2") }, 
      { fieldA: ISODate("2024-03-03"), fieldB: 6, fieldC: ObjectID("old_id_3") }
    ],
    "current": {
        "fieldD": ObjectId("new_id"),
        "fieldE": 8,
    }
}

then after the operations it should be

{
    "history":   [
      { fieldA: ISODate("2024-02-18"), fieldB: 6, fieldC: ObjectID("old_id") }, 
      { fieldA: ISODate("2024-02-25"), fieldB: 6, fieldC: ObjectID("old_id_2") }, 
      { fieldA: ISODate("2024-03-03"), fieldB: 6, fieldC: ObjectID("old_id_3") }, 
      { fieldA: ISODate("2024-03-11"), fieldB: 8, fieldC: ObjectId("new_id") }
    ],
    "current": {
        "fieldD": ObjectId("new_id"),
        "fieldE": 8,
    }
}

I have been using $concatArrays, but the query takes around 12 minutes to run each time. I was trying to reduce this by using $push and a $set aggregation pipeline, like so:

let new_date = new Date();
db.collection.updateMany({ "current.fieldD": { $gte: 10, $lte: 23 } }, { $push: { "history": { $each: [{ fieldA: new_date }], $slice: -10 } } });

db.collection.updateMany({ "current.fieldD": { $gte: 10, $lte: 23 } }, [
  { $project: { history: { $filter: { input: "$history", as: "history", cond: { $eq: ["$$history.fieldA", new_date] } } }, "current.fieldD": 1, "current.fieldE": 1 } },
  { $set: { "history.fieldC": "$current.fieldD", "history.fieldB": "$current.fieldE" } }
]);

But this sets history on every matching document to an array containing just the one new element, resulting in a document like

{
    "history":   [
      { fieldA: ISODate("2024-03-11"), fieldB: 8, fieldC: ObjectId("new_id") }
    ],
    "current": {
        "fieldD": ObjectId("new_id"),
        "fieldE": 8,
    }
}

Is there any way to update only the matching sub-document with the data from the root document?
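
For what it's worth, one way to update only the matching element without replacing the rest of the array is to $map over history and $mergeObjects the current fields into just the entry whose fieldA equals new_date. A minimal, untested sketch along those lines, reusing the same filter and the new_date variable from the snippet above:

db.collection.updateMany(
  { "current.fieldD": { $gte: 10, $lte: 23 } },
  [
    {
      $set: {
        history: {
          $map: {
            input: "$history",
            as: "h",
            in: {
              $cond: [
                // only the entry pushed with new_date gets the current.* values merged in
                { $eq: ["$$h.fieldA", new_date] },
                { $mergeObjects: ["$$h", { fieldB: "$current.fieldE", fieldC: "$current.fieldD" }] },
                "$$h"
              ]
            }
          }
        }
      }
    }
  ]
);

Because $mergeObjects only touches the matching element, the rest of history is written back unchanged.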

We actually do something similar when updating documents: we have a history section that we push the old and new values to, along with the date and information about why the change was performed.

We’re running on pretty large collections and don’t see much overhead from doing this. We’re using $concatArrays with an $ifNull to catch the base case of the array not existing.

Our query looks something like this:

db.getCollection('myCol').updateMany(
	{
		"LookupKey":"MyKeyValueToFilter"
	},
	[
		{
			"$set":{
				"history":{
					"$concatArrays":[
						{
							"$ifNull":[
								"$history",
								[]
							]
						},
						[
							{
								"Change":"ChangeID",
								"ScriptName":"ScriptName",
								"FieldUpdated":"myField",
								"OldValue":"$myField",
								"NewValue":"theNewValue"
							}
						]
					]
				},
				"myField":"theNewValue"
			}
		}
	]	
)


Instead of using $concatArrays you can use $push directly. Below is sample code:

db.myCol.updateMany(
    { "LookupKey":"MyKeyValueToFilter" },
    {
        $push: {
            history: {
                fieldA: new Date(),
                fieldB: "$current.fieldE",
                fieldC: "$current.fieldD"            
            }
        }
    }
);

For more information you can also refer to the documentation: https://www.mongodb.com/docs/manual/reference/operator/update/push/

This $push code using $current seems to be incomplete. I get

{ _id: ObjectId("65f4429e136981b442cfa9a0"),
  history: 
   [ { fieldA: 2024-03-15T12:44:38.339Z,
       fieldB: '$current.fieldE',
       fieldC: '$current.fieldD' } ] }

To refer to a field using $ you need to use the update-with-aggregation-pipeline syntax.
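
As a rough, untested sketch (not code from this thread), the same push written as an update with an aggregation pipeline, reusing the LookupKey filter from the earlier example and the OP's current.* fields; here the "$current.*" strings are resolved as field paths rather than stored as literal strings:

db.myCol.updateMany(
  { "LookupKey": "MyKeyValueToFilter" },
  [
    {
      $set: {
        history: {
          $concatArrays: [
            // seed an empty array if history does not exist yet
            { $ifNull: ["$history", []] },
            // in pipeline syntax these "$current.*" values are field references
            [{ fieldA: "$$NOW", fieldB: "$current.fieldE", fieldC: "$current.fieldD" }]
          ]
        }
      }
    }
  ]
);

On a document that has no history field yet, the $ifNull seeds it with an empty array so $concatArrays has something to append to.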


I am currently using this method too, but I have two concerns:

  1. Query time might increase as the array grows.
  2. I do not need the whole history for my application, but with $concatArrays I cannot limit the size of this array (see the sketch below).
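
As an aside, the concatenated array can be capped with the aggregation $slice expression, so the size can still be limited when using $concatArrays. A rough sketch of just the $set stage, keeping only the last 10 entries (the stage name, entry shape, and the -10 limit are only for illustration):

const capHistoryStage = {   // hypothetical name; use inside an update pipeline
  $set: {
    history: {
      $slice: [
        {
          $concatArrays: [
            { $ifNull: ["$history", []] },
            [{ fieldA: "$$NOW", fieldB: "$current.fieldE", fieldC: "$current.fieldD" }]
          ]
        },
        -10 // a negative count keeps only the last 10 elements
      ]
    }
  }
};

Used as db.collection.updateMany(filter, [capHistoryStage]), this appends the new entry and trims the array in the same statement.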

I have tried this and it doesn't work, as also confirmed by @steevej. $push blindly pushed the data as-is, storing the "$current.*" values as literal strings instead of resolving them as field references.

In situations that look like yours I do the following:

  1. Always create the document with an empty history array; this way I never have to test for null.
  2. The current field is not present; it is simply the $last element of history.
  3. Sometimes, to make some use-cases more efficient, the current field is present and duplicates the data of the $last element of history.
  4. In cases where history might be big, I move the history to a separate collection and only keep current in the main collection (a rough sketch of this follows).
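
Not from the thread, but a rough shell sketch of what point 4 might look like; the collection_history collection, the recordHistory helper, and its parameters are all hypothetical:

// Sketch: history lives in its own collection, keyed by the parent _id,
// while the main document only ever keeps `current`.
function recordHistory(docId, newFieldD, newFieldE) {
  const doc = db.collection.findOne({ _id: docId });

  // append the outgoing `current` values as a history entry
  db.collection_history.insertOne({
    parentId: doc._id,
    fieldA: new Date(),
    fieldB: doc.current.fieldE,
    fieldC: doc.current.fieldD
  });

  // then overwrite `current` in the main document
  db.collection.updateOne(
    { _id: doc._id },
    { $set: { "current.fieldD": newFieldD, "current.fieldE": newFieldE } }
  );
}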

As usual, Steevej comes up with great points. Our history is not large; the largest we have is a few dozen entries on the most-changed data. If you were looking at a lot of changes to lots of records then I’d also agree on moving to a separate collection, but it depends on what you want to use the history for and how much it changes.
You could always just keep all the data and, when it changes, mark the old record as deleted and write a new one.
An alternative we looked at was an Atlas trigger to monitor the changes and deal with them there; that may offload some of the work so the changes can be tracked in the background. The pre-image that’s available now means you can do a lot with the data, and as the trigger runs on a change stream it can just trundle away in the background.
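
A minimal sketch of what such a trigger function might look like; the linked service, database, and collection names are assumptions, and document pre-images must be enabled on the trigger for fullDocumentBeforeChange to be populated:

// Atlas trigger function (sketch) that archives the pre-image of each update.
exports = async function (changeEvent) {
  const history = context.services
    .get("mongodb-atlas")          // name of the linked data source (assumption)
    .db("mydb")                    // hypothetical database name
    .collection("docHistory");     // hypothetical history collection

  await history.insertOne({
    docId: changeEvent.documentKey._id,
    changedAt: new Date(),
    before: changeEvent.fullDocumentBeforeChange, // the pre-image
    after: changeEvent.fullDocument               // the post-image, if requested
  });
};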