
How to Maintain Multiple Versions of a Record in MongoDB (2024 Updates)

John Page • 6 min read • Published Aug 12, 2024 • Updated Aug 12, 2024
Aggregation Framework • MongoDB
Over the years, various methods have been proposed for versioning data in MongoDB. Versioning data means being able to easily retrieve not just the latest version of a document or documents but also to view and query the documents as they were at a given point in time.
There was the blog post from Asya Kamsky written roughly 10 years ago, an update from Paul Done (author of Practical MongoDB Aggregations), and also information on the MongoDB website about the version pattern from 2019.
These variously maintain two distinct collections of data — one with the latest version and one with prior versions or updates, allowing you to reconstruct them.
Since then, however, there have been seismic, low-level changes in MongoDB's update and aggregation capabilities. Here, I will show you a relatively simple way to maintain a document history when updating without maintaining any additional collections.
To do this, we use expressive updates, sometimes called aggregation pipeline updates. Rather than passing an object of update operators, such as $push and $set, as the second argument to the update call, we express our update as an aggregation pipeline with an ordered set of stages. By doing this, we can not only make changes but also take the previous values of any fields we change and record those in a different field as a history.
The simplest example of this would be to use the following as the update parameter for an updateOne operation.
[ { $set: { a: 5, previous_a: "$a" } } ]
This would explicitly set a to 5 but also set previous_a to whatever a was before the update. This would only give us a history look-back of a single change, though.
Before:
{
  a: 3
}
After:
{
  a: 5,
  previous_a: 3
}
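Assembled into a complete call in mongosh, the update might look like this (the collection name data and the filter are illustrative assumptions):

db.data.updateOne(
  { _id: 1 },
  [ { $set: { a: 5, previous_a: "$a" } } ]
)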
What we want to do is take all the fields we change and construct an object with those prior values, then push it into an array — theoretically, like this:
[ { $set: { a: 5, b: 8 },
    $push: { history: { a: "$a", b: "$b" } } } ]
The above does not work because $push is an update operator, not aggregation syntax, so it gives a syntax error. What we instead need to do is rewrite the push as an array operation, like so:
1{"$set":{"history":
2 {"$concatArrays":[[{ _updateTime: "$$NOW", a:"$a",b:"$b"}}],
3 {"$ifNull":["$history",[]]}]}}}
To talk through what's happening here: I want to add an object, { _updateTime: "$$NOW", a: "$a", b: "$b" }, to the beginning of the array. I cannot use $push, as that is update syntax; expressive syntax is about generating a document with new versions for fields (effectively, just $set). So I need to set the array to the previous array with my new value prepended.
We use $concatArrays to join two arrays, so I wrap my single document containing the old values for fields in an array. Then, the new array is my array of one concatenated with the old array.
I use $ifNull to say that if the value was previously null or missing, treat it as an empty array instead. So the first time, it actually does history = [{ _updateTime: "$$NOW", a: "$a", b: "$b" }] + [].
Before:
{
  a: 3,
  b: 1
}
After:
{
  a: 5,
  b: 8,
  history: [
    {
      _updateTime: Date(...),
      a: 3,
      b: 1
    }
  ]
}
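Putting it all together, a single updateOne call that applies the change and prepends the old values to the history might look like this (again assuming a collection named data, purely for illustration):

db.data.updateOne(
  { _id: 1 },
  [ { $set: {
        a: 5,
        b: 8,
        history: { $concatArrays: [
          [ { _updateTime: "$$NOW", a: "$a", b: "$b" } ],
          { $ifNull: [ "$history", [] ] }
        ] }
  } } ]
)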
That's a little hard to write, but if we actually write out the code to demonstrate this and declare the parts as separate objects, it should be a lot clearer. The following is a script you can run in the MongoDB shell, either by pasting it in or loading it with load("versioning.js").
This code first generates some simple records:
// Configure the inspection depth for better readability in output
config.set("inspectDepth", 8) // Set mongosh to print nicely

// Connect to a specific database
db = db.getSiblingDB("version_example")
db.data.drop()
const nFields = 5

// Function to generate random field values based on a specified change percentage
function randomFieldValues(percentageToChange) {
    const fieldVals = new Object();
    for (let fldNo = 1; fldNo < nFields; fldNo++) {
        if (Math.random() < (percentageToChange / 100)) {
            fieldVals[`field_${fldNo}`] = Math.floor(Math.random() * 100)
        }
    }
    return fieldVals
}

// Loop to create and insert 10 records with random data into the 'data' collection
for (let id = 0; id < 10; id++) {
    const record = randomFieldValues(100)
    record._id = id
    record.dateUpdated = new Date()
    db.data.insertOne(record)
}

// Log the message indicating the data that will be printed next
console.log("ORIGINAL DATA")
console.table(db.data.find().toArray())
(index)  _id  field_1  field_2  field_3  field_4  dateUpdated
0        0    34       49       19       74       2024-04-15T13:30:12.788Z
1        1    1        39       43       4        2024-04-15T13:30:12.836Z
2        2    51       30       96       93       2024-04-15T13:30:12.849Z
3        3    29       44       21       85       2024-04-15T13:30:12.860Z
4        4    41       35       15       7        2024-04-15T13:30:12.866Z
5        5    0        85       56       28       2024-04-15T13:30:12.874Z
6        6    85       56       24       78       2024-04-15T13:30:12.883Z
7        7    27       23       96       25       2024-04-15T13:30:12.895Z
8        8    70       40       40       30       2024-04-15T13:30:12.905Z
9        9    69       13       13       9        2024-04-15T13:30:12.914Z
Then, we modify the data, recording the history as part of the update operation.
// Capture the time we will later rewind to, then wait so the
// subsequent updates are clearly later than oldTime
const oldTime = new Date()
sleep(500);

// Making the changes and recording the OLD values
for (let id = 0; id < 10; id++) {
    const newValues = randomFieldValues(30)
    // Check if there are any changes
    if (Object.keys(newValues).length) {
        newValues.dateUpdated = new Date()

        const previousValues = new Object()
        for (let fieldName in newValues) {
            previousValues[fieldName] = `$${fieldName}`
        }

        const existingHistory = { $ifNull: ["$history", []] }
        const history = { $concatArrays: [[previousValues], existingHistory] }
        newValues.history = history

        db.data.updateOne({ _id: id }, [{ $set: newValues }])
    }
}

console.log("NEW DATA")
db.data.find().toArray()
We now have records that look like this — with the current values but also an array reflecting any changes.
{
  _id: 6,
  field_1: 85,
  field_2: 3,
  field_3: 71,
  field_4: 71,
  dateUpdated: ISODate('2024-04-15T13:34:31.915Z'),
  history: [
    {
      field_2: 56,
      field_3: 24,
      field_4: 78,
      dateUpdated: ISODate('2024-04-15T13:30:12.883Z')
    }
  ]
}
We can now use an aggregation pipeline to retrieve any prior version of each document. To do this, we first filter the history to include only changes up to the point in time we want. We then merge them together in order:
// Get only the history up to the point in time required

const filterHistory = { $filter: { input: "$history", cond: { $lt: ["$$this.dateUpdated", oldTime] } } }

// Merge the entries together and replace the top-level document

const applyChanges = { $replaceRoot: { newRoot: { $mergeObjects: { $concatArrays: [["$$ROOT"], { $ifNull: [filterHistory, []] }] } } } }

// You can optionally add a $match here, but you would normally be better off
// matching on the history fields at the start of the pipeline
const revertPipeline = [{ $set: { rewoundTo: oldTime } }, applyChanges]

// Show results
db.data.aggregate(revertPipeline).toArray()
{
  _id: 6,
  field_1: 85,
  field_2: 56,
  field_3: 24,
  field_4: 78,
  dateUpdated: ISODate('2024-04-15T13:30:12.883Z'),
  history: [
    {
      field_2: 56,
      field_3: 24,
      field_4: 78,
      dateUpdated: ISODate('2024-04-15T13:30:12.883Z')
    }
  ],
  rewoundTo: ISODate('2024-04-15T13:34:31.262Z')
},
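Because the rewind is just an aggregation pipeline, you can also query the rewound state by appending further stages. As a minimal sketch (the value 56 is purely illustrative), this finds documents whose field_2 was 56 at the chosen point in time:

db.data.aggregate([...revertPipeline, { $match: { field_2: 56 } }]).toArray()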
This technique came about through discussing the needs of a MongoDB customer. They had exactly this use case: to retain both the current and historical versions, and to be able to query and retrieve any of them, without having to maintain a full copy of the document. It is an ideal choice if changes are relatively small. It could also be adapted to record a history entry only if a field value actually differs, allowing you to compute deltas even when overwriting the whole record, as sketched below.
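Here is a minimal sketch of that adaptation for a single field (newA is a hypothetical incoming value). Because every expression in a $set stage sees the pre-update document, $ne can compare the old and new values and skip the history entry when nothing changed:

const newA = 5 // hypothetical incoming value

db.data.updateOne(
  { _id: 1 },
  [ { $set: {
        a: newA,
        history: {
          $cond: [
            { $ne: ["$a", newA] }, // only record history if the value changed
            { $concatArrays: [
              [ { _updateTime: "$$NOW", a: "$a" } ],
              { $ifNull: ["$history", []] }
            ] },
            { $ifNull: ["$history", []] } // unchanged: keep history as-is
          ]
        }
  } } ]
)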
As a cautionary note, versioning inside a document like this makes the documents larger, and it means an ever-growing array of edits. If you expect hundreds or thousands of changes per document, this technique is not suitable, and the history should instead be written to a separate collection using a transaction. To do that, perform the update with findOneAndUpdate, return the fields you are changing from that call, and insert them into a history collection.
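A minimal sketch of that alternative might look like the following, assuming a replica set (transactions require one); the data_history collection name and the changed fields are illustrative only:

// Hypothetical sketch: keep history in a separate collection, atomically
const session = db.getMongo().startSession()
try {
  session.startTransaction()
  const dbSession = session.getDatabase("version_example")

  // findOneAndUpdate returns the document as it was BEFORE the update by default
  const oldDoc = dbSession.data.findOneAndUpdate(
    { _id: 6 },
    [{ $set: { field_1: 42, dateUpdated: "$$NOW" } }]
  )
  if (oldDoc) {
    const { _id, ...previous } = oldDoc
    // Store the source document's _id under a different key so repeated
    // updates do not collide on the history collection's _id
    dbSession.data_history.insertOne({ docId: _id, ...previous, _archivedAt: new Date() })
  }
  session.commitTransaction()
} catch (e) {
  session.abortTransaction()
  throw e
} finally {
  session.endSession()
}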
This isn't intended as a step-by-step tutorial, although you can try the examples above and see how it works. It's one of many sophisticated data modeling techniques you can use to build high-performance services on MongoDB and MongoDB Atlas. If you have a need for record versioning, you can use this. If not, then perhaps spend a little more time seeing what you can create with the aggregation pipeline, a Turing-complete data processing engine that runs alongside your data, saving you the time and cost of fetching it to the client to process. Learn more about aggregation.