How to Maintain Multiple Versions of a Record in MongoDB (2024 Updates)
Over the years, there have been various methods proposed for versioning data in MongoDB. Versioning data means being able to easily get not just the latest version of a document or documents but also view and query the way the documents were at a given point in time.
There was the blog post from Asya Kamsky written roughly 10 years ago, an update from Paul Done (author of Practical MongoDB Aggregations), and also information on the MongoDB website about the version pattern from 2019.
These variously maintain two distinct collections of data — one with the latest version and one with prior versions or updates, allowing you to reconstruct them.
Since then, however, there have been seismic, low-level changes in MongoDB's update and aggregation capabilities. Here, I will show you a relatively simple way to maintain a document history when updating without maintaining any additional collections.
To do this, we use expressive updates, also sometimes called aggregation pipeline updates. Rather than passing an object of update operators such as $push and $set as the second argument to update, we express our update as an aggregation pipeline: an ordered set of changes. By doing this, we can not only make changes but also take the previous values of any fields we change and record them in a different field as a history.
The simplest example of this would be to use the following as the update parameter for an updateOne operation.
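A minimal sketch of such an update parameter, written for the MongoDB shell. The collection name "example" and the filter are assumptions for illustration; only the pipeline itself matters.

```javascript
// Single-stage update pipeline. Every expression in a $set stage is evaluated
// against the document as it was when the stage started, so "$a" still
// refers to the old value of a even though the same stage overwrites a.
const keepPreviousA = [{ $set: { previous_a: "$a", a: 5 } }];

// Only runs inside mongosh, where the global `db` exists.
// The collection name "example" and the filter are assumptions.
if (typeof db !== "undefined") {
  db.getCollection("example").updateOne({ _id: 1 }, keepPreviousA);
}
```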
This would explicitly set a to 5 but also set previous_a to whatever a was before the update. This would only give us a history look-back of a single change, though.
What we want to do is take all the fields we change and construct an object with those prior values, then push it into an array — theoretically, like this:
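The idea might be written as follows. This is a reconstruction for illustration (the new values 5 and 8 are arbitrary), and it is deliberately invalid: it mixes the $push update operator into a pipeline stage.

```javascript
// NOT VALID as an update pipeline: $push is an update operator, not an
// aggregation stage or expression, so the server rejects this update.
// The new values 5 and 8 are arbitrary illustrative choices.
const invalidUpdate = [
  {
    $set: { a: 5, b: 8 },
    $push: { history: { _updateTime: "$$NOW", a: "$a", b: "$b" } },
  },
];
```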
The above does not work because $push is an update operator, not aggregation syntax, so it gives a syntax error. What we need to do instead is rewrite the push as an array operation, like so:
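A sketch of the array-based form, using only $set, $concatArrays, and $ifNull. The new values 5 and 8, the collection name "example", and the filter are assumptions for illustration.

```javascript
// Prepend a snapshot of the old values of a and b to the history array,
// then overwrite a and b, all within one $set stage.
const prependHistory = [
  {
    $set: {
      history: {
        $concatArrays: [
          // A one-element array holding the values about to be overwritten.
          [{ _updateTime: "$$NOW", a: "$a", b: "$b" }],
          // The existing history, or an empty array if there is none yet.
          { $ifNull: ["$history", []] },
        ],
      },
      a: 5,
      b: 8,
    },
  },
];

// Only runs inside mongosh, where the global `db` exists.
// The collection name "example" and the filter are assumptions.
if (typeof db !== "undefined") {
  db.getCollection("example").updateOne({ _id: 1 }, prependHistory);
}
```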
To talk through what's happening here: I want to add an object, { _updateTime: "$$NOW", a: "$a", b: "$b" }, to the beginning of the array. I cannot use $push, as that is update syntax, and expressive syntax is about generating a document with new versions of fields (effectively, just $set). So I need to set the array to the previous array with my new value prepended.

We use $concatArrays to join two arrays, so I wrap my single document containing the old field values in an array. The new array is then my array of one concatenated with the old array.

I use $ifNull to say that if the value was previously null or missing, treat it as an empty array instead, so the first time it actually does history = [{ _updateTime: "$$NOW", a: "$a", b: "$b" }] + [].
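To make the effect concrete, here is a plain-JavaScript emulation of what the pipeline computes for one document. It is not MongoDB code; the function name emulateVersionedSet, the sample document, and the timestamps are all assumptions chosen for illustration.

```javascript
// Emulates, in ordinary JavaScript, what the $set / $concatArrays / $ifNull
// pipeline does to a single document. Hypothetical helper, not a MongoDB API.
function emulateVersionedSet(doc, changes, now) {
  // Snapshot the old values of only the fields being changed.
  const entry = { _updateTime: now };
  for (const field of Object.keys(changes)) {
    if (field in doc) entry[field] = doc[field];
  }
  // Equivalent of $ifNull: a missing history is treated as an empty array.
  const oldHistory = doc.history ?? [];
  // Equivalent of $concatArrays: the new entry goes first.
  return { ...doc, ...changes, history: [entry, ...oldHistory] };
}

// Illustrative document and timestamps (assumptions, not real data).
const before = { _id: 1, a: 1, b: 2 };
const t1 = new Date("2024-01-01T00:00:00Z");
const t2 = new Date("2024-01-02T00:00:00Z");

const afterFirst = emulateVersionedSet(before, { a: 5, b: 8 }, t1);
// { _id: 1, a: 5, b: 8, history: [ { _updateTime: t1, a: 1, b: 2 } ] }

const afterSecond = emulateVersionedSet(afterFirst, { a: 7 }, t2);
// history is now [ { _updateTime: t2, a: 5 }, { _updateTime: t1, a: 1, b: 2 } ]
```

The first update starts from a missing history, so the result is an array of one; the second shows the newest entry landing at the front.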
That's a little hard to write, but if we actually write out the code to demonstrate this and declare it as separate objects, it should be a lot clearer. The following is a script you can run in the MongoDB shell, either by pasting it in or by loading it with load("versioning.js").

This code first generates some simple records:
(index) | _id | field_1 | field_2 | field_3 | field_4 | dateUpdated |
---|---|---|---|---|---|---|
0 | 0 | 34 | 49 | 19 | 74 | 2024-04-15T13:30:12.788Z |
1 | 1 | 13 | 9 | 43 | 4 | 2024-04-15T13:30:12.836Z |
2 | 2 | 51 | 30 | 96 | 93 | 2024-04-15T13:30:12.849Z |
3 | 3 | 29 | 44 | 21 | 85 | 2024-04-15T13:30:12.860Z |
4 | 4 | 41 | 35 | 15 | 7 | 2024-04-15T13:30:12.866Z |
5 | 5 | 0 | 85 | 56 | 28 | 2024-04-15T13:30:12.874Z |
6 | 6 | 85 | 56 | 24 | 78 | 2024-04-15T13:30:12.883Z |
7 | 7 | 27 | 23 | 96 | 25 | 2024-04-15T13:30:12.895Z |
8 | 8 | 70 | 40 | 40 | 30 | 2024-04-15T13:30:12.905Z |
9 | 9 | 69 | 13 | 13 | 9 | 2024-04-15T13:30:12.914Z |
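Records shaped like those in the table could be generated along these lines. This is a sketch, not the original script: the constants, the helper name makeRecord, and the collection name "versioned" are assumptions.

```javascript
// Sketch of a generator for records shaped like the table above.
// N_DOCS, N_FIELDS, makeRecord, and "versioned" are assumed names.
const N_DOCS = 10;
const N_FIELDS = 4;

function makeRecord(id) {
  const record = { _id: id };
  for (let f = 1; f <= N_FIELDS; f++) {
    record[`field_${f}`] = Math.floor(Math.random() * 100); // integer 0..99
  }
  record.dateUpdated = new Date();
  return record;
}

const records = Array.from({ length: N_DOCS }, (_, i) => makeRecord(i));

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  const coll = db.getCollection("versioned");
  coll.drop(); // start from an empty collection
  coll.insertMany(records);
  console.table(coll.find().toArray());
}
```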
Then, we modify the data recording the history as part of the update operation.
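One way to package the update is a small helper that builds the pipeline from an object of new values. This is a sketch under assumptions: the helper name versionedUpdate and the collection name "versioned" are hypothetical, it handles top-level fields only, and it chooses to treat dateUpdated as one more versioned field.

```javascript
// Builds an update pipeline that records the old values of the changed
// fields in `history` before overwriting them. Hypothetical helper;
// top-level field names only.
function versionedUpdate(changes) {
  // Old values, read from the document before the stage is applied.
  const oldValues = { _updateTime: "$$NOW", dateUpdated: "$dateUpdated" };
  // New values; dateUpdated is refreshed on every change.
  const newValues = { dateUpdated: "$$NOW" };

  for (const [field, value] of Object.entries(changes)) {
    oldValues[field] = `$${field}`;
    // $literal stops a string such as "$x" being read as a field path.
    newValues[field] = { $literal: value };
  }

  return [
    {
      $set: {
        ...newValues,
        history: {
          $concatArrays: [[oldValues], { $ifNull: ["$history", []] }],
        },
      },
    },
  ];
}

// Only runs inside mongosh, where the global `db` exists.
// The collection name, _id, and new value are assumptions.
if (typeof db !== "undefined") {
  db.getCollection("versioned").updateOne({ _id: 3 }, versionedUpdate({ field_2: 100 }));
}
```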
We now have records that look like this — with the current values but also an array reflecting any changes.
We can now use an aggregation pipeline to retrieve any prior version of each document. To do this, we first filter the history to include only changes up to the point in time we want. We then merge them together in order:
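A sketch of such a retrieval pipeline, assuming history entries are stored newest first and each holds the values that were overwritten at its _updateTime. Under that assumption, the entries to apply are those written after the chosen point in time: merging them over the current document, newest to oldest, leaves each field with the value it had at that moment. The function name asOfPipeline, the collection name, and the sample date are hypothetical.

```javascript
// Builds a pipeline that returns each document as it was at `pointInTime`.
// Hypothetical helper; assumes `history` is ordered newest first.
function asOfPipeline(pointInTime) {
  // Keep only the changes made after the requested moment.
  const laterChanges = {
    $filter: {
      input: { $ifNull: ["$history", []] },
      as: "change",
      cond: { $gt: ["$$change._updateTime", pointInTime] },
    },
  };

  // Start from the current document and merge the overwritten values back
  // in. Later merges win, so the oldest change after pointInTime decides.
  const rolledBack = {
    $reduce: {
      input: laterChanges,
      initialValue: "$$ROOT",
      in: { $mergeObjects: ["$$value", "$$this"] },
    },
  };

  return [
    { $replaceWith: rolledBack },
    // Drop the bookkeeping fields from the reconstructed document.
    { $unset: ["history", "_updateTime"] },
  ];
}

// Only runs inside mongosh, where the global `db` exists.
// The collection name and the date are assumptions.
if (typeof db !== "undefined") {
  const when = new Date("2024-04-15T13:30:13Z");
  console.table(db.getCollection("versioned").aggregate(asOfPipeline(when)).toArray());
}
```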
This technique came about through discussing the needs of a MongoDB customer. They had exactly this use case: retaining both the current values and the history, and being able to query and retrieve either without maintaining a full copy of every version of the document. It is an ideal choice if changes are relatively small. It could also be adapted to record a history entry only if the field value is different, allowing you to compute deltas even when overwriting the whole record.
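The "only if different" adaptation might look like the following for a single field. This is a sketch, not part of the original technique as described: the helper name setIfChanged is hypothetical, and it uses $cond with $ne to decide whether to prepend a history entry.

```javascript
// Builds an update pipeline that sets one top-level field and records its
// old value in `history` only when the value actually changes.
// Hypothetical helper for illustration.
function setIfChanged(field, newValue) {
  const current = `$${field}`;
  const value = { $literal: newValue }; // never treated as a field path
  const existing = { $ifNull: ["$history", []] };

  return [
    {
      $set: {
        history: {
          $cond: {
            if: { $ne: [current, value] },
            // Value differs: prepend a snapshot of the old value.
            then: {
              $concatArrays: [[{ _updateTime: "$$NOW", [field]: current }], existing],
            },
            // Value unchanged: leave the history as it was.
            else: existing,
          },
        },
        [field]: value,
      },
    },
  ];
}
```

One consequence of this sketch is that a document with no history gains an empty history array even when nothing changed.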
As a cautionary note, versioning inside a document like this will make the documents larger. It also means an ever-growing array of edits. If you believe there may be hundreds or thousands of changes, this technique is not suitable and the history should be written to a second document using a transaction. To do that, perform the update with findOneAndUpdate and return the fields you are changing from that call to then insert into a history collection.
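A sketch of that alternative for the MongoDB shell. The function name and the collection names "records" and "records_history" are assumptions; the session is expected to come from db.getMongo().startSession(), and transactions require a replica set or sharded cluster.

```javascript
// Updates a record and writes the overwritten values to a separate history
// collection inside one transaction. Hypothetical helper; the collection
// names "records" and "records_history" are assumptions.
function updateWithSeparateHistory(session, dbName, id, changes) {
  const database = session.getDatabase(dbName);
  const records = database.getCollection("records");
  const history = database.getCollection("records_history");

  // Return only the fields being changed, as they were before the update.
  const projection = { _id: 0 };
  for (const field of Object.keys(changes)) projection[field] = 1;

  session.startTransaction();
  try {
    // findOneAndUpdate returns the pre-update document by default.
    const previous = records.findOneAndUpdate(
      { _id: id },
      { $set: changes },
      { projection }
    );
    if (previous !== null) {
      history.insertOne({ recordId: id, _updateTime: new Date(), ...previous });
    }
    session.commitTransaction();
    return previous;
  } catch (err) {
    session.abortTransaction();
    throw err;
  }
}

// Usage in mongosh (names and values are assumptions):
//   const session = db.getMongo().startSession();
//   updateWithSeparateHistory(session, db.getName(), 3, { field_2: 100 });
//   session.endSession();
```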
This isn't intended as a step-by-step tutorial, although you can try the examples above and see how it works. It's one of many sophisticated data modeling techniques you can use to build high-performance services on MongoDB and MongoDB Atlas. If you have a need for record versioning, you can use this. If not, then perhaps spend a little more time seeing what you can create with the aggregation pipeline, a Turing-complete data processing engine that runs alongside your data, saving you the time and cost of fetching it to the client to process. Learn more about aggregation.