I’m aware of the document versioning pattern, but I can’t seem to find anything official in the MongoDB resources about versioning lots of large documents with frequent changes.
I’ve had a look at the data modelling course on the university site and it doesn’t cover it.
Are there any patterns that can be used to version lots of large documents with frequent changes? Or is MongoDB a bad use case for this?
What do you mean by frequent changes? Does each change create a new version document?
How are you considering keeping the versions? It sounds like if the document is large and changes are frequent, it’s best not to embed old versions but to create a new document…
Be aware that an updated document is completely written back to permanent storage, even the unmodified values.
So if your document is large and is frequently updated, you might suffer write starvation. In this case, a variation of the outlier pattern might be appropriate if only a few fields of the large document are frequently updated. You would keep the stable fields in the main large document and store the frequently modified fields in a separate outlier document or documents. This would reduce the write starvation, since the frequently updated and rewritten parts are much smaller than the stable main document.
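To make the split concrete, here is a minimal sketch of what the two document shapes could look like. All collection and field names (`order`, `orderStatus`, `mainId`, etc.) are illustrative assumptions, not from the original post:

```javascript
// Stable fields live in the main document, which is rarely rewritten.
const mainDoc = {
  _id: "order-123",
  customer: { name: "Acme Corp" },
  items: [], // imagine a large, mostly static array here
};

// Frequently updated fields live in a small companion document,
// so each update rewrites only a small document instead of the large one.
const hotDoc = {
  _id: "order-123-status",
  mainId: "order-123", // link back to the main document (index this field)
  status: "shipped",
};

// An update then touches only the small document, e.g. in mongosh:
// db.orderStatus.updateOne({ mainId: "order-123" }, { $set: { status: "delivered" } })
```

Reading the full picture back requires joining the two documents (two `findOne` calls or a `$lookup`), which is the trade-off for the cheaper writes.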
@Pavel_Duchovny Thank you so much for this. I’ve got further clarification of the requirements. Essentially, each time a change is made to a document, an entire snapshot needs to be taken of the previous version, not the tracking of individual fields as I said previously. There will always be a “main” version of a document plus all its previous versions from each time it changed. Any aspect of the document could potentially change.
The queries will be basic, i.e. just to retrieve individual documents, so it will be “byId”.
Yes, the requirement is to have a new “main” document so therefore I would expect a newer timestamp for each change.
In that case it sounds like you may benefit from splitting the data into a “latest” collection and a history collection.
In the latest collection you will store the most recent version and have it queried and indexed by id.
While the history collection will receive the previous state. So essentially an update is a “transaction” of delete => insert new with same _id => insert old.
If updates are so frequent that the critical path is the write and not the read, you may consider the following alternative:
Insert a new version into the main collection, keeping the history in the same one (you may offload the history as a batch process).
In this design I envision the collection as follows:
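A hypothetical shape for that single collection might look like this (the `docId`, `version`, and `current` field names are my assumptions, not from the thread):

```javascript
// All versions of a logical document live side by side in one collection;
// only one of them is flagged as current.
const versions = [
  { docId: "invoice-42", version: 3, current: true,  data: { total: 300 } },
  { docId: "invoice-42", version: 2, current: false, data: { total: 250 } },
  { docId: "invoice-42", version: 1, current: false, data: { total: 100 } },
];

// The write path is a cheap insert of the new version; a background/batch
// job can later move current:false documents out to an archive collection.
// The "byId" read would use a compound index, e.g. { docId: 1, current: 1 }:
// db.versions.findOne({ docId: "invoice-42", current: true })
const current = versions.find(v => v.docId === "invoice-42" && v.current);
```

The cost of this design is that the read must distinguish the current version from the historical ones, and the writer must flip the `current` flag (or compare version numbers/timestamps) when inserting.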