Best Practices for Update Sensitive Properties

Hello, World!

ASSUMPTIONS

Assume that,

  • We have a DB with collections called orders and products.
  • The documents in the orders (will be called as order from now on) collection contains references from the documents in the products (will be called as product from now on) collection, as an array of object IDs.
  • The documents in the products collection has properties called name and price.
    • name is an update insensitive property. (e.g. if a reference is given to the product, when the product is updated, it does not affect the order.)
    • price is an update sensitive property. (e.g. if a reference is given to the product, when the product is updated, it causes a problem for the order. you lost the price of the product at the time of order is given.)

Also assume that we have the following sample data,

products
{_id: "p0", name: "apple", "price": 200}
{_id: "p1", name: "orange", "price": 500}

orders
{_id:"o0", orderNumber: 1, products: ["p0", "p1"]}

Assume that, the product apple’s price will be increased from 200 to 300, and orange’s price will be increased from 500 to 600.

products (after the update)
{_id: "p0", name: "apple", "price": 300}
{_id: "p1", name: "orange", "price": 600}

PROBLEM

  • When the time the order is created, we sold apple from 200 and orange from 500.
  • However after the update, it seems like we sold apple from 300 and orange from 600 which is totally incorrect.

Obviously, we can create another collection for keeping that change. However, when we have collections with documents that have a lot of properties we never know which property will be update sensitive in the future…

To overcome this problem, I find 2 solutions.

1- You hardcode the update sensitive data inside.

products
{_id: "p0", name: "apple", "price": 200}
{_id: "p1", name: "orange", "price": 500}

orders
{_id:"o0", orderNumber: 1, products: [{_id: "p0", "price": 200}, {_id: "p1", "price": 500}]}

products (after the update)
{_id: "p0", name: "apple", "price": 300}
{_id: "p1", name: "orange", "price": 600}

In this solution, if name becomes update sensitive in the future, since you do not hardcoded it before, you can never know the value at the time of the order is created. That’s why I tried to find my 2nd solution.

2- For each collection foo that may have documents with update sensitive properties, we create another collection fooRef which will only have 2 properties _id and currentVersion. _id as you may guess, and currentVersion is also object ID which points to the current version of the document in foo. Also the documents in the collection foo will have a property version.

products
{_id: "p0", version: "0", name: "apple", "price": 200}
{_id: "p1", version: "0", name: "orange", "price": 500}

productsRef
{_id: "pr0", currentVersion: "p0"}
{_id: "pr1", currentVersion: "p1"}

orders
{_id:"o0", orderNumber: 1, products: ["p0", "p1"]}
products (after the update)
{_id: "p0", version: "0", name: "apple", "price": 200}
{_id: "p2", version: "1", name: "apple", "price": 300}
{_id: "p1", version: "0", name: "orange", "price": 500}
{_id: "p3", version: "1", name: "orange", "price": 600}

productsRef (after the update)
{_id: "pr0", currentVersion: "p2"}
{_id: "pr1", currentVersion: "p3"}

Now, you can reach anything anytime. But this time, each time you update a document with a update sensitive property, you will create an additional record. This will increase the data size. Also, there will be records that you will never use and need.

CONCLUSION

What I am asking you is, maybe there is another solution I can’t figure out or maybe there is a best practice for this kind of problems.

Thanks in advance for the help…

This is a question about preference, as i dont think there is the best solution. That said, I like the second option more. The first suggestion unnecesarilly fragments data about products IMO, which can be more error-prone as there will be more ways to screw up if schema is going to change again at some point. (there is another reason i like it, and that’s because it treats the critical data as immutable) But there are a couple things i would still do differently:

  • i don’t know why you need a new “version” field in the products, since you already know what the latest version is (from productsRef collection). But maybe you need it for something else, i just don’t see relevance in the context.
  • do you really need to care about data size? If you do i suspect you already have garbage-collecting routine somewhere, just add the code to remove old versions of products that are no longer referenced.
  • the size concern can be alleviated somewhat if you combine both your ideas and instead of copying all fields in the new versions of documents in products you just save the diff. But this also has negatives: this loses schema consistency in products(you might care about it), and also retrieving data will require to do $mergeObjects, potentially on a whole chain of diff fragments. Also clearing obsolete data will be more difficult task here.

Thanks for the reply.

1- I use version property because the order of updates are important for me. It is also possible to discover this by _id property. However, this is more costly with _id.

2- The data size of course matters. In garbage collection, it is always possible to remove a record which seems unnecessary at the time of removal but might be needed in a future release. So, you can never trust garbage collection.

3- It is a different perspective and a way of handling the data. But, I believe this would be more costly in compared to data size.