Data modeling and MVCC write throughput

I’ve been using a MongoDB-based message queue for a while now and recently built out similar functionality in Postgres 12.

My current Mongo model keeps the message payload and metadata together in one document (falling back to GridFS where necessary). It’s very easy to work with, and 95% of the time all of the data in the document is needed when it’s read back out.
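For concreteness, here’s a minimal sketch of that single-document model using pymongo; the database, collection, and field names are just placeholders:

```python
from pymongo import MongoClient

client = MongoClient()   # assumes a local mongod; adjust the URI as needed
db = client["queue"]     # placeholder database name

# One document per message: metadata and payload live together, so a single
# read returns everything a consumer needs. Payloads over the 16 MB document
# limit would go through GridFS instead, as noted above.
db.messages.insert_one({
    "status": "pending",
    "attempts": 0,
    "payload": b"...message body...",
})
```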

Initially I followed this model with PG and was surprised by how much file bloat there was. It’s due to the MVCC nature of Postgres’s storage engine: an UPDATE writes a whole new copy of the row, and the old version sticks around until VACUUM reclaims it. Separating the (never-updated) message payload into its own table significantly reduced the database size and was generally much faster.
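Roughly what that split looked like on the Postgres side, sketched with psycopg2 (table, column, and DSN names here are made up). The point is that the payload row is inserted once and never updated, so row copies from MVCC only ever hit the small metadata rows:

```python
import psycopg2  # any PG client works the same way

conn = psycopg2.connect("dbname=queue user=app")  # placeholder DSN
cur = conn.cursor()

# Metadata table: small rows that get updated (status, attempts, timestamps).
cur.execute("""
    CREATE TABLE IF NOT EXISTS message (
        id          bigserial PRIMARY KEY,
        status      text NOT NULL DEFAULT 'pending',
        attempts    int  NOT NULL DEFAULT 0,
        updated_at  timestamptz NOT NULL DEFAULT now()
    )
""")

# Payload table: written once, never updated, so MVCC never copies these rows.
cur.execute("""
    CREATE TABLE IF NOT EXISTS message_payload (
        message_id  bigint PRIMARY KEY REFERENCES message(id),
        body        bytea NOT NULL
    )
""")

# Enqueue: one insert per table, both inside the same transaction.
cur.execute("INSERT INTO message DEFAULT VALUES RETURNING id")
msg_id = cur.fetchone()[0]
cur.execute(
    "INSERT INTO message_payload (message_id, body) VALUES (%s, %s)",
    (msg_id, b"...large payload..."),
)

# A status change touches only the small metadata row; the payload row is untouched.
cur.execute(
    "UPDATE message SET status = 'done', updated_at = now() WHERE id = %s",
    (msg_id,),
)
conn.commit()
```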

WiredTiger also implements some form of MVCC (though it is clearly more space efficient), and this got me wondering whether I should, in fact, split the message payload and metadata into two separate collections, even though this would be a 1-to-1 relationship. The message payload would never need to be rewritten when the (comparatively small) metadata is updated.
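What I was considering amounts to something like this (pymongo again, placeholder names); note that the enqueue path now needs two inserts:

```python
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient()   # assumes a local mongod
db = client["queue"]     # placeholder database name

# Split model: the payload is written once and never touched again;
# only the small metadata document gets updated afterwards.
payload_id = ObjectId()
db.payloads.insert_one({"_id": payload_id, "body": b"...large payload..."})
db.messages.insert_one({
    "payload_id": payload_id,   # 1-to-1 reference to the payload document
    "status": "pending",
    "attempts": 0,
})

# Metadata updates no longer rewrite the large payload document.
db.messages.update_one({"payload_id": payload_id}, {"$set": {"status": "done"}})
```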

Does anyone have experience with large documents being fully rewritten when just a tiny subset of the document is changed? Thoughts?

After some initial testing, it appears the answer is a resounding no. The cost of making two round-trip writes instead of one sinks this idea pretty quickly.

If it were possible to do a cross-collection bulk insert, it might be worth investigating further.