I am trying to implement the following functionality in an app, but I am struggling to get all the pieces together:
I want to implement a migration-like process that consists of executing a set of transactions.
Dependencies between these transactions are modeled by a directed acyclic graph (DAG): each transaction should read the writes of the transactions preceding it in the DAG. Some transactions consist of scanning an entire collection, sometimes updating documents inside that collection, but the scan should behave as if those updates were not happening. Some non-scan reads may happen during such a transaction, and those should read the updates. Note that all reads, including scan reads, must read the updates of preceding transactions. Some transactions may run in parallel because they do not act on the same parts of the database. Hence, if I want the entire process to execute swiftly, I cannot just linearize the DAG and use one big transaction. Here is a set of questions I could not quite find the answers to:
- From “Tunable Consistency in MongoDB - YouTube” and the `Mongo.startSession` docs, I get that `causalConsistency` is `true` by default for sessions. Is this correct?
- From “Path to Transactions - Local Snapshot Reads”, I get that `{readConcern: {level: 'snapshot'}}` should be used for the `find` operation of the collection scan.
- From https://docs.mongodb.com/v4.4/core/read-isolation-consistency-recency/#std-label-causal-consistency it seems that only `majority` reads guarantee causal consistency. Is a `snapshot` read also a `majority` read? Does that mean `snapshot` reads issued after writes see those writes? In that case, is the only difference between multi-document `snapshot` and `majority` reads that `majority` reads can see writes occurring after the cursor initialization? Since the default `writeConcern` is `{w: 1}` and the default `readConcern` is `local`, does that mean we have to specify `majority` as the default `writeConcern` and `readConcern` for each session?
- The second code sample at https://docs.mongodb.com/v4.4/core/read-isolation-consistency-recency/#examples shows how to guarantee causal consistency across sessions. Is this pattern necessary and sufficient in my use case (each transaction advances its session time to at least the completion time of each of its preceding transactions’ sessions)?
- In addition to the initial `snapshot` collection scan, can all other multi-document reads use a `snapshot` `readConcern` while maintaining causal consistency? If so, am I right to assume the only penalty is a potential increase in memory usage and execution time?
Could you help me sort these out? I find the current documentation on this subject both terse and scattered.