Hi, I saw in the MongoDB documentation, on the ‘Causal Consistency and Read and Write Concerns’ page, that ‘causal consistency’ is mentioned a lot. I didn’t quite understand this concept, as it is not explained in detail on the page. What is the idea behind ‘causal consistency’?
What makes it different from ‘eventual consistency’? Could you give me some examples to better understand this concept?
The link to the page I’m referring to in the documentation is this:
Let’s try to explain the two concepts in a few lines:
Eventual Consistency means that the data you are reading might not be consistent right now, but it will be eventually. You get this if you read from secondaries using any readPreference that can read from a secondary. This means you choose to race with replication: because you write to the Primary and read from a Secondary, you have no guarantee that you will read your own writes.
Causal Consistency basically prevents that from happening. If, within a causally consistent session, you write something and then read it 2 lines later, you now have a guarantee that you will read this write operation no matter what, even if you are racing against the replication. Of course it’s a trade-off: your read may have to wait a little until the write has reached the node you are reading from.
A few years ago, when 3.6 was released with this new feature, I published this demo (which is a bit old but…)
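To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not MongoDB's implementation and not driver code: `ToyReplicaSet`, `CausalSession` and the integer "logical time" are hypothetical names invented for illustration. The assumption is that a causally consistent session remembers the time of its last write and that a read on the secondary waits until the secondary has caught up to that time.

```python
class ToyReplicaSet:
    """Toy model: one primary and one secondary that only replicates when told to."""

    def __init__(self):
        self.oplog = []      # ordered (key, value) writes; position + 1 is the logical time
        self.primary = {}
        self.secondary = {}
        self.applied = 0     # how many oplog entries the secondary has applied

    def write(self, key, value):
        """Write on the primary and return the logical time of the write."""
        self.primary[key] = value
        self.oplog.append((key, value))
        return len(self.oplog)

    def replicate_one(self):
        """Apply the next pending oplog entry on the secondary."""
        key, value = self.oplog[self.applied]
        self.secondary[key] = value
        self.applied += 1

    def read_secondary(self, key, after_time=0):
        """Read from the secondary, first waiting until it has reached after_time."""
        while self.applied < after_time:
            self.replicate_one()   # stands in for "block until replication catches up"
        return self.secondary.get(key)


class CausalSession:
    """Remembers the time of the session's last write and passes it to every read."""

    def __init__(self, replica_set):
        self.replica_set = replica_set
        self.operation_time = 0

    def write(self, key, value):
        self.operation_time = self.replica_set.write(key, value)

    def read(self, key):
        return self.replica_set.read_secondary(key, after_time=self.operation_time)


rs = ToyReplicaSet()

# Eventually consistent read: replication is "paused", so the write is not visible yet.
rs.write("x", 1)
stale = rs.read_secondary("x")

# Causally consistent read: the session makes the read wait for the secondary to catch up.
session = CausalSession(rs)
session.write("y", 2)
fresh = session.read("y")

print(stale, fresh)
```

Running it prints `None 2`: the plain secondary read loses the race with replication, while the read inside the session waits and sees its own write. With a real driver you would rely on the session support it provides rather than tracking times by hand.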
The idea was to demonstrate the concept. To guarantee that my secondary was slower than my script, I sent an internal command to pause the replication process. This is just to get a “consistent” result every time I run this script and not only when I get lucky with the race.
Eventually consistent reply as @max replied before I posted my draft
Causal consistency refers to guarantees around the order of operations observed by clients in a distributed system. Client sessions only guarantee full causal consistency with “majority” read concern and “majority” write concern, but different combinations are possible depending on your use case.
The page you referenced outlines causal guarantees (for example “Read Your Own Writes”) with different combinations of read and write concern, including example scenarios.
Traditional databases, because they service reads and writes from a single node, naturally provide sequential ordering guarantees for read and write operations known as “causal consistency”. A distributed system can provide these guarantees, but in order to do so, it must coordinate and order related events across all of its nodes, and limit how fast certain operations can complete. While causal consistency is easiest to understand when all data ordering guarantees are preserved – mimicking a vertically scaled database, even when the system encounters failures like node crashes or network partitions – there exist many legitimate consistency and durability tradeoffs that all systems need to make.
FYI causal consistency and associated guarantees are general data concepts for distributed systems (not specific to MongoDB):
Causal consistency provides guarantees around the ordering of data operations observed by clients in a distributed system, which mimics a single vertically scaled database deployment.
Eventual consistency refers to the behaviour that writes in a distributed system will eventually converge on a consistent history (for example, via application of an idempotent replication oplog), but reads are not guaranteed to be consistent if you read from different members of a cluster without appropriate read concerns.
A bit more than just read your own writes actually:
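As a rough illustration of "converging via an idempotent oplog", here is a small sketch. It assumes a drastically simplified oplog made only of `(key, value)` "set" entries; `apply_oplog` and the sample data are hypothetical, and real oplog entries are richer than this.

```python
def apply_oplog(state, entries):
    """Return a copy of state with each (key, value) 'set' entry applied in order."""
    result = dict(state)
    for key, value in entries:
        result[key] = value   # setting a value twice gives the same result as setting it once
    return result


oplog = [("stock", 10), ("stock", 9), ("price", 5)]

primary = apply_oplog({}, oplog)

# A lagging member has only applied the first entry, so a read there is stale for now.
lagging = apply_oplog({}, oplog[:1])
stale_read = lagging["stock"]

# Replaying the whole oplog (including the entry already applied) converges on the primary's state.
caught_up = apply_oplog(lagging, oplog)

print(stale_read, primary["stock"], caught_up == primary)
```

The lagging member answers with an old value until it catches up, and because every entry is idempotent, applying an entry twice does no harm: all members end up with the same history.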
The paragraph below also covers at least a part of your question.
And I think this doc answers your question completely with its table of guarantees:
But to sum up, it’s a trade-off. Test first with w=majority and readConcern=majority. If the performance is “good enough”, then you don’t have to make a trade-off. If it isn’t, you can start to trade some of the consistency for speed, but my advice would be to do it step by step, and maybe to prefer an upgrade to SSD or a better CPU or network before making a trade-off. It’s very use case dependent as well. For some use cases, the trade-off isn’t possible, so the hardware path is the only solution.
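To get a feel for why stronger write concerns cost latency, here is a back-of-the-envelope sketch. It assumes a simplified model in which a write is acknowledged as soon as `w` members hold it, and the per-member latencies are made-up numbers; `ack_latency_ms` and `majority` are hypothetical helpers, not driver functions.

```python
def majority(member_count):
    """Smallest number of members that forms a majority."""
    return member_count // 2 + 1


def ack_latency_ms(member_latencies_ms, w):
    """Time until w members hold the write: the w-th smallest latency."""
    return sorted(member_latencies_ms)[w - 1]


# Hypothetical time for a write to reach each member: primary, fast secondary, slow secondary.
latencies = [0, 5, 40]

w1 = ack_latency_ms(latencies, 1)
w_majority = ack_latency_ms(latencies, majority(len(latencies)))
w_all = ack_latency_ms(latencies, len(latencies))

print(w1, w_majority, w_all)
```

In this toy model a majority write only waits for the fastest secondary, not the slowest one, which is one reason testing with majority first is reasonable: the cost depends on your actual hardware and network, and it may already be "good enough".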