What is causal consistency in mongodb? Continuation

Thanks for the answers @MaBeuLux88 and @Stennie.

I do not know why, but my previous post from the link below was banned and marked as some kind of propaganda, so I will continue with this post.

As I understand it, basically causal consistency reinforces the idea that mongodb offers strong cluster-wide consistency as a client will be able to read its own write. That’s it?

Another question: can the client only do this if it uses read concern majority combined with write concern majority?

The documentation page shows this.
If this combination of read concern majority and write concern majority isn’t used, does that mean mongodb doesn’t guarantee strong consistency?

Thanks for the clarifications,
Caio

You need to take a breath before moving on :wink:

Depending on your write concern, data can be lost before it is distributed to all nodes. With a fire-and-forget write (w: 0) you get no acknowledgment at all; with any other level you will always get either an error or a confirmation. Yet even with a majority write, if something really bad happens that crashes those “majority” servers before the data reaches the “minority” ones, you may still lose data. That danger exists with any kind of database; it is the worst-case scenario.

Otherwise, your data is guaranteed to be saved to all data-bearing nodes and to be consistent across all reads.

Thanks for the feedback, @Yilmaz_Durmaz

Considering this worst case scenario you reported and considering mongodb at its strongest level of consistency, we can say then that even at its strongest level of consistency mongodb will still allow inconsistencies in the database, correct?

That is, even at its maximum level of consistency, can mongodb still have inconsistencies in a distributed environment?

Greetings,
Caio

Nope. If writes are lost in a worst-case scenario, the remaining already-written data is still consistent, because it had already been distributed to all the other nodes. Losing data is not the same as inconsistency.

You will not find two nodes holding different/inconsistent data unless one of them is new and data is still being distributed to it. In that case, unless you intentionally use a read preference that targets the node that is behind, you will be served from the node with the latest data, or you will be served “read-only” data if the election process cannot find a suitable node to accept writes. This also prevents inconsistent writes.


Hi @morcelicaio

I think @Yilmaz_Durmaz has provided a great explanation! So, I’d like to share a little of my take on this subject :slight_smile:

As I understand it, basically causal consistency reinforces the idea that mongodb offers strong cluster-wide consistency as a client will be able to read its own write. That’s it?

It’s a bit more than that. Causal consistency provides: Read own writes, Monotonic reads, Monotonic writes, and Writes follow reads. According to Causal Consistency Guarantees:

  • Read your writes: Read operations reflect the results of write operations that precede them.
  • Monotonic reads: Read operations do not return results that correspond to an earlier state of the data than a preceding read operation.
  • Monotonic writes: Write operations that must precede other writes are executed before those other writes.
  • Writes follow reads: Write operations that must occur after read operations are executed after those read operations.

Another question: can the client only do this if it uses read concern majority combined with write concern majority?

Yes, but only within a causally consistent client session. Check out the examples in the page on how to do this (note that you can select the language of choice for the examples there).

If this combination of read concern majority and write concern majority isn’t used, does that mean mongodb doesn’t guarantee strong consistency?

You can tune your consistency needs using read/write concerns as mentioned in Causal Consistency and Read and Write Concerns. Note that the stronger the guarantee, typically the more time it will take since MongoDB would need to ensure that all parts of the cluster are in sync with one another. This is the tradeoff, essentially.

Using majority write + majority read is not enough to guarantee causality and reading your own writes, since it also depends on your read preference. From Read Your Own Writes: Prior to MongoDB 3.6, in order to read your own writes you must issue your write operation with { w: "majority" } write concern, and then issue your read operation with primary read preference, and either "majority" or "linearizable" read concern.
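To make the read preference point concrete, here is a toy model in plain Python. It is only a sketch and not how MongoDB is implemented: `Node`, `ToyReplicaSet` and `CausalSession` are hypothetical names, logical time is a simple counter, and majority commit is not modeled at all. It shows a read from a lagging secondary missing an earlier write, and a session that remembers the time of its last operation and makes the secondary catch up before answering.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica set member: how far it has applied the oplog, plus its data."""
    applied_time: int = 0
    data: dict = field(default_factory=dict)


class ToyReplicaSet:
    """Hypothetical stand-in for a replica set; not MongoDB's implementation."""

    def __init__(self, secondaries: int = 2):
        self.primary = Node()
        self.secondaries = [Node() for _ in range(secondaries)]
        self.oplog = []  # entries are (logical_time, key, value)

    def write(self, key, value) -> int:
        """Apply a write on the primary and return its logical time."""
        time = len(self.oplog) + 1
        self.oplog.append((time, key, value))
        self.primary.data[key] = value
        self.primary.applied_time = time
        return time

    def replicate(self, node: Node) -> None:
        """Bring one secondary fully up to date with the oplog."""
        for time, key, value in self.oplog[node.applied_time:]:
            node.data[key] = value
            node.applied_time = time

    def read(self, node: Node, key, after_time: int = 0):
        """Read from a node once it has reached after_time.

        The wait is modeled by replicating on the spot.
        """
        if node.applied_time < after_time:
            self.replicate(node)
        return node.data.get(key)


class CausalSession:
    """Remembers the time of its last operation and sends it with every read."""

    def __init__(self, rs: ToyReplicaSet):
        self.rs = rs
        self.operation_time = 0

    def write(self, key, value) -> None:
        self.operation_time = self.rs.write(key, value)

    def read(self, node: Node, key):
        value = self.rs.read(node, key, after_time=self.operation_time)
        # Later reads in this session must not go back in time (monotonic reads).
        self.operation_time = max(self.operation_time, node.applied_time)
        return value


rs = ToyReplicaSet()
lagging = rs.secondaries[0]

rs.write("x", 1)                    # write outside any session
stale = rs.read(lagging, "x")       # the secondary has not replicated yet

session = CausalSession(rs)
session.write("x", 2)
fresh = session.read(lagging, "x")  # the read waits for the session's write
print(stale, fresh)                 # prints: None 2
```

In a real deployment the driver and server do this bookkeeping for you once the operations run inside a causally consistent session; the toy only illustrates why the session matters when reads do not go to the primary.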

That is, even at its maximum level of consistency, can mongodb still have inconsistencies in a distributed environment?

I’m not sure I fully understand this question. Could you give an example of the inconsistency scenario you have in mind?

Best regards
Kevin


Thanks for the explanations, @kevinadi .

I’m still new to the study of distributed databases, so I may not be able to express myself clearly sometimes.

Thank you for your patience and explanations.

In my case I will use a benchmark to check whether MongoDB guarantees ACID properties when working in a distributed environment, because the MongoDB documentation says that it guarantees strong consistency of its data.
I will use the YCSB+T benchmark to perform my tests.
https://sci-hub.se/10.1109/ICDEW.2014.6818330

What combinations of read concern, write concern, journal and read preference could I test with? There are many possibilities and I still have some doubts.

I’m still new to the study of distributed databases, so I may not be able to express myself clearly sometimes.

In that case, welcome and good to have you here @morcelicaio!

What combinations of read concern, write concern, journal and read preference could I test with?

The short answer is probably: it depends on what you want to test :slight_smile: I’m not an expert in testing, but I’m guessing it goes back to what you’re trying to see. MongoDB provides many, many different knobs that let you tailor the database’s performance vs. consistency trade-off to your exact needs. However, there are some docs that may be useful as a starting point for your journey:

  • Read Isolation, Consistency, and Recency as a high-level overview regarding read & write concerns
  • Transactions if you need multi-document transactions or maybe something that resembles a typical database transactional work
  • Whether you want to test a standalone node (typically used only for development work, and it may not provide useful data if you’re interested in actual prod testing), a replica set (I’d say the most common scenario), or a sharded cluster (the most complex and scalable)

You also might want to check out:

  • Production notes describing the recommended setup for the hardware side of things so that the tests are not artificially affected by suboptimal hardware settings
  • Operations checklist
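On the question of which combinations to test, one way to keep track of the knobs is to enumerate them up front. The sketch below is only an illustration: the lists are an assumed subset of the available levels, the helper names are made up, and the only rule it encodes is that linearizable reads must target the primary. Each combination is rendered as connection-string options (`readConcernLevel`, `w`, `journal`, `readPreference`) that you could append to the URI your benchmark uses.

```python
from itertools import product
from urllib.parse import urlencode

# Assumed subset of levels to test; extend the lists as needed.
READ_CONCERNS = ["local", "majority", "linearizable"]
WRITE_CONCERNS = [1, "majority"]
JOURNAL = [False, True]
READ_PREFERENCES = ["primary", "secondary", "nearest"]


def valid_combinations():
    """Yield every (read concern, w, journal, read preference) tuple to run."""
    for rc, w, j, rp in product(
        READ_CONCERNS, WRITE_CONCERNS, JOURNAL, READ_PREFERENCES
    ):
        # Linearizable reads are only allowed on the primary.
        if rc == "linearizable" and rp != "primary":
            continue
        yield rc, w, j, rp


def uri_options(rc, w, j, rp) -> str:
    """Render one combination as MongoDB connection-string options."""
    return urlencode({
        "readConcernLevel": rc,
        "w": w,
        "journal": str(j).lower(),
        "readPreference": rp,
    })


combos = list(valid_combinations())
option_strings = [uri_options(*combo) for combo in combos]
for options in option_strings[:3]:
    print(options)
```

With these lists that gives 28 runs per workload, so it is probably worth trimming the lists to the guarantees you actually care about before starting.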

Best of luck with your project!

Best regards
Kevin


Thanks for the pointers, @kevinadi .

I will continue my studies and come back here if I have any further questions.

Greetings,
Caio

Ha I’m discovering this thread now. :slight_smile:
My latest answer for reference:

