Read concern ("roll back" and "stale data example")

Hey, I watched this lecture:
read concern lecture
and I haven’t fully understood this topic.

  1. About the stale data example: is there a way of fixing it? Surely banks and other high-security companies can’t take the risk of stale data.
  2. What does it mean when data is rolled back?
    Let’s say I’m doing a read operation with read concern “local” for a movie named “spider man”, and the primary suddenly goes down. As a client, do I receive the query result? Do I get an error or exception?

Thanks

Hi @arik_62314,

In addition to lectures, I would recommend going through the documentation for read concerns.

You are right, there are many real-life scenarios where we need to make sure there is no stale data and that ACID properties are maintained. In general, we can use read concern “majority”, which returns only data that has been acknowledged by a majority of the replica set members. The documents returned by the read operation are durable, even in the event of failure.
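
To make the difference between “local” and “majority” concrete, here is a minimal toy model in Python. It is only a sketch and not the MongoDB API: the `Node` class, the `majority_commit_point` and `read` helpers, and the “batman” document are made up for illustration, and each node's oplog is simplified to a plain list that is always a prefix of the primary's.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica set member; its oplog is modelled as a list of documents."""
    name: str
    oplog: list = field(default_factory=list)


def majority_commit_point(nodes):
    """Number of leading oplog entries held by a majority of the nodes."""
    lengths = sorted((len(n.oplog) for n in nodes), reverse=True)
    majority = len(nodes) // 2 + 1
    # The majority-th longest oplog bounds what a majority has replicated.
    return lengths[majority - 1]


def read(primary, nodes, concern):
    """Return the documents a read on the primary would see under `concern`."""
    if concern == "local":
        return list(primary.oplog)
    if concern == "majority":
        return primary.oplog[:majority_commit_point(nodes)]
    raise ValueError(f"unsupported read concern: {concern}")


p1, s1, s2 = Node("P1"), Node("S1"), Node("S2")
nodes = [p1, s1, s2]

# An older write (hypothetical) that has already replicated everywhere.
for n in nodes:
    n.oplog.append({"title": "batman"})

# t0: the primary applies a new write; no secondary has it yet.
p1.oplog.append({"title": "spiderman"})

local_view = read(p1, nodes, "local")
majority_view = read(p1, nodes, "majority")
print(local_view)     # includes spiderman, which could still be rolled back
print(majority_view)  # only batman

# Later: one secondary replicates the write, so 2 of 3 nodes hold it.
s1.oplog.append({"title": "spiderman"})
majority_after = read(p1, nodes, "majority")
print(majority_after)  # now includes spiderman as well
```

In this model a “majority” read never returns the “spiderman” document until at least two of the three nodes hold it, which is why such a read cannot later be undone by a rollback.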

The documentation page readconcern local describes the timeline well, including when the write is applied on the secondary nodes and when the acknowledgments reach the primary node.

Details:

Now, in your example, you will not get an exception when performing the read operation; you will either receive the document back or not.

  1. If the write for the “spiderman” movie happens at t0 and you are between t0 and t1, the write has not yet been replicated to Secondary1/Secondary2. If the primary goes down at this stage, a new primary will be elected and no document for the “spiderman” movie will be found.
  2. When the old primary comes back online and rejoins the replica set, the write operations that happened between t0 and t1, including the “spiderman” movie, will be reverted (rolled back) to maintain consistency.

This is data loss. To prevent it, we can either use transactions or use a stronger write concern, such as “majority”, so that the client receives an acknowledgment only once the write has reached a majority of the nodes.

Further, we can use transactions to support the applications better.

I hope this helps. Let me know if you still have doubts.

Kanika

You are right, there are many real-life scenarios where we need to make sure there is no stale data and that ACID properties are maintained. In general, we can use read concern “majority”, which returns only data that has been acknowledged by a majority of the replica set members. The documents returned by the read operation are durable, even in the event of failure.

In the lecture's example, how can I confirm that the data I received (the “age” parameter) is correct in case of a failover with read concern “majority”? Do companies double-check the data, or what exactly can they do?

When the old primary comes back online and rejoins the replica set, the write operations that happened between t0 and t1, including the “spiderman” movie, will be reverted (rolled back) to maintain consistency.

By rollback, do you mean that the data will continue to be written to the secondary nodes and an ACK will be sent to the client? Or that the data will be returned to the primary to finish the process?

thanks

Let's say the primary (which holds the “spiderman” movie) is P1, Secondary 1 is S1, and Secondary 2 is S2.
After the P1 server goes down, S1 becomes primary. The replica set now consists of S1 and S2, and neither has a document containing the “spiderman” movie.

Rollback means removing from P1 the documents that are not present on the new primary node S1. As S1 is now the primary, its oplog will be applied to the secondary nodes (including P1 once it rejoins) to maintain consistency.
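
The P1/S1/S2 sequence above can be sketched as a small Python simulation. This is a toy model under stated assumptions, not how the server is implemented: oplogs are plain lists of strings, the `common_prefix_len` and `rejoin` helpers are hypothetical, and the “batman” and “superman” writes are invented so there is something before and after the failover.

```python
def common_prefix_len(a, b):
    """Length of the shared leading portion of two oplogs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def rejoin(old_primary_oplog, new_primary_oplog):
    """Model an old primary rejoining the replica set as a secondary.

    Entries past the common point are removed (rolled back), then the node
    catches up from the new primary. Returns (synced_oplog, rolled_back).
    """
    common = common_prefix_len(old_primary_oplog, new_primary_oplog)
    rolled_back = old_primary_oplog[common:]
    synced = old_primary_oplog[:common] + new_primary_oplog[common:]
    return synced, rolled_back


# State when P1 goes down: "spiderman" never reached S1 or S2.
p1 = ["insert batman", "insert spiderman"]
s1 = ["insert batman"]
s2 = ["insert batman"]

# S1 is elected primary and accepts a new write, which S2 replicates.
s1.append("insert superman")
s2.append("insert superman")

# P1 comes back online and rejoins as a secondary.
p1, rolled_back = rejoin(p1, s1)
print(p1)           # ['insert batman', 'insert superman']
print(rolled_back)  # ['insert spiderman']
```

So the rolled-back write is not pushed to the secondaries and acknowledged later; it is removed from P1, and P1 then follows S1's oplog like any other secondary.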

Kanika