Hi, @Filippo_Del_Tedesco,
Welcome to the MongoDB Community Forums. I understand that you have a question about how you should handle `MongoWriteConcernException`. I wish that I had a simple answer or code snippet to share, but unfortunately it really depends on the durability guarantees that your application requires. Let me explain the mental model and hopefully that will assist in your decisions.
Writes to MongoDB clusters are routed to the primary, where the write is performed on the collection and recorded in the oplog. (I’ll skip over journalling, checkpoints, and related details. These mechanisms ensure that acknowledged writes to a single node are committed to disk even in the event of a failure.) The oplog entries are replicated to the secondaries, where they are also applied. The oplog is a serialized stream of write operations, ordered in time. By applying a write concern such as `w: 2`, you are requesting that the primary only acknowledge the write (with a response of `ok: 1`) after at least two cluster nodes have applied it (e.g. the primary and at least one secondary). If the secondaries are slow to replicate the oplog entries, the write concern can time out, resulting in a `MongoWriteConcernException`. The write still happened on the primary, but none of the secondaries were able to replicate it within the time limit. If the write had failed on the primary, you would have received a different error such as `MongoDuplicateKeyException`.
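As a rough illustration, here is how that could look with the Java sync driver. The connection string, database/collection names, and the 2-second `wtimeout` below are placeholder assumptions, not anything from your application:

```java
import com.mongodb.MongoWriteConcernException;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.concurrent.TimeUnit;

public class WriteConcernTimeoutExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // Ask for acknowledgement from two members, waiting at most 2 seconds.
            MongoCollection<Document> orders = client
                    .getDatabase("shop")
                    .getCollection("orders")
                    .withWriteConcern(WriteConcern.W2.withWTimeout(2, TimeUnit.SECONDS));

            try {
                orders.insertOne(new Document("orderId", 1001).append("status", "new"));
            } catch (MongoWriteConcernException e) {
                // The write was applied on the primary, but a second member did not
                // acknowledge it within 2 seconds. It may still replicate later.
                System.err.println("Write concern not satisfied: " + e.getWriteConcernError());
            }
        }
    }
}
```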
The oplog entry is still on the primary. The secondaries might have even replicated said oplog entry but haven’t applied it yet. Assuming that no cluster members crash, that oplog entry will eventually get applied.
But what happens if the primary crashes? In that case the remaining members will elect a new primary. Which member gets elected is influenced by which secondary has the most recent data (i.e. which oplog entries it has applied).
This is where things become complicated. Let’s say you have performed writes `w1` and `w2` on the primary (node-a) and both returned a `MongoWriteConcernException`. The NIC on node-a then fails. node-b replicated `w1` (but only after the `MongoWriteConcernException` happened), but not `w2`. node-c has neither write. node-b is elected the new primary (because it has applied more of the oplog). node-a’s NIC is replaced and it comes back online. When it rejoins, it realizes that `w2` was never replicated before it crashed and rolls that change back before resuming as a secondary.
In the absence of node or network failures, writes that result in a `MongoWriteConcernException` will eventually be replicated to the other nodes - just not within the time limit of the write acknowledgement. But in the face of node or network failures, you can’t be certain.
How do you handle this in your application? If you must guarantee that writes succeed even in the face of failures, wrap your writes in a transaction and don’t special-case `MongoWriteConcernException`. If a write returns a failure, roll it back and try again.
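A minimal sketch of that approach with the Java driver’s `withTransaction` helper (the collection names and documents are made up for illustration; `withTransaction` commits with the write concern you give it and retries the body on transient errors):

```java
import com.mongodb.TransactionOptions;
import com.mongodb.WriteConcern;
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class TransactionalWriteExample {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                    client.getDatabase("shop").getCollection("orders");

            // Commit with a majority write concern so the transaction survives
            // a primary failover once it has been acknowledged.
            TransactionOptions txnOptions = TransactionOptions.builder()
                    .writeConcern(WriteConcern.MAJORITY)
                    .build();

            try (ClientSession session = client.startSession()) {
                // If the transaction fails, none of its writes are visible,
                // so the application can simply run the whole body again.
                session.withTransaction(() -> {
                    orders.insertOne(session,
                            new Document("orderId", 1001).append("status", "new"));
                    orders.updateOne(session,
                            new Document("orderId", 1000),
                            new Document("$set", new Document("status", "shipped")));
                    return null;
                }, txnOptions);
            }
        }
    }
}
```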
If your application can tolerate the occasional lost write when the primary fails, then you can rely on the eventual consistency guarantees provided by oplog replication. The result will be better performance at the cost of slightly lower durability guarantees.
A middle ground could be special-casing verification logic in your `MongoWriteConcernException` handler. You would have to think through what compensation logic you would implement if the write existed on the primary but had not yet replicated to the secondary. You would also need to handle the situation where the write doesn’t exist on the new primary (because that oplog entry didn’t get replicated before the original primary lost connectivity) but your app node crashes while performing the compensating write. Did the compensating write succeed before the app node crashed? Did it fail? How can you tell when the app recovers?
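Purely as a sketch of the kind of verification you would be signing up for (not a complete solution - the helper, filter, and field names are invented, and a majority read only tells you whether the write is majority-committed at that instant):

```java
import com.mongodb.MongoWriteConcernException;
import com.mongodb.ReadConcern;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class WriteConcernVerification {
    // Hypothetical helper: decide what to do after a write-concern timeout.
    static void insertWithVerification(MongoCollection<Document> orders, Document newOrder) {
        try {
            orders.insertOne(newOrder);
        } catch (MongoWriteConcernException e) {
            // The document is on the primary but may not have replicated yet.
            // A majority read tells us whether it has been majority-committed
            // right now; it cannot tell us whether it will survive a failover
            // that happens a moment later.
            Document committed = orders
                    .withReadConcern(ReadConcern.MAJORITY)
                    .find(Filters.eq("orderId", newOrder.get("orderId")))
                    .first();

            if (committed == null) {
                // Not majority-committed yet. The application has to choose:
                // retry the insert (and risk a duplicate if the original write
                // later replicates), or record the order as unconfirmed and
                // reconcile it later.
                System.err.println("Order " + newOrder.get("orderId") + " not majority-committed yet");
            }
        }
    }
}
```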
Hopefully this explanation provides you the mental model that you need to make the correct design choices for your application.
Sincerely,
James