Recommended handling of MongoWriteConcernException

Hi,

For testing purposes (and historical reasons) we are running a cluster on MongoDB 5.0.31 in a primary-secondary-secondary configuration. We access the cluster programmatically from C# code built with MongoDB Driver 2.24.0.

During a window of heavy load, we experienced occurrences of MongoWriteConcernException (Error code 64) while writing data to the cluster via BulkWriteAsync.

From what we understand, the exception does not necessarily report a permanent failure: given enough time, the write concern may still be satisfied (as actually happened in our experiment). At the same time, there seem to be circumstances in which the cluster might never satisfy the concern (e.g. heavy network impairment), leaving spurious data behind.

So the question is: what would be the recommended way to implement a robust (i.e. tolerant of MongoWriteConcernException) method for data ingestion? Should we implement some verification mechanism that ensures the data we inserted can later be read back with “majority” read concern? If so, what is the recommended way to implement such verification?

For reference, at this point we have decided to wrap the BulkWriteAsync call in a transaction, which lives inside a try-catch block: if the BulkWriteAsync call, or any command of the transaction, triggers an exception, the transaction is aborted and the BulkWriteAsync is retried later. However, if the exception is a MongoWriteConcernException, we only log a warning and hope the data propagation will eventually succeed.
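In simplified form, the current handling looks roughly like this (a sketch only; ScheduleRetry and the surrounding method are illustrative, not our actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Sketch of the strategy described above; ScheduleRetry is a hypothetical helper.
static async Task IngestBatchAsync(
    IMongoClient client,
    IMongoCollection<BsonDocument> collection,
    IReadOnlyList<WriteModel<BsonDocument>> writeModels)
{
    using var session = await client.StartSessionAsync();
    session.StartTransaction(new TransactionOptions(writeConcern: WriteConcern.WMajority));
    try
    {
        await collection.BulkWriteAsync(session, writeModels);
        await session.CommitTransactionAsync();
    }
    catch (MongoWriteConcernException ex)
    {
        // The write reached the primary but the concern wasn't satisfied in time:
        // we only log a warning and hope replication eventually catches up.
        Console.Error.WriteLine($"Write concern not satisfied in time: {ex.Message}");
    }
    catch (Exception)
    {
        // Any other failure: abort and retry the whole batch later.
        await session.AbortTransactionAsync();
        ScheduleRetry(writeModels); // hypothetical: re-queues the batch for a later attempt
        throw;
    }
}
```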

What’s your take on the proposed strategy? Surely this cannot be the full solution, since it doesn’t address the case where a MongoWriteConcernException is followed by a genuine failure to satisfy the write concern. But are we getting closer to something workable? Or can you see any other problematic scenarios we are introducing with this approach?

Thanks!
Filippo

Interesting question! I don’t know the answer, @Filippo_Del_Tedesco, but I’m watching this topic to see if anyone does know the answer!

@Rishabh_Bisht do you know the answer?

Hi, @Filippo_Del_Tedesco,

Welcome to the MongoDB Community Forums. I understand that you have a question on how you should handle MongoWriteConcernException. I wish that I had a simple answer or code snippet to share, but unfortunately it really depends on the durability guarantees that your application requires. Let me explain the mental model and hopefully that will assist in your decisions.

Writes to MongoDB clusters are routed to the primary, where the write is performed on the collection and recorded in the oplog. (I’ll skip over journalling, checkpoints, and related details. These mechanisms ensure that acknowledged writes to a single node are committed to disk even in the event of a failure.) The oplog entries are replicated to the secondaries, where they are also applied. The oplog is a serialized stream of write operations that are ordered in time. By applying a write concern such as w: 2, you are requesting that the primary only acknowledge the write (with a response of ok:1) after at least two cluster nodes have applied the write (e.g. the primary and at least one secondary). If the secondaries are slow to replicate the oplog entries, then the write concern can time out, resulting in a MongoWriteConcernException. The write still happened on the primary, but none of the secondaries were able to replicate the write within the time limit. If the write had failed on the primary, you would have received a different error such as a MongoDuplicateKeyException.
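To make that concrete, this is roughly how such a write concern is expressed with the C# driver (the w: 2 and 5-second wtimeout are just example values, and collection is assumed to be an IMongoCollection&lt;BsonDocument&gt;):

```csharp
// Example only: ask for acknowledgement from 2 members, waiting at most 5 seconds.
// If the secondaries can't apply the write within the wtimeout, BulkWriteAsync throws
// MongoWriteConcernException even though the write was already applied on the primary.
var writeConcern = new WriteConcern(w: 2, wTimeout: TimeSpan.FromSeconds(5));
var acknowledged = collection.WithWriteConcern(writeConcern);

await acknowledged.BulkWriteAsync(new[]
{
    new InsertOneModel<BsonDocument>(new BsonDocument { { "_id", 1 }, { "status", "pending" } })
});
```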

The oplog entry is still on the primary. The secondaries might have even replicated said oplog entry but haven’t applied it yet. Assuming that no cluster members crash, that oplog entry will eventually get applied.

But what happens if the primary crashes? In that case the remaining members will elect a new primary. Which member gets elected is influenced by which secondary has the most recent data (i.e. which oplog entries it has applied).

This is where things become complicated. Let’s say you have performed writes w1 and w2 on the primary (node-a) and both return a MongoWriteConcernException. The NIC on node-a then fails. node-b has replicated w1 (though only after the MongoWriteConcernException was raised), but not w2. node-c has neither write. node-b is elected the new primary (because it has applied more of the oplog). When node-a’s NIC is replaced and it comes back online, it realizes that w2 was never replicated before it went down, rolls that write back, and re-joins the cluster as a secondary.

In the absence of node or network failures, writes that result in MongoWriteConcernException will eventually be replicated to other nodes - just not within the time limit of the write acknowledgement. But in the face of node or network failures, you can’t be certain.

How do you handle this in your application? If you must guarantee that writes succeed even in the face of failures, perform your writes in a transaction and don’t special-case MongoWriteConcernException. If a write fails, roll the transaction back and try again.
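A rough sketch of that approach (illustrative only; WithTransactionAsync already aborts on error and retries transient transaction errors internally, and the outer loop is just one way to retry whatever still escapes):

```csharp
// Sketch: treat every failure, including MongoWriteConcernException, the same way.
var transactionOptions = new TransactionOptions(writeConcern: WriteConcern.WMajority);

for (var attempt = 1; ; attempt++)
{
    try
    {
        using var session = await client.StartSessionAsync();
        await session.WithTransactionAsync(
            (s, ct) => collection.BulkWriteAsync(s, writeModels, cancellationToken: ct),
            transactionOptions);
        break; // the whole batch is committed with a majority write concern
    }
    catch (MongoException) when (attempt < 3)
    {
        await Task.Delay(TimeSpan.FromSeconds(attempt)); // crude backoff before retrying the batch
    }
}
```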

If your application can tolerate the occasional lost write when the primary fails, then you can rely on the eventual consistency guarantees provided by oplog replication. The result will be better performance at the cost of slightly lower durability guarantees.

A middle ground could be adding verification logic to your MongoWriteConcernException handler. You would have to think through what compensation logic to implement if the write existed on the primary but had not yet replicated to a secondary. Or how to handle the situation where the write doesn’t exist on the new primary (because that oplog entry didn’t get replicated before the original primary lost connectivity) but your app node crashes while performing the compensating write. Did the compensating write succeed before the app node crashed? Did it fail? How can you tell when the app recovers?
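If you do pursue that middle ground, the verification step itself could look something like the sketch below. It assumes each document in the batch carries a client-generated _id that you can look up afterwards; the compensation logic is the genuinely hard part and is only hinted at in the comments.

```csharp
// Sketch: after catching MongoWriteConcernException, check with a "majority" read concern
// whether the batch has become majority-committed. batchIds is assumed to hold the
// client-generated _id values of the documents in the failed batch.
static async Task<bool> IsMajorityCommittedAsync(
    IMongoCollection<BsonDocument> collection,
    IReadOnlyCollection<BsonValue> batchIds)
{
    var majorityReads = collection.WithReadConcern(ReadConcern.Majority);
    var filter = Builders<BsonDocument>.Filter.In("_id", batchIds);
    var committed = await majorityReads.CountDocumentsAsync(filter);
    return committed == batchIds.Count;
}

// Usage inside the MongoWriteConcernException handler (illustrative):
//   if (!await IsMajorityCommittedAsync(collection, batchIds))
//   {
//       // Compensation goes here: re-issue the writes idempotently, or record the batch
//       // for later reconciliation. Beware of your own process crashing mid-compensation.
//   }
```

Note that a negative result only means the writes are not majority-committed yet, not that they never will be, which is part of what makes the compensation step tricky.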

Hopefully this explanation provides you the mental model that you need to make the correct design choices for your application.

Sincerely,
James
