Transactions Background Part 6: Retryable Writes

Recent features in MongoDB such as multi-document ACID transactions have required foundational enhancements in MongoDB to enable them. One of these foundational changes is the ability to perform retryable writes.

MongoDB Transactions Logo

Also in the Transactions Background series:

Summary

By allowing a driver to retry the exact same write safely, without the danger of the server re-applying the write operation, we make it possible for clients to operate more reliably, while reducing the amount of code needed to implement error handling.

Background

When a driver sends a write to a server and an error is raised, it is not always possible to tell if that error came as the write went to the server or as the acknowledgment was being returned. That means it’s also not possible to tell if the server acted on the write command or not.

Consider, for example, a write which will increment a field. The client’s driver could send that request to the server, the server act upon it, but before returning results to the client the connection is lost. On the server, the field has been incremented but the client’s driver has no indication of this. If it sends the write again assuming the connection had failed before it reached the server, then the field would be incremented again. That is an assumption that is bound to fail to be true though. The client itself could try and work out, from the database state, if the write had been applied but that too is fraught with numerous ways to fail.

Retryable Writes: On The Client

Enter the retryable write; this is a write which can be sent multiple times and yet only be applied exactly once. This is implemented at the driver level, by using a connection setting of retryWrites, and requires the write operation to be at least acknowledged (writeConcern > 0). There are a specific set of operations which can use retryable writes. Since MongoDB 4.0, the transaction commit and abort operations have also been retryable, but they retry automatically rather than being specifically enabled in the driver. In all cases, there will only be one attempt to retry the write. Drivers will not keep retrying and therefore the retryable write is specifically good for transient network errors or primary elections, rather than persistent connection issues. A write will be sent to the server with a logical session id (2), and a transaction id.

Retryable Writes: On The Primary

When the write arrives at the server, it is now checked against the transaction table. This is a list of logical sessions ids, last seen transaction ids and pointers into the oplog where that operation was recorded. If there’s a match to the logical session id and transaction id, then the logic of retryable writes says this is a retry. A result is compiled that the previous successful write operation would have generated and its this result that is sent back to the client as an acknowledgement.

If there isn’t a match in the transaction table, then the write operation is presumed new. It is applied and an entry appears in the oplog. The transaction table is them updated against that session id with the transaction id and a pointer to that new oplog entry. And then the acknowledgement is returned to the client.

Retryable Writes: On The Secondary

Secondary nodes do not handle writes until there’s a failover and one of them then takes over as primary. If that happens without changes, the secondary has an empty transaction table and at that point it can’t tell if an operation has previously happened. The solution to this is to replicate the transaction table. Unlike other replication operations, this is integrated into each write recorded in the oplog. Each gets the logical session id and transaction id sufficient that the secondary can build its own transaction table, correct right up to the particular oplog entry.

Then, when there is a primary failover and the secondary steps in, there is a complete transaction table too as well as a replica of the data, ready to go.

Retryable Writes and Transactions

Retryable writes are a foundation for the same retry mechanism which transactions use. As noted, the commit and abort operations are both retryable, but the retry is for the entire transaction as a single operation, unlike the retryable writes previously mentioned. This provides MongoDB with a way of eliminating transient network faults or primary elections from preventing transactions being committed committing of transactions.

This is the last part of the Transactions Background series - for now.

The beta test for the next evolution of transactions on MongoDB is open - Sign up for the Distributed Transactions Beta