Recent features in MongoDB such as multi-document ACID transactions have required foundational enhancements in MongoDB to enable them. One of these foundational changes is the ability to perform Safe Secondary Reads.
Also in the Transactions Background series:
- Part 1: Low-level timestamps in MongoDB/WiredTiger
- Part 2: Logical Sessions in MongoDB
- Part 3: Local Snapshot Reads
- Part 4: The Global Logical Clock
- Part 5: Safe Secondary Reads
- Part 6: Retryable Writes
Safe Secondary Reads address a problem that can occur when chunks of documents, or documents within a shard key range, are being migrated between shards. It ensures that not only the primaries of the shards involved in the migration are aware that chunks are in transit or have moved, but also the secondaries. This project made MongoDB’s dynamic and automatic balancing, easy to leverage.
Before the introduction of Safe Secondary Reads, only the primaries of the shards held a routing table which detailed which chunks of data were owned by which shards. When a query came into the primary, this routing table would be used to filter out documents that were in the middle of a migration to avoid duplicated results for a query sent to multiple shards.
When chunks of data were in the process of migrating between shards – typically as part of a rebalancing process to evenly distribute data across the cluster – the process would be to copy the chunk from the source shard to destination shard, update the routing table on both to show the chunk’s new home, and then delete the chunk from the source shard.
While this is effective for queries routing to the primary in a shard, queries on the secondaries in the shards do not know the migration is happening and return the results they have. For a period of time, while a migration is in progress, there’s a copy of the data on both shards’ secondaries and so a secondary read could return both copies from either shard.
The solution to the problem, in its simplest form, was to replicate the routing table in the shard primary to the secondary nodes in the shard. As the routing table changes during the migration, it is then replicated to the secondary nodes. This makes the secondary node effectively migration aware and able to correctly route queries addressed directly to them. To leverage this routing information with secondary reads, send ReadConcern “local” or “majority” with your queries.
Safe Secondary Reads and Transactions
In the next part of the series, we'll look at retryable writes in MongoDB and how important they are for MongoDB transactions.
The beta test for the next evolution of transactions on MongoDB is open - Sign up for the Distibuted Transactions Beta