
MongoDB and Jepsen

Evaluating MongoDB against some of the industry’s toughest data safety, correctness, and consistency tests

Since debuting in 2013, Jepsen has established itself as one of the most rigorous distributed systems testing suites available in the industry today. It evaluates data correctness and safety in the face of extreme failure scenarios, including simultaneous network partitions, drifting system clocks, and repeated node crashes. Many in the database community regard it as the gold standard for evaluating the behavior of distributed systems under critical and cascading infrastructure outages.

MongoDB has worked with Jepsen since 2015 to publicly evaluate database behavior in the face of multiple system failure scenarios.

MongoDB 3.6.4

MongoDB’s most recent joint testing with Jepsen was against MongoDB 3.6.4, with the Jepsen analysis concluding:

"After weeks of testing both insert-only and update-heavy workloads against sharded clusters, we’ve found that MongoDB’s v1 replication protocol appears to provide linearizable single-document reads, writes, and compare-and-set, through shard rebalances and network partitions."

"Thus far, causal consistency has generally been limited to research projects...MongoDB is one of the first commercial databases we know of which provides an implementation."

  • To achieve linearizable reads, writes, and compare-and-set operations, the Jepsen analysis states that users must configure read concern linearizable and write concern majority. This is also noted in the MongoDB read concern and write concern documentation. The majority write concern (w:majority) has been the default since MongoDB 5.0; see here for more details.
  • The outcome of the 3.6.4 tests is the result of significant multi-year engineering investments in MongoDB's distributed systems design, including consistency, isolation, and durability controls; the implementation of a cluster-wide global logical clock; and an enhanced Raft-based replication consensus protocol.
  • We have complemented these investments by integrating Jepsen tests into the MongoDB Evergreen continuous integration suite. This work ensures adherence to these specific Jepsen tests in MongoDB’s ongoing development work and future releases.
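As a rough illustration of the first point above, the sketch below builds a connection string that requests linearizable reads and majority-acknowledged writes. The option names (readConcernLevel, w, wtimeoutMS, replicaSet) are standard MongoDB connection string options; the helper function, the host names, the replica set name, and the timeout value are hypothetical and chosen only for this example.

```python
from urllib.parse import urlencode


def build_linearizable_uri(hosts, replica_set, wtimeout_ms=5000):
    """Return a MongoDB URI asking for linearizable reads and majority writes.

    `hosts` is a list of "host:port" strings. All argument values used in the
    example call below are placeholders, not a recommended deployment.
    """
    options = {
        "replicaSet": replica_set,
        # Read concern "linearizable": reads reflect all majority-acknowledged
        # writes that completed before the read started.
        "readConcernLevel": "linearizable",
        # Write concern "majority": wait for a majority of members to
        # acknowledge the write.
        "w": "majority",
        # Bound how long a write waits for that acknowledgment.
        "wtimeoutMS": str(wtimeout_ms),
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical hosts and replica set name, for illustration only.
uri = build_linearizable_uri(
    ["db0.example.net:27017", "db1.example.net:27017"], "rs0"
)
print(uri)
```

The resulting string can be handed to a MongoDB driver. The MongoDB documentation also advises setting a time limit (maxTimeMS) on linearizable reads, so that they do not block indefinitely if a majority of members is unavailable.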

MongoDB 4.2.6

In May 2020, Jepsen tested MongoDB 4.2.6, introducing a new transaction analysis tool called Elle into the testing suite. The analysis observed anomalies in transactional behavior in the presence of multiple network failures on a sharded cluster.

The testing uncovered a bug that can lead to a previously committed write being incorrectly retried in the presence of a primary failover and a subsequent transaction commit retry. This bug has been fixed in MongoDB 4.2.8 and later versions. The MongoDB test suite has also been updated to ensure that this specific phenomenon is detected in future releases.

Jepsen's criticisms of the default write concern have also been addressed: as of MongoDB 5.0, the default write concern is majority (w:majority) (see here for more details). Operations commit only when they have been applied to the primary and have been persisted to the journals of a majority of replicas, providing stronger durability guarantees “out of the box”.
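The majority rule described above can be sketched as a small model. This is a deliberate simplification for illustration, not MongoDB's implementation: it assumes every member is a voting, data-bearing member, and the function names are hypothetical.

```python
def majority(voting_members):
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def is_majority_committed(journaled):
    """Return True if a write has been journaled by a majority of members.

    `journaled` holds one boolean per replica set member (primary included),
    True when that member has persisted the write to its journal.
    """
    return sum(1 for done in journaled if done) >= majority(len(journaled))


# Three-member replica set: the primary plus one secondary is enough.
print(is_majority_committed([True, True, False]))   # True
# The primary alone is not a majority of three.
print(is_majority_committed([True, False, False]))  # False
```

Under this model a write that has been journaled by a majority survives the loss of any minority of members, which is why the majority default gives stronger durability than acknowledging on the primary alone.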

Resources to Learn More