Since debuting in 2013, Jepsen has established itself as one of the most rigorous distributed systems testing suites available in the industry today. It evaluates data correctness and safety in the face of extreme failure scenarios – including simultaneous network partitions, drifting system clocks, and repeated node crashes. Many in the database community regard it as the gold standard for evaluating the behavior of distributed systems under critical and cascading infrastructure outages.
MongoDB has worked with Jepsen since 2015 to publicly evaluate database behavior in the face of multiple system failure scenarios.
MongoDB’s most recent joint testing with Jepsen was against MongoDB 3.6.4, with the Jepsen analysis concluding:
"After weeks of testing both insert-only and update-heavy workloads against sharded clusters, we’ve found that MongoDB’s v1 replication protocol appears to provide linearizable single-document reads, writes, and compare-and-set, through shard rebalances and network partitions."
"Thus far, causal consistency has generally been limited to research projects...MongoDB is one of the first commercial databases we know of which provides an implementation."
In May 2020, Jepsen tested MongoDB 4.2.6, introducing a new transaction analysis tool called Elle into the testing suite. The analysis observed anomalies in transactional behavior in the presence of multiple network failures on a sharded cluster.
The testing uncovered a bug that can lead to a previously committed write being incorrectly retried in the presence of a primary failover and a subsequent transaction commit retry. This bug has been fixed in MongoDB 4.2.8 and later versions. The MongoDB test suite has also been updated to ensure that this specific phenomenon is detected in future releases.
Jepsen's criticisms of the default write concern have also been addressed: starting with MongoDB 5.0, the default write concern is elevated to majority (w:majority). Operations commit only when they have been applied to the primary and have been persisted to the journals of a majority of replicas, providing stronger durability guarantees “out of the box”.
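The majority rule described above can be illustrated with a small sketch. This is a simplified toy model, not MongoDB's implementation: the `Member` class and the `majority` and `is_majority_committed` functions are hypothetical names introduced here, and the model assumes that every member votes and that the primary's own journaled write counts toward the majority.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    """One voting replica set member in this toy model (hypothetical type)."""
    name: str
    is_primary: bool = False
    journaled: bool = False  # True once the write is persisted to this member's journal


def majority(voting_members: int) -> int:
    """Smallest member count that is strictly more than half of the voting set."""
    return voting_members // 2 + 1


def is_majority_committed(members: List[Member]) -> bool:
    """Return True if the write satisfies the simplified w:majority rule.

    Assumption: the primary must hold the write, and the number of members
    that have journaled it (primary included) must reach a majority.
    """
    primary_has_write = any(m.is_primary and m.journaled for m in members)
    journaled_count = sum(1 for m in members if m.journaled)
    return primary_has_write and journaled_count >= majority(len(members))
```

Under these assumptions, a write to a three-member replica set counts as committed once the primary and one secondary have journaled it, while a write held only by the primary does not, so it could still be lost if the primary fails before replicating.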