February 6, 2017 | Updated: July 27, 2022
What do you do with a third-party tool that proves your application lacks a feature? Add that tool to your continuous integration system (after adding the feature, of course)! In our case, we added linearizable reads to MongoDB 3.4 and use Jepsen to test them.
What is linearizability?
Linearizability is a property of distributed systems first introduced by Herlihy & Wing in their July 1990 article "Linearizability: a correctness condition for concurrent objects" (ACM Transactions on Programming Languages and Systems). Peter Bailis probably provides the most accessible explanation of linearizability: "writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write."
In MongoDB 3.4, linearizable reads are now supported on single documents, using a new read concern called "linearizable". Previously, the only way to get a linearizable read was to issue a findAndModify operation on the single document, updating an extraneous field in it with a writeConcern of "majority". Keep in mind that linearizable reads carry a performance penalty: their profile is similar to that of majority writes, because each linearizable read issues a no-op majority write to the replica set to ensure the data being read is durable and not stale.
Linearizable reads address corner cases that can emerge in circumstances that are rare in practice. For example, consider a replica set that holds an application’s user account data, including passwords, and a web application with multiple access points (threads) connected to the database. Suppose a user identifies a credentials breach and is trying to change their password at the same time as a hacker is trying to log in as them. Suppose further that the timing of a network partition is just right (or wrong, depending on your point of view), creating a situation in which the nodes on one side of the partition have elected a new primary, while the nodes on the other side have not yet realized there is a problem and that a new primary has been elected.
In this scenario, the user's application thread is working to change credentials by making requests to a new primary, n2, but the hacker's application thread connects to the old primary, n1, permitting it to obtain the old credentials and log in as the user. The figure below illustrates this. With MongoDB 3.4, using the read concern "linearizable", this would not be possible. Linearizable reads ensure that a node is still the active primary at the time of the read, before returning the results of the operation, by confirming with the other nodes in the replica set. Since a majority of nodes would not confirm the primary status of n1, the hacker would be unable to complete the read of the user's credentials necessary to log in.
Testing linearizability in MongoDB 3.4
Jepsen is a tool developed by Kyle Kingsbury, aka Aphyr, for testing how databases perform in the face of network partitions. Kyle uses Jepsen to test distributed systems, including MongoDB, and publishes the outcomes on his blog. Since he started his "Call Me Maybe" series in 2013, Kyle has tested over 20 distributed systems with Jepsen, and his rigorous and detailed writeups are fantastic. In May 2013, he posted an analysis of MongoDB 2.4.3 with respect to linearizability. (Spoiler: at that time, MongoDB did not support linearizable reads.)
As we introduced support for linearizable reads in MongoDB 3.4, we wanted to ensure robust testing for this feature. Jepsen, the gold standard in testing linearizability, was obviously the right tool for the job. We started by contracting with Kyle to test MongoDB 3.4 extensively prior to release.
Kyle’s write-up of his results is now available, but the real goal of our collaboration with Kyle was to integrate Jepsen with our automated testing framework. At MongoDB, we run all tests in Evergreen, our continuous integration (CI) system. Running automated tests in Evergreen enables us to verify improvements and prevent regressions. Running the Jepsen tests in Evergreen gives us confidence that linearizable reads are operational and that no regressions have been introduced as we continue to develop MongoDB. In order to integrate Jepsen with Evergreen, we needed to modify Jepsen to support the Evergreen environment. In the rest of this article, I'll provide an overview of Jepsen and Evergreen and walk through the process of integrating the two systems.
Jepsen tests distributed databases for consistency by using several clients to independently perform compare-and-set (CaS) operations, which read and write integers in the database. A CaS operation updates a record contingent on its current value, and the resulting value is expected to be visible to all clients reading that record. Each write acknowledged by the database is recorded in a separate log. After the clients complete, Jepsen checks the database against the results recorded in its log using the Knossos analysis tool. Knossos uses an array of techniques to analyze a history of writes for linearizability. If acknowledged writes are missing or unacknowledged writes are present, then the database is considered inconsistent with respect to the clients. Such an inconsistency would prove that the database did not handle the network partition properly.
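Knossos itself is a sophisticated Clojure tool; to illustrate the idea it implements, here is a toy brute-force checker in Python for a single-register history. A history is linearizable if some total order of the operations respects real time (an operation that finished before another started must come first) and is consistent with register semantics. This is a sketch of the concept, not Knossos's actual algorithm, which scales far better than trying every permutation.

```python
from itertools import permutations

def linearizable(history, init=0):
    """Brute-force linearizability check for a single register (sketch).

    Each history entry is a dict with 'op' ('read', 'write', or 'cas'),
    'args', 'result', and wall-clock 'start'/'end' times.
    """
    n = len(history)

    def respects_realtime(order):
        # If b completed before a started, b cannot be ordered after a.
        for i in range(n):
            for j in range(i + 1, n):
                if history[order[j]]['end'] < history[order[i]]['start']:
                    return False
        return True

    def consistent(order):
        # Replay the candidate order against register semantics.
        reg = init
        for idx in order:
            e = history[idx]
            if e['op'] == 'write':
                reg = e['args']
            elif e['op'] == 'read':
                if e['result'] != reg:
                    return False
            elif e['op'] == 'cas':
                expected, new = e['args']
                ok = (reg == expected)
                if ok:
                    reg = new
                if e['result'] != ok:
                    return False
        return True

    return any(respects_realtime(p) and consistent(p)
               for p in permutations(range(n)))
```

For example, a write of 1 that completes, followed by a read that returns 0, admits no valid ordering, so the checker reports that history as non-linearizable.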
Jepsen tests are broken up into a series of phases: cluster setup, workload generation, chaotic network partitioning, and application analysis. I'll provide an overview of each phase as background for the challenges we faced in integrating Jepsen with our testing framework.
The cluster setup starts up a MongoDB replica set. As seen in the following diagram, the database runs on 5 nodes, named n1 through n5, with n1 started as the primary node. Note that the Jepsen control node resides on a separate node, which can always communicate with all the nodes in the cluster.
The cluster is implemented using Linux containers, such that each container hosts one mongod process of the replica set.
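As a sketch of what the setup phase has to produce, here is the shape of a five-node replica set configuration in Python. The hostnames n1 through n5 come from the diagram above; the set name and port are illustrative assumptions.

```python
def replset_config(hosts, set_name="jepsen", port=27017):
    """Build a replSetInitiate-style configuration document (sketch)."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": f"{h}:{port}"}
                    for i, h in enumerate(hosts)],
    }

config = replset_config(["n1", "n2", "n3", "n4", "n5"])
# Against a live mongod, the control node would apply it with, e.g.,
# PyMongo: client.admin.command("replSetInitiate", config)
```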
Jepsen creates read/write workloads on MongoDB clusters using client threads within the Jepsen application. These threads use the MongoDB Java driver to communicate with each mongod process running in the MongoDB replica set. The clients perform read and write operations as usual on the mongod primary. For acknowledged writes, each client records the results of these operations in a Jepsen log. The workload can be configured to run for a desired duration.
Chaotic network partitioning
While the workload is active, a separate Jepsen thread performs random network partitioning and restoration between the nodes in the MongoDB cluster. Jepsen implements partitioning by changing firewall rules on each node in the cluster so that two or more nodes are blocked from communicating.
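Jepsen's actual partitioner is Clojure code driving commands over SSH; as a rough sketch of the firewall mechanism, here are the kinds of iptables rules involved, with the node names standing in for resolvable hostnames or IPs.

```python
def isolate_cmds(other_nodes):
    """iptables commands to run on a node so it drops inbound traffic
    from `other_nodes`, simulating a partition (sketch, not Jepsen's code)."""
    return [["iptables", "-A", "INPUT", "-s", node, "-j", "DROP"]
            for node in other_nodes]

def heal_cmds():
    """Flush the INPUT chain to restore full connectivity."""
    return [["iptables", "-F", "INPUT"]]

# To split {n1, n2} from {n3, n4, n5}, run isolate_cmds(["n3", "n4", "n5"])
# on n1 and n2 (and the mirror-image rules on the other side), e.g.:
#   import subprocess
#   for cmd in isolate_cmds(["n3", "n4", "n5"]):
#       subprocess.run(cmd, check=True)  # requires root on the node
```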
Jepsen's network partitioning does not affect client threads. Clients will still have access to all nodes in the cluster. As expected, this could lead to changes in the configuration of the replica set, such that a new primary node is elected. After a period of random partitioning, the network will be fully restored before the workload completes.
After the workload has completed, the analysis phase begins. This phase checks for consistency between the client logs (acknowledged writes) and the database. A test failure indicates that no valid linearization of the recorded history of operations could be found. As mentioned earlier, the failure could be due to either acknowledged writes missing from the database or unacknowledged writes being present.
As I mentioned above, Evergreen is MongoDB's home-grown continuous integration system, written in Go. Evergreen supports patch and integration builds, and runs tests across a set of pre-provisioned systems and variants. A variant is the combination of a platform (operating system and hardware) and an application type (MongoDB Community or MongoDB Enterprise edition), plus other build options such as SSL. For example, Red Hat Enterprise Linux 6 Community and Red Hat Enterprise Linux 6 Enterprise are two different variants. The Evergreen dashboard provides easy access to the recent history of all test suites and tests, executed on each variant and commit.
Evergreen is flexible and can support any type of testing that we can automate. We can either install a test on an existing variant or create a new variant with the configuration required for a new test framework we want to integrate. For a test framework to work well with Evergreen, it must provide pass/fail results and test artifacts.
There are many ways in which engineers interact with Evergreen. Developers use Evergreen to test out their code before it is committed, in addition to adding new unit and integration tests of their own. Test Engineers add new tests and test suites, which become part of the corpus of tests that run. Performance Engineers add new benchmarks and tests and analyze any deviations from previous builds to look for performance regressions. Build Engineers use Evergreen to build, package and deploy MongoDB releases. Managers monitor test results to determine the health of the code base.
Say, for example, that Sara wants to make sure that modifications she made for replica set elections will not cause any problems. She submits her patch set to Evergreen and chooses the tests and variants to verify that the code changes do not cause any integration failures. After Sara has pushed her code to the GitHub repository, a more comprehensive set of automated tests runs automatically. In this manner, any failures introduced by Sara's commit can be identified easily, and her commits can be reverted if necessary.
The following diagram shows a real-world patch build I submitted. The tests that have completed are shown in green, and the running ones in yellow; failed ones would be displayed in red. I chose to run a subset of tasks and variants to test this patch. A complete history of patch builds is maintained by Evergreen and can be viewed at any time.
Integrating Jepsen with Evergreen
Integrating Jepsen with Evergreen required two primary bodies of work. First, we needed to create a variant in which Jepsen would run. Remember that variants are defined by the platform (operating system and hardware) and application type. Second, we needed Jepsen to support the interface required for tests within the Evergreen framework. I'll discuss each of these tasks in turn.
Jepsen is written in Clojure, a Lisp dialect that compiles to Java bytecode and runs on the JVM. Jepsen currently runs only on Debian-based platforms and requires Linux containers (LXC), a virtualization mechanism for running multiple isolated systems on a single Linux kernel. Therefore, we needed to create a new, distinct Evergreen variant to support the configuration required by Jepsen: Linux containers provisioned in the network configuration Jepsen expects.
In Evergreen, part of automating tests is automating the variants on which they run. Therefore, our first task was to automate provisioning of the variant required by Jepsen. The base image for this is Ubuntu 14.04.
The Jepsen MongoDB tests are designed to use a five-node replica set, with one Linux container per node. LXC enables Jepsen to run virtualized nodes that can be accessed from the control host and partitioned among themselves to test MongoDB in the face of various network problems.
The final requirement for the Jepsen variant was to ensure lein and Java 1.8 are included to compile the Jepsen Clojure code and execute the tests.
Evergreen uses Chef to provision variants for running tests. Once we were able to manually provision a variant with the above requirements, we wrote a Chef recipe to enable automated provisioning.
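For readers unfamiliar with Evergreen project files, a build variant is declared in YAML roughly as follows. This is an illustrative sketch rather than our actual configuration; the variant, distro, and task names are hypothetical.

```yaml
buildvariants:
  - name: ubuntu1404-jepsen
    display_name: "Ubuntu 14.04 (Jepsen / LXC)"
    run_on:
      - ubuntu1404-jepsen-host   # distro provisioned by the Chef recipe
    tasks:
      - name: jepsen_test
```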
In order to make Jepsen tests run within Evergreen, we needed Jepsen to support automated test runs. Jepsen was designed to run on an ad-hoc basis rather than as one of many types of tests in a continuous integration system. In order to complete our integration of Jepsen into Evergreen, we contracted with Kyle to modify Jepsen to provide the setup, logging, exit status, and cleanup capabilities the Evergreen framework requires.
As we integrated these changes in Evergreen, we stumbled upon system failures that made it difficult to run the tests repeatedly. Some of these problems were due to the memory usage of the JVM invoked by lein. We tried larger hosts to overcome this, but found that limiting the Java heap settings (-Xms, the initial heap size, and -Xmx, the maximum heap size) helped stabilize memory usage and avoid out-of-memory conditions.
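With Leiningen, heap limits like these can be pinned in the project file. A sketch, with illustrative values rather than the ones we settled on:

```clojure
;; project.clj (sketch): fix the JVM's initial (-Xms) and maximum (-Xmx)
;; heap so repeated CI runs stay within the host's memory budget.
(defproject jepsen.mongodb "0.1.0-SNAPSHOT"
  :jvm-opts ["-Xms1g" "-Xmx4g"])
```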
We encountered other problems related to Jepsen's default timeouts for certain operations. Increasing the timeouts for provisioning the LXC nodes with mongod and for waiting until the replica set was ready helped in this area.
In our initial test runs with Jepsen, we occasionally encountered problems where an LXC node was not ready for access when we expected it to be. So we added extra restart logic for that case.
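The restart logic is essentially a bounded retry loop. A minimal sketch, where the probe and restart callables stand in for the real LXC health check and container restart:

```python
import time

def wait_until_ready(is_ready, restart, attempts=5, delay=2.0):
    """Probe a node until it reports ready, restarting it between
    failed attempts; give up after `attempts` tries (sketch)."""
    for _ in range(attempts):
        if is_ready():
            return True
        restart()          # e.g. stop and start the LXC container
        time.sleep(delay)  # give the node time to come back up
    return False
```

The same shape covers the timeout tuning described above: widening `attempts * delay` is the knob for nodes that are slow to provision.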
With the ability to automatically provision a Jepsen variant and run Jepsen reliably as part of our suite of tests, Jepsen was fully integrated with Evergreen and ready for repeated execution, continuously!
What we gained
We continue to run Jepsen to ensure no regressions are introduced and enhance the testing it provides to increase code coverage for new scenarios. Early in the MongoDB 3.4 development cycle, Jepsen helped identify a bug in our implementation of linearizable reads.
This project has proven incredibly rewarding, and we encourage readers to consider integrating third-party tools used for benchmarking applications in their domains with their testing frameworks. Harnessing open source software and collaborating with its developers benefits not only your application but also your users' applications. Integrating Jepsen with Evergreen lets us continuously verify that linearizable reads in MongoDB behave as promised.