Precisely synchronized clocks, advanced networking, and other technologies are now widely available in the cloud. We research the use of these modern technologies to improve the consistency and performance of distributed protocols.
Formal methods and automated reasoning techniques allow for mathematical guarantees of correctness and durability for distributed protocols. Formal modeling also enables model-based verification approaches for verifying complex implementations against high-level abstract specifications.
Minor faults in complex systems can trigger cascading failures, leading to prolonged outages and degraded performance. We study fault-tolerance challenges, including metastable failures, to enhance system resilience and reliability.
Murat's research spans distributed systems broadly defined, with a focus on distributed coordination, lightweight formal methods, and fault-tolerance.
Jesse researches distributed algorithms, predictive auto-scaling, performance modeling, and formal specification.
Will’s research broadly focuses on applying formal methods to the design, verification, and interpretability of distributed systems and protocols.
Murat Demirbas, Principal Research Scientist in the Distributed Systems Research Group at MongoDB, has worked on distributed systems broadly defined, including cloud computing, distributed databases, distributed consensus, wireless sensor networks, fault-tolerance, formal methods, and self-stabilization. He developed several influential protocols and systems, including hybrid logical clocks, WPaxos, PigPaxos, and PQR. He maintains a popular blog on distributed systems at http://muratbuffalo.blogspot.com, which has garnered over 5 million page views.
Before joining MongoDB Research, he was Principal Applied Scientist at AWS (2020-23) and Professor of Computer Science & Engineering at the University at Buffalo, SUNY (2005-20), where he graduated 11 PhD students. Murat received a National Science Foundation CAREER award in 2008 and the School of Engineering and Applied Sciences Senior Researcher of the Year Award in 2016.
Jesse received his B.A. in computer science from Oberlin College in 2001. He joined MongoDB in 2011 and has contributed to the Python, C, and C++ client libraries and wrote some of the original specifications to which all drivers conform. He created Motor, the async Python driver for MongoDB. After serving for several years on the replication and serverless engineering teams, Jesse joined MongoDB Research in 2022. He researches efficient and strongly consistent distributed algorithms, machine learning for predictive auto-scaling of cloud resources, performance modeling, and formal specification languages.
William Schultz focuses on applying formal methods to the design, verification, and interpretability of distributed systems. He has developed novel techniques for automatic inductive safety verification of distributed protocols, along with compositional techniques for improving the interpretability of the formal artifacts used within these proofs. He has also worked on a novel dynamic reconfiguration algorithm for Raft-based consensus systems, accompanied by the first formally verified safety proofs for this class of distributed protocols.
Before joining MongoDB Research, he received his Ph.D. in computer science from Northeastern University and a B.A. in mathematics and computer science from Cornell University. He has previously spent time in the industry working on both distributed systems and formal methods at Microsoft Research, AWS, and Apple.
In this paper, we explore model-based testing as a tool for ensuring specification-implementation conformance in complex components of MongoDB. We attempted two case studies: model-based trace-checking (MBTC) in the MongoDB server’s replication protocol and model-based test-case generation (MBTCG) in MongoDB Realm Sync’s operational transformation algorithm. We analyze why one technique succeeded and the other failed and advise future implementers making similar attempts at model-based testing.
In this work, we present a formal, machine-checked TLA+ safety proof of the logless dynamic reconfiguration protocol used in the MongoDB replication system. This proof is formalized in the TLA+ proof system (TLAPS) and constitutes the first machine-checked safety proof of a Raft-based reconfiguration protocol.
In this paper, we present a novel dynamic reconfiguration protocol for the MongoDB replication system. The protocol utilizes a logless approach to managing configuration state and decouples the processing of configuration changes from the main database operation log. We provide a safety proof of the protocol, along with a formal specification in TLA+.
Under submission
In this paper, we describe our experience using modular formal specification techniques in the design and verification of MongoDB’s distributed transactions protocol. We formally specify the protocol and its interface with the underlying storage layer, allowing us to verify high level protocol properties while also formalizing the contract between these two components. This modular approach also enables an automated, model-based verification technique for testing conformance of the WiredTiger storage implementation to this interface. We also formally analyze permissiveness–how well a transaction protocol maximizes concurrency within a given isolation level–a property not previously examined.