Cat-Herd's Crook: How YAML test specs improve driver conformance

A. Jesse Jiryu Davis
March 2, 2016 | Updated: September 19, 2022

At MongoDB we write open source database drivers in ten programming languages. We also help developers in our community replicate our drivers' behavior in even more (and more exotic) languages. Ideally, all drivers behave alike; or, where they differ, the differences are written down and justified. How can we herd all these cats along the same path?

For years we failed. Each false start at standardization left us more discouraged. But we’ve recently gained momentum on standardizing our drivers. Human-readable, machine-testable specs, coded in YAML, prove which code conforms and which does not. These YAML tests are the Cat-Herd's Crook: a tool to guide us all in the same direction.

Dozens of cats

Our drivers for the ten programming languages we support are, for the most part, independent rewrites. We implement the same API, the same algorithms, the same wire protocol from scratch each time. Once each driver is released, we maintain its separate code as MongoDB evolves and adds features.

All these official drivers don't even encompass the whole effort. Programmers in our community tackle driver development in other languages that are newer or more exotic. Sometimes we come across some driver already fully-formed, published by someone we don't even know. Of course, these third-party drivers reuse code as rarely as we do. They're rewrites of the same ideas, over and over again.

I hear you thinking: "How wasteful! All those drivers should just be thin wrappers of the C Driver." But the effort pays off: any programming language you might reasonably use has a language-native, idiomatic driver for MongoDB.

There is, however, a problem: unintentional differences among the drivers.

Let us set aside the topic of bugs. All drivers have them, but this article is not about bugs. It's about driver authors making reasonable choices that differ. Without intending to, we vary. Sometimes we don't even know what the variations are.

"Local threshold"

A couple years ago, Jesse and Samantha specified a simple way for MongoDB drivers to load-balance queries across servers. It was easy enough to build, but, to our exasperation, not all drivers implemented it the same.

Say you want to load-balance your queries between two MongoDB servers in a replica set, but not if the network latency to one of them is much greater than the other's. We consider the scenario in which one of them is a 10 ms round trip from your application, and the other is 20 ms:

Jesse specified how drivers should do this balancing act: they should respect a "local threshold." The closest server is always eligible for queries, and so are all servers not more than 15 ms farther than it. (This logic applies when you configure the driver for secondary reads or "nearest" reads, see the manual for a specific driver like PyMongo for details.)

Some driver authors understood local threshold as Jesse intended. In the example here, the driver should spread load equally between the 10 ms and the 20 ms server, because the distant server is less than 15 ms farther than the near server:

But several driver authors misinterpreted the spec to mean, "query the nearest server, or any server within a 15 ms round-trip from the application". So in this case, the server 20 ms away is completely excused from load-balancing. The driver only uses the nearest server:

This misinterpretation had two consequences:

Consequence Number One: Some drivers didn't offer the trade-off we'd decided upon, between low-latency and load-balancing. Instead, in situations like this example, all you got was low-latency. But maybe that's not so bad. You might argue that's a reasonable way to implement a driver. Which leads us to the real issue:

Consequence Number Two: You install a driver and you don't know which behavior it implements. Unintentional differences like this cost our customer support team, make our drivers hard to document, and make them hard to learn.

The inconsistency wasn't due to a lack of specification. We wrote it down clearly!

"Determine the 'nearest' member as the one with the quickest response to a ping. Choose a member randomly from those at most 15ms 'farther' than the nearest member."

This is why it's so annoying. Everyone read the first sentence, but not everyone carefully read the second. After all, reading English is boring. As a result, some drivers shipped with this hidden variation that lasted for months or, in one case, more than a year before we knew that they weren't up to spec.

Did we have to let it linger?

Why? Why did these hidden, unintentional differences last for so long? There are two causes:

Too hard to verify

Jesse and Samantha were unable or unwilling to read dozens of implementations of "local threshold." We don't know dozens of programming languages, and we didn't want to make a career of ensuring this one idea was coded consistently. And "local threshold" is just one of hundreds of features that MongoDB drivers must all implement the same.

Non-conformism

We also balked at the social discomfort of forcing our colleagues to conform to our idea. Perhaps at a more starchy, proprietary company, this isn't a problem. The boss says, "Do this!" and everyone steps into line. But at MongoDB there's a collegial, open-source vibe. Although there eventually is a final outcome to any debate, enforcing that is uncomfortable. Samantha and Jesse weren't eager to be the enforcers.

The authors of other specs besides "local threshold" weren't eager to be enforcers, either. But the drivers team never gave up. We tried to unify our drivers by a variety of methods. The first three failed.

False starts

Reference implementation

Since prose wasn't rigorous enough, our next idea was to describe the "local threshold" algorithm in code. We used Python, since it is famously legible:

def choose_member(servers):
    best_ping_time = min(
        server.ping_time for server in servers)

    filtered_servers = [
        server for server in servers
        if server.ping_time <= best_ping_time + 15]

    return random.choice(filtered_servers)

This is certainly simple. It can even be unit-tested! But despite this reference implementation, unintentional differences persisted in our drivers. Why? Having a reference implementation doesn't prove that every other language implementation matches it. Besides, it is just as hard to read code as English.

A reference implementation does have the advantage that it is less ambiguous, but that is the only advantage. Varying implementations lingered.

Tests in prose

We had a better idea: We could write tests!

"Given servers with round trip time times 10ms and 20ms, the driver reads from both equally."

But again, we could not prove that everyone implemented the tests, or implemented them the same.

A test plan in prose is a substantial step in the right direction. Its main weakness is maintainability: over time, the specification evolves. We find problems with it, or improve it. We can broadcast those changes by updating tests or adding them, but we can't prove that a driver has updated its tests in Java and Python and Haskell to match the new English tests. There isn't enough time in the world for one person to read dozens of test suites in dozens of programming languages and verify all are up to date.

Cucumber

We needed automated, cross-language tests. After evaluating some tools we tried Cucumber.

Cucumber is a "behavior-driven development" tool. It was designed for coders to show to clients and say, "Do we agree that this is what I'm going to implement?" The client can read the test because it resembles English. It looks something like this:

Feature: Local Threshold
    Scenario: Nearest
        Given a replica set with 2 nodes
        And node 0 has latency 10
        And node 1 has latency 20
        When I track server latency on all nodes
        And I query with localThresholdMS 15
        Then the query occurs on either node

The syntax is funky. And there is still ambiguity here—what does "when I track server latency on all nodes" mean?

When engineers think a tool is aesthetically offensive, no amount of debating or threatening them will lead to an enthusiastic implementation.

But Cucumber does have an advantage: At least the tests are automated! The way you automate it is, you write a pattern-matcher that matches each kind of statement in these tests, like "node 0 has latency 10". Then you implement the statement in terms of driver operations in your programming language. We planned to do that work in C, C++, Python, Javascript, and so on, and then verify all the drivers matched the expected outcome: "the query occurs on either node." If this worked, we would have proven all drivers implemented "local threshold" the same.

Writing these pattern-matchers in a dozen languages sure seemed like hard work. Even worse, some languages like C didn't have a Cucumber framework at all, so we'd have had to write one. The work was daunting—we were certain there must be a better way.

Besides, there was a cultural rebellion against Cucumber. For Ruby or Python programmers, Cucumber looked reasonable, but less so to lower-level coders. Cucumber looks absurd to a C programmer. When engineers think a tool is aesthetically offensive, no amount of debating or threatening them will lead to an enthusiastic implementation.

So we relented. But now we were without a paddle. For more than a year, we kept writing specs, and we made a solid effort to manually verify that everyone implemented them correctly. But we didn't trust that all the drivers were the same, because we couldn't mechanically test them.

But we did find a way out. Our solution was to write tests for our specs in YAML.

Tests in YAML

What's a YAML?

YAML is Yet Another Markup Language—actually, its inventors changed their minds. YAML Ain't Markup Language. It's a data language. The distinction is revealing: unlike HTML, say, which marks up plain text, YAML code is pure data. It borrows syntax from several other programming languages like C, Python, Perl. Therefore, unlike Cucumber, YAML feels like neutral ground for programmers in any language. It doesn't provoke an aesthetic revolt. Also unlike Cucumber, most languages have YAML parsers. For those that do not, we convert our YAML tests into JSON.

Apart from its universality, YAML is appealing for other reasons. Large data structures are more readable in YAML than in JSON. YAML has comments, unlike JSON. And, since YAML is a superset of JSON, we can embed JSONified MongoDB data directly into a YAML test.

YAML is also factorable! For example, here are the descriptions Samantha wrote in YAML for servers with round trip times of 10 and 20 ms. We will use them to write a standard test of "local threshold":

server_definitions:
# A near server.
- &server_one
  address: "a"
  avg_rtt_ms: 10

# A farther server.
- &server_two
  address: "b"
  avg_rtt_ms: 20

YAML's killer feature is aliasing: you can name an object using the ampersand, like "&server_one" above, then reference it repeatedly:

# A two-node replica set.
servers:
- *server_one
- *server_two

Think of &server_one as "address-of" and *server_one as "dereference", like pointers in C. This lets us DRY our test specs.

Testing "local threshold" in YAML

With our two servers declared, we can define a test of "local threshold". Both servers should be eligible for a query:

operation: query

# Expected output: both servers are eligible for a query.
eligible_servers:
- *server_one
- *server_two

There are about 40 YAML tests in this format, that test different aspects of the drivers' load-balancing behavior. We ship a README with the tests, that describes the data structure and how driver tests should interpret it, and how to validate a driver's output against the expected output.

Here's an edited version of the Python driver's "local threshold" test harness. It uses the driver's real Topology class to represent the set of servers and their round trip times, and the driver's real function for selecting a server. But the test doesn't use the network: the inputs it needs are supplied by YAML data.

from pymongo.topology import Topology
from pymongo.server_selectors import (any_server_selector,
                                      apply_local_threshold)

def run_scenario(scenario_def):
    servers = scenario_def["servers"]
    addresses = [server["address"] for server in servers]
    topology = Topology(addresses)

    # Simulate connecting to servers and measuring
    # round trip time.
    for server in servers:
        topology.on_change(server)

    # Create server selector.
    if scenario_def["operation"] == "query":
        selector = query_server_selector
    else:
        # ... other operations we can test ...
        pass

    eligible = topology.select_servers(selector)
    expected_eligible = [server["address"] for server in
                         scenario_def["eligible_servers"]]

    assert expected_eligible == eligible

We run this harness with our 40 YAML files to test that "local threshold" and related features are all up to spec in the Python driver. Besides these 40 tests, there are other suites, in slightly different structures, that test other specifications of other aspects of the drivers.

With this method there's still some work that has to be done for each driver. All the driver authors write "harnesses" to run each spec's YAML tests. This duplicated work is unavoidable: the drivers are separate code bases in different programming languages, and they have to be tested in different ways. But the harnesses are concise little bits of code, and we are significantly deduplicating: we design tests once, and we maintain one set of tests for all drivers. The initial effort of reimplementing the YAML test harness in each driver gives that driver access to the shared suite, and keeps paying off forever after.

The payoff

On occasion, the spec changes. Jesse or Samantha or another driver engineer publishes an update that improves the spec and its tests. Most of the time, new tests broadcast spec changes very precisely: the drivers update their copies of the YAML files, they fail the new tests, and then they bring their implementations back up to spec.

Sometimes a detail can fall between the cracks. For example, we might specify that drivers should track an extra piece of information about the database servers' state, and add tests that show the expected value of this variable, but forget to update a few drivers' test harnesses to actually assert the driver's value matches the expected value. Still, such lapses are far better contained now than before we started using YAML tests.

We're closing the gaps that allow for misinterpretation, and we're making the duplicated effort as small as possible.

Brave new world

Ever since we began using YAML tests, our specs and our drivers have been improving rapidly.

Better implementations

The usual arguments in favor of Test Driven Development apply to our YAML tests: writing a driver (or adapting one) so it can be tested this way leads toward neatly factored interfaces, cleaved along the same conceptual borders as our specifications themselves.

Accountability

If your driver passes the tests, it is up to spec, otherwise you have to fix your code. It's no longer the spec author's responsibility to review all the drivers to catch mistakes.

Additionally, YAML tests have a way of ending debate. In the past, driver authors might ignore a spec, or remain attached to the different choices that remain in their own favorite code. These alternative ideas are usually excellent, but they are not the spec. Jesse, Samantha, and other spec authors hesitated to enforce their decisions, but a YAML test doesn't care. The layer of indirection that Samantha introduced into the verification process, by publishing YAML tests for our feature, reduced the emotional friction caused when a spec made our colleagues change their favorite code.

Encourages more specs

Writing a spec is time-consuming and often frustrating. When Samantha, Jesse, or any of the other driver engineers at MongoDB writes a spec, we have to learn the problem deeply, anticipate our users' future needs, then try to get dozens of eccentric coders to agree on one way to code a solution.

All this hard work deserves a satisfying outcome, but very often the outcome was discouraging. Some specs were never implemented by all drivers, or implemented inconsistently. "Leading programmers is like herding cats" is a cliché because it's true.

But now, with our YAML testing system in place, we know that our hard work will pay off. The specs we write will be implemented correctly, so we're motivated to write more specs and our standardization process is accelerating. Our specs actually work now.

Cats cooperating

Our success with the new spec process lets us dream of a greater ambition: organizing the open source community to build standard MongoDB drivers. The YAML tests are a pathway for outside contributors to write high-quality drivers that are consistent with ones we publish.

We could imagine the day when outside code is proven as trustworthy as our own. We're not there yet. A lot of the knowledge and discussion about how drivers interact with the MongoDB server is still internal, and it's hard for an outside developer to catch up on the debates and access the institutional knowledge of our engineering team. But we can at least see the way forward now; the spec tests are a powerful accelerant in that direction.

← Previous

Converting a Replica Set to a Sharded Cluster

We are excited to announce that Cloud Manager now supports converting a replica set to a sharded cluster via Automation. This oft-requested feature will help you scale your deployment more easily. Here’s how: Head to your Deployment Page and use the “…” menu for an existing replica set to choose “Convert to Sharded Cluster” Enter in the requested details, like which hosts you want your mongoSes and config servers to be deployed on and on what ports. Click “Next” and then you can drag and drop processes as needed to tweak your deployment. Push “Review and Deploy” and then “Confirm and Deploy” Modify your application to reach out to the mongoSes rather than the replica set Cloud Manager Automation has now converted your replica set to a sharded cluster. This does not mean that all of your collections and databases are sharded, though. Cloud Manager Automation does not issue enableSharding or shardCollection commands to activate sharding for particular databases and collections, so issuing these and choosing a suitable shard key are left to you. Otherwise, you’re good to go, including adding additional shards via the wrench menu, if necessary.

March 1, 2016

Next →

MongoDB Announces Leadership Transition

Dev Ittycheria, President and Chief Executive Officer, shared the following message with MongoDB employees this morning. This is the hardest email I have ever had to write to all of you. If you have not seen the announcement, I have decided to retire as CEO. Effective November 10, 2025, Chirantan “CJ” Desai will become the new CEO of MongoDB. This was not an easy decision for me. The process to get to this point has been deeply emotional, as I care profoundly about MongoDB and the people who have made the company what it is today. This news may come as a surprise, and for some, perhaps even a shock. That’s natural. Leadership transitions can evoke a range of reactions. I want to share why this is happening, and why it’s the right thing for MongoDB. Every personnel change, including the most senior leadership changes, involves two key decisions: first, recognizing that it is the right time for change, and second, selecting the best person to replace the person leaving. This email is intended to explain both decisions. Earlier this year, as part of our regular succession planning process, the Board and I discussed my long-term commitment. They asked if I would continue as CEO for another five years. After many conversations with my family and the Board, I realized I could not make that commitment. Some CEOs see their title as their identity. I do not. My core responsibility is to serve in the company's best interests. The company is primed for a new leader. One with a fresh perspective, grounded in experience and skills needed to guide MongoDB through its next evolution as a company, what we call MongoDB 3.0. Consequently, I informed the Board that I would commit to two more years to help find a successor. That began the search process for a suitable successor. To our surprise and delight, what we thought would easily take 12 to 24 months happened much faster than anyone expected. After engaging with multiple qualified candidates, we found the right successor in CJ. CJ is uniquely qualified for this role. CJ brings the rare growth-at-scale experience that will help continue to build MongoDB into an iconic technology company. At ServiceNow, he was the only executive to work directly with three of its highly regarded public company CEOs and played a pivotal role in organically scaling the company from just over $1 billion to more than $10 billion in revenue. Only a handful of independent software companies have ever reached that milestone. CJ helped transform ServiceNow from a product company to a platform company, scaled engineering, drove go-to-market excellence, and engaged deeply with investors. More recently, as President of Product and Engineering at Cloudflare, he helped fuel strong growth and stock performance. CJ also possesses the personal qualities needed to succeed as CEO. He is humble, eager to learn, and wants to draw on the perspectives of the people at MongoDB and other stakeholders to inform his thinking. This blend of experience, judgment, and character gives me full confidence that he is well-equipped to lead MongoDB through its next phase of growth. I often think of MongoDB’s journey as a long and extraordinary expedition. For the past eleven years, I have had the privilege of serving as its guide, helping chart the course, rally the team, and climb together through both calm and challenging terrain. Along the way, we have reached remarkable summits and proven what is possible through relentless innovation, persistence, and teamwork. Now it is time for a new guide to lead the next stage of the ascent and take MongoDB to even greater heights. CJ is the right leader to take MongoDB to the next summit. MongoDB is on a strong footing, with a clear strategy, an exceptional leadership team, a product platform that is more relevant than ever, and a business that is executing well. The rise of AI and the explosion of data-intensive applications play directly to MongoDB’s strengths. Our technology sits at the center of how modern applications are built and how organizations will harness data to power intelligent, adaptive systems. I am confident MongoDB is perfectly positioned to capture this next wave of innovation. As for me, I am not running away from MongoDB or leaving to join another company as CEO. I will remain on the Board and work closely with CJ to ensure a seamless transition. Over the years, this role has demanded an enormous amount of focus and energy; as a result, there are many things I’ve missed doing along the way. I’m looking forward to being more present for those moments — from simple time with my family to experiences and travel we’ve long put off. I plan to hold on to my MongoDB stock, as I firmly believe in the people and the opportunity, knowing that MongoDB’s best days are ahead of it. Yes, change can be unsettling. I’m sure you will have many questions about this change, such as why now, why CJ is the best person to lead the company, and what this means for you. We will hold an all-hands meeting tomorrow at 10:30AM ET to discuss this transition, introduce CJ and take your questions. That being said, I want to emphasize that the right change at the right time is how great companies get stronger. Just as a championship team refreshes its roster to stay competitive, MongoDB is bringing in new leadership, including other recent C-suite leaders who came before CJ, to drive our next phase of growth. This is not an ending; it’s the founding of a new moment. I am incredibly proud of what we have built together and genuinely excited about what lies ahead with CJ leading us forward. I also want to thank each of you for making this journey so meaningful. Words cannot fully capture my gratitude for your passion, creativity, and belief in building something truly special. I have often said that I want MongoDB to be an inflection point in people’s careers, a place where they can grow, take risks, and do the best work of their lives. I can say without hesitation that it has been exactly that for me. The skills I have developed, the experiences I have gained, and the relationships I have formed here have shaped me more than any other chapter in my professional life. I will carry them with me always, and will continue to cheer for and support MongoDB every step of the way. --Dev

November 3, 2025