David Golden

Considering the Community Effects of Introducing an Official MongoDB Go Driver

What do you do when an open-source project you rely on no longer meets your needs? When your choice affects not just you, but a larger community, what principles guide your decision? Submitting patches is often the first option, but you're at the mercy of the maintainer to accept them. If the changes you need are sweeping, substantial alterations, the odds of acceptance are low. Eventually, only a few realistic options remain: find an alternative, fork the project, or write your own replacement. Everyone who depends on open source faces this conundrum at one time or another.

After relying for years on the community-developed mgo Go driver for MongoDB, MongoDB has begun work on a brand-new, internally-developed, open-source Go driver. We know that releasing a company-sponsored alternative to a successful, community-developed project creates tension and uncertainty for users, so we did not make this decision lightly. We carefully considered how our choice would affect current and future Go users of MongoDB.

First, some history: Gustavo Niemeyer first announced the mgo community driver in March 2011 – around the same time that MongoDB released version 1.8.0 of the database. It currently has over 1,800 stars on GitHub and 32 contributors – including several current and former MongoDB employees. The incredible success of MongoDB in the Go community owes a great deal to Gustavo and mgo.

MongoDB itself is part of this community. As the Go language matured and gained in popularity, MongoDB found many uses for it internally. Some of the projects using it include:

- Our remote agents for automated deployment, for backup, and for monitoring.
- Our command-line operations tools, like mongodump (re-written in Go for the 3.0 server release).
- Our home-grown continuous integration system, Evergreen.
- Our cloud products, like MongoDB Atlas and Stitch, which have major components written in Go.

From this experience, our engineers contributed back to mgo: over half a dozen employees have commits in mgo, accounting for over 2,000 lines of changes.

But the more we used mgo, the more we discovered limitations. With our in-house drivers – covering popular languages with deep commercial adoption – we often start driver feature development in parallel with server feature development so that we can test new features as soon as the server merges them. But as a community project, mgo's feature support generally lags MongoDB server development. More critically, our products that use mgo can't easily test against or take advantage of new server features. Even if Go didn't yet have critical mass in our user base to justify an in-house driver, our own company's products can't wait for new features. Sometimes we patched a private copy of mgo to implement features we critically needed, and that isn't always easy.

In 2015, we announced our next generation drivers, built upon a published set of specifications for driver behavior. Because mgo predates this work, its conventions and internals don't match our specifications. When the server implements new features and the driver development team writes specs to match, these new specs assume implementation of prior specs. Developing comparable features in mgo can mean starting from a completely different base.

Not only does mgo have different internal conventions and behaviors than our in-house drivers, it encapsulates these behaviors in ways we found constraining.
Usually, encapsulation is a good thing – a sign of good design – but many of our products benefit from low-level access to sockets, wire protocol models, and encoding. End-users don't need this access, but we have the knowledge to work with our own communication protocols and message formats safely and to great effect.

For example, our mongoreplay tool lets users replay a tcpdump of MongoDB server requests against a different server or cluster. When replaying the workload, we need server connection and authentication features – part of mgo's public API – but to replicate per-connection traffic we also need direct control over the number of socket connections and the socket message traffic, all of which is private. To enqueue requests and to read responses we need access to the types representing the wire protocol messages – also private types that are never visible to end users.

Over time, we found ourselves copying and pasting parts of mgo source into project-specific libraries, or re-implementing parts of the wire protocol or driver behaviors directly. There is a real cost in the time it takes engineers to patch mgo or to write, fix, and extend a plethora of internal libraries, plus the opportunity cost of our own products not being able to use our own server's latest features.

We decided to consolidate and standardize on one implementation to address all these needs. We considered two alternatives:

- Fork mgo completely – developing at our pace, modifying internals as needed, and extending the APIs to suit our needs.
- Develop a new driver – building from the ground up to our specifications, putting it on par with our other officially-maintained drivers.

Forking mgo would have a handful of benefits but many challenges. In the benefits column, forking would minimize the impact on our existing products that use mgo, as well as for any user who chose to use our fork over the original. In the challenges column, we identified both technical and social considerations that gave us pause.

On the technical side, a fork wouldn't solve the large gap to our common specifications, making new feature development much harder than for our internally-developed drivers. It also raises a tough question: what if we implement a new feature in our fork only to find that mgo implements it a different way? The more we took the internal architecture and the API in a different direction from mgo, the harder it would be to keep our fork a "drop-in" replacement and the harder it would be to send patches upstream or to merge in upstream development. We felt a fork would quickly become an independent, backwards-incompatible product, despite a common lineage – undercutting the alleged benefit of forking.

On the social side, we knew that anything we released – whether a fork or a new driver – could have a disruptive effect on the existing mgo community. We didn't want to discourage anyone happy using mgo with MongoDB from continuing to use it. We wanted to invite people who wanted something more to try something new, rather than – via forking – implicitly asking people to pick sides in a project they already use. Forking could also imply that we would take on mgo's technical debt, which we wanted to avoid.
In light of these challenges, we decided instead to write a new, independently-developed Go driver to join the eleven other drivers in our officially-maintained driver ecosystem. A fresh start allows us to focus our efforts on four main benefits:

- Velocity: once complete, the new Go driver will evolve as fast as the server does. We'll be able to dog-food new features internally before each server GA release.
- Consistency: the new Go driver will follow our common specifications from the outset, so the new driver API will feel like other MongoDB drivers, shortening the learning curve for users. We'll also be staying idiomatic to Go, such as supporting context objects for cancellable requests.
- Performance: a new driver gives an opportunity to provide a new, higher-performance BSON library and design the driver API in a way that gives users more control over memory allocations.
- Low-level API: for our own in-house products and other power users, we will provide low-level components for reuse, reducing code duplication across the company. Unlike the rest of the driver, this API will have no stability guarantee and no end-user support, but it will let us develop better products faster and our users will benefit that way.

Fortunately, we were able to start from a prototype driver custom developed for our BI Connector – written by a former driver engineer – and build from that base towards the common driver specification. We're now finalizing the details of the new BSON library and the core CRUD API.

What's next for the driver? In the coming months, we'll ship an "alpha" release of the Go driver and make the code repository public. At that point we'll ask members of the Go-using MongoDB community to try it out and help us improve it with their feedback.

Update, 2/19/2018: The new driver is now in alpha, please read the announcement for more info about trying it out.

January 11, 2018

Server Selection in Next Generation MongoDB Drivers

I love to cook. Sometimes, my guests like something so much that they ask for the recipe. Occasionally, I have to confess there isn't one — I just made it up as I went along! Improvisation is fine in the kitchen, but it's not a great approach for consistency in software development.

The MongoDB Drivers team is responsible for writing and maintaining eleven drivers across ten languages. We want our drivers to have similar behaviors, even while staying idiomatic for each language. One way we do that is by writing and sharing driver specification documents for those behaviors that we'd like to have in common across all drivers. Just as a recipe helps a chef serve a consistently great dish night after night, these specifications guide software development for consistency across all drivers, at MongoDB and our community.

One of the most recent specifications we've developed covers server selection. Production MongoDB deployments typically consist of multiple servers, either as a replica set or as a sharded cluster. Server selection describes the process by which a driver chooses the right server for any given read or write operation, taking into account the last known status of all servers. The specification also covers when to recheck server status and when to give up if an appropriate server isn't available. The rest of this article describes our design goals and how server selection will work in the next generation of MongoDB drivers.

Design Goals

The most important goal is that server selection be predictable. If an application is developed against a standalone server, later deployed in production against a replica set, then finally used with a sharded cluster, the application code should be constant and only need appropriate changes to configuration. For example, if some part of an application queries a secondary, that should succeed with a standalone server (when the notion of primary and secondary is irrelevant), work as expected against a replica set, and keep working in a sharded cluster where secondary reads are proxied by a mongos.

The second design goal is that server selection be resilient whenever possible. That means that in the face of detectable server failures, drivers should try to continue with alternative servers rather than immediately fail with an error. For a write, that means waiting for a primary to become available or switching to another mongos (for a sharded cluster). For a read, that means selecting an alternative server, if the read preference allows.

The third design goal is that server selection be low-latency. That means that if more than one server is appropriate for an operation, servers with a lower average round-trip time (RTT) should be preferred over others.

Overview of the Server Selection Specification

The Server Selection specification [1] has four major parts:

- Configuration
- Average Round-Trip Time (RTT)
- Read Preferences
- Server Selection Algorithm

Configuration

Server selection is governed primarily by two configuration variables:

serverSelectionTimeoutMS. The serverSelectionTimeoutMS variable gives the amount of time in milliseconds that drivers should allow for server selection before giving up and raising an error. Users can set this higher or lower depending on whether they prefer to be patient or to return an error to users quickly (e.g. a "fail whale" web page). The default is 30 seconds, which is enough time for a typical new-primary election to occur during failover.

localThresholdMS. If more than one server is appropriate for an operation, the localThresholdMS variable defines the size of the acceptable "latency window" in milliseconds relative to the server with the best average RTT. One server in the latency window will be selected at random. When this is zero, only the server with the best average RTT will be selected. When this is very large, any appropriate server could be selected. The default is 15 milliseconds, which allows only a little bit of RTT variance.

For example, in the illustration below, Servers A through E are all appropriate for an operation – perhaps all mongos servers able to handle a write operation – and localThresholdMS has been set to 100. Server A has the lowest average RTT at 15ms, so it defines the lower bound of the latency window. The upper bound is at 115ms, thus only Servers A, B, and C are in the latency window.

[Figure: Servers A, B, and C are in the latency window]

The localThresholdMS variable used to be called secondaryAcceptableLatencyMS, but was renamed for more consistency with mongos (which already had localThreshold as a configuration option) and because it no longer applies only to secondaries.
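To make the latency window concrete, here is a minimal Go sketch of the filtering step. It is illustrative only: the server type and function name are not from any driver, and the average RTTs for Servers B through E are invented so that, as in the example above, only A, B, and C land in the window.

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    // server is a hypothetical stand-in for a driver's view of one monitored server.
    type server struct {
        address string
        avgRTT  time.Duration // moving average round-trip time from monitoring
    }

    // chooseInWindow keeps only servers whose average RTT is within threshold
    // of the fastest server, then picks one of those at random.
    func chooseInWindow(suitable []server, threshold time.Duration) (server, bool) {
        if len(suitable) == 0 {
            return server{}, false
        }
        // The lowest average RTT defines the lower bound of the latency window.
        lowest := suitable[0].avgRTT
        for _, s := range suitable[1:] {
            if s.avgRTT < lowest {
                lowest = s.avgRTT
            }
        }
        // Keep servers whose average RTT falls inside [lowest, lowest+threshold].
        var window []server
        for _, s := range suitable {
            if s.avgRTT <= lowest+threshold {
                window = append(window, s)
            }
        }
        // Choose at random to spread load across equally acceptable servers.
        return window[rand.Intn(len(window))], true
    }

    func main() {
        // Servers A through E from the example above, with localThresholdMS = 100.
        suitable := []server{
            {"A", 15 * time.Millisecond},
            {"B", 70 * time.Millisecond},
            {"C", 110 * time.Millisecond},
            {"D", 150 * time.Millisecond},
            {"E", 400 * time.Millisecond},
        }
        if s, ok := chooseInWindow(suitable, 100*time.Millisecond); ok {
            fmt.Println("selected server", s.address) // always A, B, or C
        }
    }

In a real driver, the threshold and the selection timeout would come from client configuration rather than being passed around explicitly.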
Average Round-Trip Time

Another driver specification, Server Discovery and Monitoring, defines how drivers should find servers from a seed list and monitor server status over time. During monitoring, drivers regularly record the RTT of ismaster commands. The Server Selection specification calls for these to be calculated using an exponentially-weighted moving average function. If the prior average is denoted RTT_{t-1}, then the new average (RTT_t) is computed from a new RTT measurement (X_t) and a weighting factor (α) using the following formula:

      RTT_t = α·X_t + (1 − α)·RTT_{t-1}

The weighting factor is set to 0.2, which was chosen to put about 85% of the weight of the average RTT on the 9 most recent observations. Weighting recent observations more means that the average responds quickly to sudden changes in latency.

Read Preferences

A read preference indicates which servers should handle reads under a replicated deployment. Read preferences are usually configured in the connection string or the top-level client object in a driver. Some drivers may allow read preferences to be set at the database, collection, or even individual query level as well.

A read preference can be thought of as a document with a mode field and an optional tag_sets field. The mode determines whether primaries or secondaries are preferred:

- primary: only read from the primary
- secondary: only read from a secondary
- primaryPreferred: read from the primary if possible, or fall back to reading from a secondary
- secondaryPreferred: read from a secondary if possible, or fall back to reading from the primary
- nearest: no preference between primary or secondary; read from any server in the latency window

The tag_sets field, if provided, contains a tag set list that is used to filter secondaries from consideration (thus it only applies when the mode is not "primary"). The terminology around tags and tag sets can be a little confusing, so the Server Selection specification defines them like this:

- tag: a single key/value pair
- tag set: a document containing zero or more tags
- tag set list: an ordered list of tag sets

In a replica set, one can assign a tag set to each server to indicate user-defined properties for each server. A read preference tag set matches a server tag set if the read preference tag set is a subset of the server tag set. For example, a read preference tag set { dc: 'ny', rack: 2 } would match a server with the tag set { dc: 'ny', rack: 2, size: 'large' }:

      { dc: 'ny', rack: 2 } ⊆ { dc: 'ny', rack: 2, size: 'large' }

Because the tag set list is ordered, the first tag set that matches any secondary is used to choose eligible secondaries. For example, consider the following tag set list (where 'dc' stands for 'data center'):

      [ { dc: 'ny', rack: 2 }, { dc: 'ny' }, { } ]

First, the driver tries to choose any secondaries in the NY data center on rack 2. If there aren't any, then any secondaries at all in the NY data center are chosen. If the NY data center itself is down, the last tag set allows any secondary to be chosen. If the behavior of the empty tag set ({ }) seems surprising, remember that in mathematical terms, the empty set is a subset of any set, thus the empty set matches all secondaries. It's a good fallback for an application that prefers particular secondaries, but doesn't want to fail if those secondaries aren't available.
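The subset rule and the ordered, first-match behavior of a tag set list are straightforward to express in code. Here is a small, illustrative Go sketch, assuming tag values are strings (as replica set tags are) and using made-up type and function names:

    package main

    import "fmt"

    // TagSet holds key/value tags; in BSON it would be a document like { dc: 'ny' }.
    type TagSet map[string]string

    // matches reports whether read-preference tag set rp is a subset of server tag set st.
    func matches(rp, st TagSet) bool {
        for k, v := range rp {
            if st[k] != v {
                return false
            }
        }
        return true
    }

    // eligible returns the secondaries matched by the first tag set in the list
    // that matches at least one secondary; later tag sets are only fallbacks.
    func eligible(tagSetList []TagSet, secondaries []TagSet) []TagSet {
        for _, ts := range tagSetList {
            var out []TagSet
            for _, s := range secondaries {
                if matches(ts, s) {
                    out = append(out, s)
                }
            }
            if len(out) > 0 {
                return out
            }
        }
        return nil
    }

    func main() {
        secondaries := []TagSet{
            {"dc": "ny", "rack": "1"},
            {"dc": "sf", "rack": "2", "size": "large"},
        }
        // The empty tag set at the end matches any secondary.
        tagSetList := []TagSet{{"dc": "ny", "rack": "2"}, {"dc": "ny"}, {}}
        fmt.Println(eligible(tagSetList, secondaries))
    }

Running this, the first tag set matches no secondary, so the list of eligible secondaries falls back to those matching { dc: 'ny' }.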
Server Selection Algorithm

When a driver needs to select a server, it follows a series of steps to either select a server or else try again until the server selection timeout is reached. Within the algorithm, there are slight differences for different deployment types to ensure that the overall selection process achieves the predictable design goal. A high-level overview of the algorithm follows [3]:

1. Record the server selection start time. When selection starts, the driver records the starting time to know when the selection timeout has been exceeded.

2. Find suitable servers by topology type. A 'suitable' server is one that satisfies all the criteria to carry out an operation. For example, for write operations, the server must be able to accept a write. The specific rules for suitability vary by the type of deployment:

- Single server: The single server is suitable for both reads and writes. Any read preference is ignored.
- Replica set: Only the primary is suitable for writes. Servers are suitable for reads if they meet the criteria of the read preference in effect.
- Sharded cluster: Because mongos is a proxy for the shard servers, any mongos server is suitable for reads and writes. For reads, the read preference in effect will be passed to the selected mongos for it to use in carrying out the operation on the shards.

3. Choose a suitable server at random from within the latency window. If there is only one suitable server, it is selected and the algorithm ends. If more than one server is suitable, they are further filtered to those within the latency window. If there is more than one suitable server in the window, one is chosen at random to fairly distribute the load and the algorithm ends. Because the server with the shortest average RTT defines the lower bound of the latency window, it is always one of the servers that might be selected.

4. If there are no suitable servers, wait for a server status update. If no server is selected – for example, when the driver needs to find a replica set primary, but the replica set has failed over and is having an election to choose the new primary – then the driver tries to update the status of the servers it is monitoring and waits for a change in status.

5. If the server selection timeout has been exceeded, raise an error. If more than serverSelectionTimeoutMS milliseconds have elapsed since the start of server selection, the driver raises a server selection error to the application.

6. Goto Step #2. If the timeout has not expired and the status of the servers has been updated, then the selection algorithm continues looking for suitable servers.
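As a rough illustration of how these steps fit together, here is a minimal Go sketch of the selection loop for a write operation against a toy, fixed topology. The names are hypothetical, Step 3 is simplified to a plain random pick among suitable servers (see the latency-window sketch earlier), and a real driver would block on a topology-change notification rather than sleeping.

    package main

    import (
        "errors"
        "fmt"
        "math/rand"
        "time"
    )

    // server is a hypothetical stand-in for a driver's view of one monitored server.
    type server struct {
        address  string
        writable bool // true for a primary or a mongos
    }

    // snapshot returns the driver's current view of the deployment. Here it is a
    // fixed toy topology; a real driver refreshes it via background monitoring.
    func snapshot() []server {
        return []server{
            {"mongos1.example.com:27017", true},
            {"mongos2.example.com:27017", true},
        }
    }

    // selectServer sketches the selection loop for a write operation.
    func selectServer(timeout time.Duration) (server, error) {
        start := time.Now() // Step 1: record the server selection start time.
        for {
            // Step 2: find suitable servers; for a write, any server that accepts writes.
            var suitable []server
            for _, s := range snapshot() {
                if s.writable {
                    suitable = append(suitable, s)
                }
            }
            // Step 3: choose one suitable server (here, simply at random) and finish.
            if len(suitable) > 0 {
                return suitable[rand.Intn(len(suitable))], nil
            }
            // Step 4: no suitable server yet, so wait for a server status update.
            // A real driver would block on a topology-change signal rather than sleep.
            time.Sleep(500 * time.Millisecond)
            // Step 5: raise an error once the selection timeout has been exceeded.
            if time.Since(start) > timeout {
                return server{}, errors.New("server selection timed out")
            }
            // Step 6: otherwise loop back to Step 2 and look again.
        }
    }

    func main() {
        s, err := selectServer(30 * time.Second)
        if err != nil {
            fmt.Println(err)
            return
        }
        fmt.Println("selected", s.address)
    }

Because the toy snapshot always contains a writable mongos, the loop returns immediately; with an empty snapshot it would keep retrying every half second until the 30-second timeout elapsed.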
Summary

The Server Selection specification will guide the next generation of MongoDB drivers in a consistent approach to server selection that delivers on the three goals of being predictable, resilient, and low-latency. Users will be able to control how long server selection is allowed to take with the serverSelectionTimeoutMS configuration variable and control the size of the acceptable latency window with the localThresholdMS configuration variable.

For more on the next-generation MongoDB drivers, see our blog post, Announcing the Next Generation Drivers for MongoDB.

About the Author - David

David is a senior engineer on the Developer Experience team. He has been active in open-source software for over 15 years, with particular emphasis on the Perl language and community. When he's not writing spec documents, David maintains the MongoDB Perl driver and avoids social media as much as possible.

[1] http://goo.gl/HM3tgS
[2] http://goo.gl/wOsmJb
[3] For this article, some steps have been simplified and some client-server interoperability checks have been omitted.

March 27, 2015