A wide variety of companies around the world, from innovators in the social media space to industry leaders in energy, are running MongoDB on Google Cloud Platform (GCP). Increasingly, these organizations are consuming MongoDB as a fully managed service with MongoDB Atlas, which boosts the productivity of teams that touch the database by reducing the operational overhead of setup, ongoing management, and performance optimization.
When MongoDB Atlas became available on GCP last June, users were able to run it in four regions: us-east1 (South Carolina), us-central1 (Iowa), asia-east1 (Taiwan), and europe-west1 (Belgium). This week we’re excited to launch the service across all Google Cloud Platform regions, allowing you to easily deploy and run MongoDB near you.
Most GCP regions are made up of 3 isolated locations called zones where resources can be provisioned. MongoDB Atlas automatically distributes a 3-node replica set across the zones in a region, ensuring that the automated election and failover process can complete successfully if the zone containing the primary node becomes unavailable.
For Atlas deployments in GCP’s Singapore region, which contains two zones instead of three, two members of a 3-node replica set must share a zone, so losing that zone would leave the remaining member unable to form the majority needed to elect a primary. We therefore recommend that users enable Atlas’s cross-region replication to obtain a similar level of redundancy.
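To see this topology from an application's point of view, here is a minimal sketch that connects to a cluster and prints the replica set members and current primary that the driver discovers. The connection string is a placeholder, and the import path reflects the official Go driver that MongoDB later released; treat it as an illustration rather than a required step.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Placeholder Atlas SRV connection string; substitute your own cluster's URI.
	uri := "mongodb+srv://user:password@cluster0.example.mongodb.net/"

	client, err := mongo.Connect(ctx, options.Client().ApplyURI(uri))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// isMaster reports the replica set topology the driver sees, including the
	// three members Atlas has distributed across the region's zones and which
	// of them is currently the primary.
	var topology bson.M
	err = client.Database("admin").RunCommand(ctx, bson.D{{Key: "isMaster", Value: 1}}).Decode(&topology)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("members:", topology["hosts"])
	fmt.Println("primary:", topology["primary"])
}
```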
Atlas is available across all GCP regions now. We’re excited to see what you build with MongoDB and Google services!
Not an Atlas user yet? Get started <a href="https://www.mongodb.com/cloud/atlas?jmp=blog" target="_blank">here</a>.
Considering the Community Effects of Introducing an Official MongoDB Go Driver
What do you do when an open-source project you rely on no longer meets your needs? When your choice affects not just you, but a larger community, what principles guide your decision? Submitting patches is often the first option, but you're at the mercy of the maintainer to accept them. If the changes you need are sweeping, substantial alterations, the odds of acceptance are low. Eventually, only a few realistic options remain: find an alternative, fork the project, or write your own replacement. Everyone who depends on open source faces this conundrum at one time or another.

After relying for years on the community-developed mgo Go driver for MongoDB, MongoDB has begun work on a brand-new, internally-developed, open-source Go driver. We know that releasing a company-sponsored alternative to a successful, community-developed project creates tension and uncertainty for users, so we did not make this decision lightly. We carefully considered how our choice would affect current and future Go users of MongoDB.

First, some history: Gustavo Niemeyer first announced the mgo community driver in March 2011 – around the same time that MongoDB released version 1.8.0 of the database. It currently has over 1,800 stars on GitHub and 32 contributors – including several current and former MongoDB employees. The incredible success of MongoDB in the Go community owes a great deal to Gustavo and mgo.

MongoDB itself is part of this community. As the Go language matured and gained in popularity, MongoDB found many uses for it internally. Some of the projects using it include:

- Our remote agents for automated deployment, for backup, and for monitoring.
- Our command-line operations tools, like mongodump (rewritten in Go for the 3.0 server release).
- Our home-grown continuous integration system, Evergreen.
- Our cloud products: MongoDB Atlas and Stitch both have major components written in Go.

From this experience, our engineers contributed back to mgo: over half a dozen employees have commits in mgo, accounting for over 2,000 lines of changes.

But the more we used mgo, the more we discovered limitations. With our in-house drivers – covering popular languages with deep commercial adoption – we often start driver feature development in parallel with server feature development so that we can test both as soon as the server merges a feature. As a community project, however, mgo's feature support generally lags MongoDB server development. More critically, our products that use mgo can't easily test against or take advantage of new server features. Even if we thought that Go didn't yet have critical mass in our user base to justify an in-house driver, our own company's products can't wait for new features. Sometimes we patched a private copy of mgo to implement features we critically needed. This isn't always easy.

In 2015, we announced our next generation drivers, built upon a published set of specifications for driver behavior. Because mgo predates this work, its conventions and internals don't match our specifications. When the server implements new features and the driver development team writes specs to match, these new specs assume implementation of prior specs. Developing comparable features in mgo can mean starting from a completely different base. Not only does mgo have different internal conventions and behaviors than our in-house drivers, it encapsulates these behaviors in ways we found constraining.
Usually, encapsulation is a good thing – a sign of good design – but many of our products benefit from low-level access to sockets, wire protocol models, and encoding. End-users don't need this access, but we have the knowledge to work with our own communication protocols and message formats safely and to great effect.

For example, our mongoreplay tool lets users replay a tcpdump of MongoDB server requests against a different server or cluster. When replaying the workload, we need server connection and authentication features – part of mgo's public API – but to replicate per-connection traffic we also need direct control over the number of socket connections and the socket message traffic, all of which is private. To enqueue requests and to read responses we need access to the types representing the wire protocol messages – also private types that are never visible to end users. Over time, we found ourselves copying and pasting parts of mgo source into project-specific libraries, or re-implementing parts of the wire protocol or driver behaviors directly.

There is a real cost in the time it takes engineers to patch mgo or to write, fix, and extend a plethora of internal libraries, plus the opportunity cost of our own products not being able to use our own server's latest features. We decided to consolidate and standardize on one implementation to address all these needs. We considered two alternatives:

- Fork mgo completely – developing at our pace, modifying internals as needed, and extending the APIs to suit our needs.
- Develop a new driver – building from the ground up to our specifications, putting it on par with our other officially-maintained drivers.

Forking mgo would have a handful of benefits but many challenges. In the benefits column, forking would minimize the impact on our existing products that use mgo, as well as on any user who chose to use our fork over the original. In the challenges column, we identified both technical and social considerations that gave us pause.

On the technical side, a fork wouldn't close the large gap to our common specifications, making new feature development much harder than for our internally-developed drivers. It also raises a tough question: what if we implement a new feature in our fork, only to find that mgo implements it a different way? The more we took the internal architecture and the API in a different direction from mgo, the harder it would be to keep our fork a "drop-in" replacement, and the harder it would be to send patches upstream or to merge in upstream development. We felt a fork would quickly become an independent, backwards-incompatible product, despite a common lineage – undercutting the alleged benefit of forking.

On the social side, we knew that anything we released – whether a fork or a new driver – could have a disruptive effect on the existing mgo community. We didn't want to discourage anyone happy using mgo with MongoDB from continuing to use it. We wanted to invite people who wanted something more to try something new, rather than – via forking – implicitly asking people to pick sides in a project they already use. Forking could also imply that we would take on mgo's technical debt, which we wanted to avoid.
In light of these challenges, we decided instead to write a new, independently-developed Go driver to join the eleven other drivers in our officially-maintained driver ecosystem. A fresh start allows us to focus our efforts on four main benefits:

- Velocity: once complete, the new Go driver will evolve as fast as the server does. We'll be able to dog-food new features internally before each server GA release.
- Consistency: the new Go driver will follow our common specifications from the outset, so the new driver API will feel like other MongoDB drivers, shortening the learning curve for users. We'll also keep the driver idiomatic to Go, for example by supporting context objects for cancellable requests.
- Performance: a new driver gives us the opportunity to provide a new, higher-performance BSON library and to design the driver API in a way that gives users more control over memory allocations.
- Low-level API: for our own in-house products and other power users, we will provide low-level components for reuse, reducing code duplication across the company. Unlike the rest of the driver, this API will have no stability guarantee and no end-user support, but it will let us develop better products faster, and our users will benefit that way.

Fortunately, we were able to start from a prototype driver custom-developed for our BI Connector – written by a former driver engineer – and build from that base towards the common driver specification. We're now finalizing the details of the new BSON library and the core CRUD API.

What's next for the driver? In the coming months, we'll ship an "alpha" release of the Go driver and make the code repository public. At that point we'll ask members of the Go-using MongoDB community to try it out and help us improve it with their feedback.

Update, 2/19/2018: The new driver is now in alpha. Please read the announcement for more info about trying it out.
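To give a sense of what "idiomatic to Go" means in practice, here is a minimal sketch of a context-aware CRUD call. The import path and function names match the Go driver as it was eventually released (go.mongodb.org/mongo-driver); the alpha API discussed in this post may differ in detail.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Contexts make every request cancellable, one of the idiomatic-Go goals above.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	coll := client.Database("demo").Collection("users")

	// Insert a document, then read it back.
	if _, err := coll.InsertOne(ctx, bson.D{{Key: "name", Value: "Ada"}, {Key: "team", Value: "drivers"}}); err != nil {
		log.Fatal(err)
	}

	var doc bson.M
	if err := coll.FindOne(ctx, bson.D{{Key: "name", Value: "Ada"}}).Decode(&doc); err != nil {
		log.Fatal(err)
	}
	fmt.Println(doc)
}
```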
Using Change Point Detection to Find Performance Regressions
At MongoDB, we want to (honestly) tell our users that each new version of our software is faster than the previous version. We also want to be able to explain why. We definitely do not want to learn that a release is slower (that we have a performance regression) from customers who discovered it for themselves. To meet that bar, we need to understand the performance of our software, detect performance changes early, and aggressively address their root causes.

We have invested significantly in building a performance testing system to achieve these goals. This includes creating a large number of performance tests, automating the running of those tests, and building tools to diagnose performance regressions when we find them. Those tools and tests are not enough by themselves: they produce an overwhelming amount of data, and that data needs to be analyzed to determine whether performance changed. We could not process it all by hand. We have since developed new tools to process the data, using advanced statistical techniques to detect real performance regressions and identify their causes.

Where we started

We built our original performance testing system in 2015. It ran a collection of performance tests directly in our CI system (Evergreen). We automated every step of running a test and collecting the results. That left the hard part: making sense of the results.

Computers are fascinating things, built up from a huge number of simple and deterministic components. However, the interactions between those simple components lead to the emergence of non-deterministic behavior, and as computers get more complex, the emergent behavior becomes more pronounced. The net effect is that when you run a program twice, the two executions will differ (one may take longer than the other), even when run on the same machine. The problem gets even harder when you go from a single computer to multiple computers in a distributed system. Network latencies vary depending on the state of the network switches and other traffic on the network, and each computer's variability combines with the network's variability to produce even more variability. MongoDB is a distributed system, so when we test the performance of MongoDB, we have to address all of these issues.

For performance tests, these differences show up as different measurements of performance. Your program may take more or less time to run. It may execute more or fewer operations within a period of time. You may see more or fewer slow operations. We call this phenomenon run-to-run variation, or measurement noise. Run-to-run variation makes it harder to determine whether changes to the software made it intrinsically faster or slower. We therefore did an enormous amount of work to limit the measurement noise in our tests, both in the original project and in subsequent projects. Still, no matter how hard anyone tries, there will always be run-to-run variation.

This presents a challenge when we want to interpret our performance results (or when you want to interpret yours). Maybe we are comparing two versions of our software and want to know which one is faster. If the results are 5% faster on the new version, is that because our software is 5% faster? Or is the 5% due to run-to-run variation? Or worse, is the 5% change due to 10% run-to-run variation masking software that is actually 5% slower?
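As a purely illustrative sketch (not our actual test harness), the simulation below shows why this is hard: two runs of identical software, differing only in simulated run-to-run noise, routinely show apparent differences of several percent in either direction.

```go
package main

import (
	"fmt"
	"math/rand"
)

// measure simulates one benchmark run: a fixed "true" throughput plus
// normally distributed run-to-run noise from the machine, network, etc.
func measure(trueOpsPerSec, noiseStdDev float64, rng *rand.Rand) float64 {
	return trueOpsPerSec + rng.NormFloat64()*noiseStdDev
}

func main() {
	rng := rand.New(rand.NewSource(42))

	// Both "versions" are identical (1000 ops/s), yet with ~5% noise a naive
	// two-run comparison regularly reports swings large enough to look real.
	for trial := 1; trial <= 5; trial++ {
		oldRun := measure(1000, 50, rng)
		newRun := measure(1000, 50, rng)
		fmt.Printf("trial %d: old=%7.1f new=%7.1f apparent change=%+5.1f%%\n",
			trial, oldRun, newRun, (newRun-oldRun)/oldRun*100)
	}
}
```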
When we started, we only had a few performance tests. We manually inspected the results and could understand if and when the performance changed. However, as we added more tests, and more results per test, human inspection became less effective: we missed things, and it was hard and unsatisfying work.

Very early in the development of our system, we automated comparing the performance of one version of the software to another. We wrote software to compare the new performance results to older performance results; if the results changed by more than 10%, we flagged them and had a human look. Using a direct comparison like this was common practice in the industry. It was also awful. The comparisons missed small regressions, they flagged a lot of false positives on noisier tests, and sometimes they flagged real changes, but at the wrong time. The automated comparisons were much better than manual inspection, but still awful. We continually built improvements to make the system less awful: a mechanism to increase the comparison threshold (from 10%) for noisier tests, and a mechanism to reset the comparison when there was a change in performance (i.e., compare to the new normal). These changes improved the system, but they did not fundamentally overcome the challenges we faced.

Solving the right problem

Along the way, we realized we were trying to solve the wrong problem. Our automated comparison was answering the question: "Has measured performance changed more than 10% between these two versions of the software?" What we really wanted to answer was: "Which software changes altered performance (for better or worse)?" Those two questions overlap for large performance changes in low-noise environments, but they differ on noisy tests or for small changes in performance.

The second question ("Which software changes altered performance?") focuses on detecting changes in a measured value over time. This maps to a known problem called change point detection: finding when the values in a time series changed, in the presence of noise or other confounding variables. For example, it is used to detect changes in behavior in such things as electricity consumption, population totals, local weather, and stock prices.

There's a lot of existing work on change point detection, so we just needed to pick the best of it, implement it, and put it into production. Simple, right? Well, maybe not. We did not know what the best existing work was, and we did not know if it would fix our problems. So we did some research, identifying likely techniques and collecting papers on them. The papers accumulated and stayed on my desk, because I didn't have time to dive into a speculative project when there were plenty of things that needed to be done NOW.

Enter an intern

During the summer of 2017, two interns joined us on the performance team. They spent the summer working with us on our performance testing infrastructure. Both of them were great, giving our work an extra push forward. We encourage our interns to learn and grow. One way we do this is by explaining what we are doing and why we are doing it, including the larger context of the work. This naturally leads to discussing open challenges. One of our interns asked if he could read that stack of papers sitting on my desk (of course he could). Towards the end of the summer, he had completed his summer project early. Further, he had read the papers, understood them, and asked if he could build a prototype!
In particular, he had gone through the complex math of the papers and figured out how that math could be implemented in software. He built a prototype. It was limited, but it proved that the concept could work: the algorithm clearly found the changes in the sample traces we created, and it did not get confused when run on sample data containing random background noise. Based on this initial success, we scheduled a larger proof-of-concept project to integrate the algorithm with our production system. We compared this second proof of concept with the existing comparison code and determined it was MUCH better. We then did the work to get the algorithm into production and updated our processes to use it.

Our production system today

When we started in 2015, we ran only a handful of tests and only a handful of people used the performance infrastructure directly. Today we run hundreds of distinct performance tests, generating over 100k distinct results per software commit, and everyone who develops MongoDB interacts with our performance testing infrastructure.

When a developer commits a change to MongoDB, tests are run. Upon completion, change point detection is used to detect performance changes (improvements and regressions). A dedicated team triages these changes, isolates them to specific commits, and assigns them to developers to investigate. In the case of improvements, the developers confirm that the change was expected, or investigate to understand why the performance got better. Sometimes things get faster because of bugs – we have found bugs this way.

Trend graph for a performance test in MongoDB. The green diamond marks the detected change point that has been triaged and confirmed: a recent 15% improvement in bulk insert performance for sharded clusters.

Our system is good at detecting regressions, and our engineers are good at fixing them. Even better than fixing a regression is preventing it from ever being committed to our development branch. Developers can test their proposed changes before committing them, using something called a patch build. In this way, developers can make sure they are not introducing new performance regressions, verify a fix, or confirm an optimization before committing their code.

Advancing science!

At MongoDB we take pride in developing a database and a database platform that empower developers to build applications that change the world. We depend on our performance testing infrastructure to ensure we ship a performant database, and we are proud of the infrastructure we have built and the impact it has had on the software we ship to our users.

We do not do any of this work in a void. At MongoDB we benefit from being part of several communities, and we want to support those communities. It is for this reason that most of our database source code is publicly available and our JIRA project for database development is also public. When we developed a new way of finding performance regressions in our software, we didn't hide it away; we shared it with the community, and will continue to do so as we learn and progress. This started with submitting a paper called “The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System” to the International Conference on Performance Engineering (ICPE).
It has continued with more papers (Creating a Virtuous Cycle in Performance Testing at MongoDB, Automated System Performance Testing at MongoDB) and presentations. These talks and papers have helped the community, but they have also helped us. By sharing and participating in the community, we have more people thinking about our problems, and we've had some of the best minds in performance engineering in academia sharing ideas and suggestions on how to improve our technology.

Often the ideas build on each other. One such idea led to the creation of the Data Challenge Track at ICPE in 2022. Building on our papers, we were able to open up our performance test results as a shareable artifact. The data challenge itself was simple: do something interesting with our performance test data. Researchers were thrilled to have industry data with which to evaluate and demonstrate their ideas, and we were thrilled to have researchers working on our problems. In the end, it led to four strong papers which have impacted how we test performance at MongoDB.

We continue to work on sharing our data and learnings. We have an ongoing collaboration within the SPEC Research Group to create better datasets and algorithms for detecting performance regressions. The group is combining our data with other industry datasets and curating the result. This will enable researchers to understand the performance and accuracy of current algorithms, test new algorithms, and clearly show any improvements, all using real industry data from us and other companies. In each of these interactions, the community wins and we win: by sharing our data we enable better research, we get to take advantage of that research, and the research is better aligned with our needs.

Investing in the future

Of the two interns mentioned in this post, one is now a full-time employee of MongoDB, and the other is pursuing a Ph.D. in computer science at Columbia. One is directly improving our software, and the other is improving the theory and tools we use to build our software. We are very proud of both of them.

The MongoDB database is faster today because of their work on our performance testing infrastructure. Thanks to that infrastructure, we better understand why the database performs the way it does, why that performance changes, and when it changes. We continue to invest in and improve this critical piece of our infrastructure. We have teams dedicated to extending and improving it. We lean into our academic interactions to improve the state of the art for everyone. And we invest in the people who work on these systems (interns included).

We hope you consider using these techniques yourself and letting us and the community know how it goes. If you are an academic, please improve the theoretical underpinnings of this entire space – we are happy to talk to you about it. And if the problems and software described in this post sound interesting to you, we are hiring! Come join us and help us solve these problems.
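If you want to experiment with the core idea yourself, the sketch below is a deliberately simplified, single change point detector. It is not the E-Divisive means algorithm described in our papers and used in our production system; it only illustrates the basic intuition of searching a noisy series for the split that best explains its variance.

```go
package main

import "fmt"

// sse returns the sum of squared deviations from the mean for xs.
func sse(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	var mean float64
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	var s float64
	for _, x := range xs {
		d := x - mean
		s += d * d
	}
	return s
}

// findChangePoint returns the index that best splits the series into two
// segments with different means, and the relative reduction in squared error
// achieved by that split. A large reduction suggests a real change point.
func findChangePoint(series []float64) (bestIdx int, reduction float64) {
	total := sse(series)
	if total == 0 || len(series) < 4 {
		return -1, 0
	}
	bestIdx = -1
	bestSSE := total
	for k := 2; k <= len(series)-2; k++ {
		if s := sse(series[:k]) + sse(series[k:]); s < bestSSE {
			bestSSE = s
			bestIdx = k
		}
	}
	return bestIdx, (total - bestSSE) / total
}

func main() {
	// Simulated throughput results per commit: noisy around 1000 ops/s,
	// then a ~5% regression that raw thresholding could easily miss.
	series := []float64{
		1003, 996, 1011, 989, 1005, 992, 1008, 997,
		951, 962, 948, 957, 943, 959, 950, 946,
	}
	idx, reduction := findChangePoint(series)
	if idx >= 0 && reduction > 0.5 { // illustrative acceptance threshold
		fmt.Printf("change detected at commit %d (%.0f%% of variance explained)\n", idx, reduction*100)
	} else {
		fmt.Println("no change detected")
	}
}
```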
If you would like to learn more about our performance testing environment, check out some of our papers and presentations:

Papers
- ICPE 2020: The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System
- DBTest.io 2020: Automated System Performance Testing at MongoDB
- ICPE 2021: Creating a Virtuous Cycle in Performance Testing at MongoDB

Presentations
- ICPE 2020: The Use of Change Point Detection to Identify Software Performance Regressions in a Continuous Integration System [Video] [Slides]
- ICPE 2021: Creating a Virtuous Cycle in Performance Testing at MongoDB [Video] [Slides]
- CMU Database Seminar Series: How to Waste Time and Money Testing the Performance of a Software Product [Video] [Slides]
- Performance Advisory Council 2021: Creating a Virtuous Cycle in Performance Testing [Slides]