GIANT Stories at MongoDB

The MongoDB Summer ‘18 Intern Series: Communication is Key

Remi Lederman is a rising senior at the University of Pennsylvania. She’s a communications major with a minor in art history, the managing editor of UPenn’s culture magazine, and a 2018 Summer Intern for the MongoDB communications team.

As one of four marketing interns among a class of 62 comprised of mostly engineers, Remi has an interesting take on what it’s like to work on the less technical side of the business.

Andrea Dooley: The internship program at MongoDB is pretty popular among computer science undergrads. As a comms major, how did you first learn about MongoDB and the internship opportunity?
Remi Lederman: UPenn has a big engineering school and MongoDB is very well known; there’s a lot of t-shirts on campus. A lot of my friends are engineers, so I knew the company, but didn’t really know exactly what it did. I went to a career fair where I was able to meet the campus team, and found the opportunity to be really interesting.

AD: Why did you want to intern at MongoDB this summer?
RL: After the career fair I kept up with the company and knew it was growing fast and that it was a really exciting time. I had an idea of the workplace culture by talking to previous interns, and had interned at a tech company the summer before, so I was looking for a similar kind of environment. I knew I wanted to do something in communications, and I felt the opportunity at MongoDB would be the perfect fit. The application process was seamless and the campus team kept me really informed. There was a lot of communication, I never felt out of the loop, and the overall process was very professional and organized.

AD: As someone not very familiar with computer science, was it difficult to learn MongoDB technology?
RL: While someone who is not very tech savvy may have a general idea of the database functionality, it’s hard to have a deep understanding of the technology. As an intern, we went through training in our first week that provided insight into what MongoDB is, how it fits into the stack, how we compare others in the industry, and our overall value proposition. For the role I’m in, it’s important to understand the different product offerings and features, what it means to be on version 4.0 versus version 3.6 for example, and what’s important to our customers and community. It’s different than what you would need to know or how you would view the database from an engineering standpoint.

AD: What’s it like to work with mostly engineers?
RL: The company is so social and everyone's so nice, it’s not hard to hang out together, even though our jobs are so different. It’s also really nice to have a big intern class. There's 62 of us, so it’s fun to attend all the events the campus team puts on for the interns and get to know each other.

AD: If you’re not working with the technology, what sorts of project do you get to work on?
RL: I get new things to work on every day. For example, had a chance to take a first stab at the press release for MongoDB University reaching 1 million registrations, which was exciting. I’ve also helped create briefing books for customers and executives, which are prep guides for when they go into an interview or are giving a talk. It helps prepare them for what to expect in terms of talking points, background information, and questions that may be asked. I love that I get to help out on many visible things the team is working on. So when I see our CEO on CNBC, or our customers at MongoDB World using the stuff I helped to prepare, it’s really rewarding.

AD: I know the engineering interns get to identify their preferred teams and projects. Did you have input in determining what you would work on this summer?
RL: I’m valued for my writing skills, so I get to write a lot and do a lot of editing. My mentor really pushed me to come up with my own projects, and creativity is really fostered here. In the beginning of the summer I was encouraged to outline goals for the internship. At some point during the summer I had the opportunity to attend a Crisis PR webinar. Like most companies, I knew that we had some form of a crisis plan in place that plan outlines what we do, the chain of command, and who is the designated team, but I wanted to put more detail into it, like creating templates for responses. My hope is that I leave at the end of my internship having given something tangible and impactful back to the company.

AD: What’s been the biggest lesson you’ve learned so far?
RL: Preparing for all possible outcomes. In school we would analyze the scenario after it occurred, but here we want to get ahead of things. I’ve learned to do that by over communicating as opposed to under communicating.

AD: What would you say to other marketing/comms interns interested in MongoDB?
RL: A lot of companies have similar roles to what I am doing here, so when choosing where to intern for the summer I looked at other factors to differentiate each opportunity. For me, working at a company like MongoDB is ideal because it’s an exciting environment. I’m invested in MongoDB because what I’m doing has a real impact, which in turn that has made an impact on me. I truly feel like a part of the company. I’m very impressed with everyone and everything at MongoDB, and I feel lucky to be a part of it.

To learn more about the MongoDB internship program, click here.

Employees Recognized for Work Outside of MongoDB

An organization's success can often be attributed to its people, because it’s people who dedicate their time to helping a company achieve goals and be recognized as an industry leader. We find it’s even more of an achievement when our people are recognized for the work they do outside of MongoDB.

Two passionate members of the MongoDB Engineering organization were acknowledged by separate notable organizations for their independent work.

Dr. Michael Cahill is the Vice President of Engineering (Storage). Based out of our Sydney office, Michael leads the global Storage team which is responsible for concurrency control and crash recovery. Optimizations in the storage layer can have a huge impact on making customer workloads more efficient.

Michael was recently recognized as a winner of the Test of Time Award at the annual SIGMOD conference for his work on a new algorithm for implementing serializable isolation. SIGMOD is the Association for Computing Machinery's Special Interest Group on Management of Data, which specializes in large-scale data management problems and databases. The conference is considered one of the most important database conferences in the world. Researchers and engineers working on database technology come together to present their work, and new innovations are often published first at SIGMOD.

“Serializable Isolation is the gold standard for databases: it means that applications using the database can reason as if transactions run one at a time. There is never any interference between concurrent transactions and each transaction takes the database from one consistent state to another. My contribution was to use database internals, including hooks in code for multi-version reads and an extension of intent locks, to detect potential anomalies at runtime and make executions safe regardless of the application logic.

“I’m both proud and humbled to receive the Test of Time Award this year. While I’m proud of the work we did and the impact it has had, I am humbled to see my name listed beside some of the greats of the field who blazed the trail before me.”

The main implication of Michael’s work is that there is now a way to build databases that provide serializable isolation with only a modest change to existing systems.

A. Jesse Jiryu Davis is a staff engineer at MongoDB on the Drivers Team. He leads development on the C and C++ drivers and is the author and maintainer of Motor, an async Python driver. He also pitches in on PyMongo development and oversees the design and specifications process for a lot of new MongoDB driver features. He spends time mentoring new coders and speaking at conferences.

Jesse is a member of the Python Software Foundation, which manages the Python language and the community of Python programmers, and sponsors dozens of Python conferences.

“When I joined about 7 years ago their main mission was to promote the use of Python, and they won: Python is perhaps the most popular language. Now I see the PSF devoting much more effort to expanding computer science access to poor countries and marginalized groups, using Python as a vehicle. For example, the most recent PSF grants were to girls' education events in Brazil and Cameroon.”

Jesse was recognized by the Python Software Foundation for his community service work, including his work on their blog, and his involvement with the NYC conference PyGotham.

“I was awarded for two responsibilities I've really enjoyed. My work on the PSF blog gives me an excuse to interview some of the smartest and most accomplished people I know. The second part of the award, for my work on the PyGotham conference, was mainly in recognition of my speaker-coaching program: I thought it would be helpful for first-time speakers to have professional coaching, so I raised enough money to hire my own speaking coach to spend an hour with each of them. This year we're repeating that program for PyGotham speakers and expanding it to the PyOhio conference, too. Speaking at conferences transformed my career, and I want to make sure that everyone has the same opportunity I did to learn public speaking, particularly members of groups underrepresented in tech.”

We could not be more proud of both Michael and Jesse for their recognitions. It is a true testament to the level of talent at MongoDB, and the passion of the people behind the product.

To learn more about MongoDB people and culture, click here.

Just Released: MongoDB ODBC Driver

Seth Payne
July 12, 2018

Earlier this month, we released the new ODBC driver for the MongoDB Connector for Business Intelligence (BI Connector). In this post, we’ll walk through installation and setup of an ODBC connection on Windows 10 running the 32bit version of Excel.

Using Power BI to Gain Insight Into your MongoDB Data

Seth Payne
July 11, 2018

We are happy to announce updates to the MongoDB Connector for Business Intelligence for use with Microsoft’s Power BI Desktop. Now, it is simpler than ever for Power BI users to access data stored in MongoDB and use Power BI’s powerful analytical and visualization tools to gain insights into the data, and then effectively share these insights with colleagues.

Within just a few minutes, you can expose MongoDB data directly to Power BI to begin creating meaningful charts, dashboards, and reports.

MongoDB as a Data Platform for Business Intelligence

As both MongoDB’s popularity and adoption continue to rapidly grow, organizations are choosing MongoDB as the data platform that supports a variety of applications where tabular, or relational, database systems had been used historically. MongoDB’s document data model presents the best way to work with data for business critical applications, while distributed systems design allows users to intelligently put data where they want it, and enable the freedom to run anywhere - on premise or in the cloud.

But data platform environments are rarely, if ever, homogenous. And, given both the longevity and large install base of enterprise RDBMS, it is common for MongoDB to be part of a larger ecosystem that includes a variety of data sources, many of which are tabular in nature. Because of the need to manage data from multiple systems, administrators are looking for ways to expose all their data to their non-technical business users in a consistent and friendly way, no matter where that data is physically stored.

The MongoDB BI Connector enables Power BI users to easily query, analyze and visualize data stored in MongoDB in the same manner they connect to other Power BI data sources. No knowledge of MongoDB or the MongoDB Query Language (MQL) required!

Exposing MongoDB Data To Power BI Desktop

One of the benefits of using MongoDB as a platform for BI, is that it eliminates the need for complex ETL operations. BI data workloads can be isolated; separating operational workloads from analytics on separate replica nodes, all operating as a part of the same cluster. The data can also be exposed directly to end users without the need to modify and transfer data before it becomes available to analysts.

Power BI can import MongoDB data through a direct connection to the MongoDB BI Connector or, via ODBC. Once a data connection has been defined, simply select the data you want to work with and import it.

After the import is complete, you can begin working with the data in Power BI Desktop as you would with any data source. If you need to refresh your data, you can easily do so at any time.

Data that resides right in your app is a gold mine when it comes to understanding your customers, but who has time to wait for nightly ETL workflows? With the MongoDB Connector for BI you can take control of your data and get to insights faster. See just how quickly you can discover something!

Click here to get started free. Use POWERBI100 code for free $100 MongoDB Atlas Credits.

High-end retailer in Germany delivers omni-channel shopping experience on MongoDB Atlas for thousands of daily online users

Top German retailer migrates to MongoDB Atlas to automate database operations, create a microservices architecture, and move faster as an organization.

The importance of delivering an optimized customer experience cannot be overstated, especially if your business is high-end retail. For Breuninger, the customer-first approach has been in their DNA for more than 130 years.

When the top German retailer set out to build a new e-commerce platform, they wanted the online experience to match that of walking in to one of Breuninger’s premium department stores. Accomplishing this goal required a feature-rich, high-performance, and reliable database capable of supporting complex data sets across multiple categories.

“Today, our development teams have a lot of independence. We only have a handful of rules about how they design and build applications within their respective business units,” says Benedikt Stemmildt, Lead Software Architect of E. Breuninger GmbH & Co. “It’s not quite a rule that you have to use MongoDB, but you do have to explain yourself if you don’t.”

However, it wasn’t always this way. Breuninger’s previous platform was built on one of the industry-standard product content management (PCM) platforms, which Stemmildt felt was “monolithic and difficult to code for.” Code freezes were common and the underlying architecture was a frequent cause of frustration for an organization striving to adopt more agile processes.

A new development and feature roll-out approach was needed to execute the company’s aggressive omni-channel integration plans, and time to market for new online features became a top priority. Breuninger decided to build a technology group in response, going from 10 to 30 in-house developers in just a year.

“We broke down our monolithic architecture and split our application into separate microservices that reflect how our customers shop in the physical stores,” Stemmildt says. “It’s the customer journey — they search, discover, evaluate, and buy not just individual products, but complete outfits.”

“To reflect this architectural change, we split our development teams by different steps of the customer journey and kept dependencies to an absolute minimum,” Stemmildt continues. “One key to making this work is a high-performance database capable of working easily with data in lots of different ways. The document model of MongoDB means we can deliver data with the quality and detail that reflects our products and shopping experience.”

The result? Much faster time to market. Breuninger was able to build their omni-channel platform in months rather than years by enabling teams to decide on important architectural components for their own sections, without having to ask the permission of other teams.

As a seven-year veteran of MongoDB, Stemmildt was confident in recommending the database to his organization. “There are a lot of good databases,” he says. “However, many of them require developers to have a deep knowledge about how they work before getting any benefit. MongoDB is not like that. It’s very quick to learn and start getting results. Our teams are able to deliver features straight away. Once users do expand their use of the database, it’s so feature-rich that you never get a sense of having to push it beyond what it was designed for.”

And agile wouldn’t be agile without automation. “Everything we deploy is automated, and with MongoDB Atlas on AWS, the deployment and management of our databases fit neatly into our processes. After a period of operating MongoDB ourselves on EC2, it’s great not having to worry about the details and not having to spend time setting up, configuring, and managing database[s]. You free up a lot of opportunities to add value to your service by not running things yourself.”

AWS offers a healthy mix of other tools for the teams at Breuninger to leverage, such as a managed Kubernetes service and serverless Lambda functions. MongoDB Atlas and AWS also help Breuninger stay on the right side of the regulators. “We need to comply with GDPR so we keep everything running within our borders. MongoDB Atlas’s built-in security features have helped us satisfy these requirements.”

The finished platform might look different to someone who is used to traditional architectures, but to Stemmildt, not being restrained by legacy approaches makes a lot of sense. “Each of our teams owns one or more sections of the customer journey. The search team updates its own database, pulling data in from the product data producer via a feed and re-populating its own database as needed. We don’t have to ripple refreshes out across the system as they happen. That means each team is free to add new features without changing some core database component and affecting other teams. Self-contained systems are an important design rule.”

And although there are some 25 different and largely independent systems, the customers see just one website. A front-end proxy uses server-side includes to marshal data as required from a mix of micro-frontends before delivering the final composite to the shopper. Product data, product availability, outfit data, price information, navigation metadata — these are all woven together from separate MongoDB databases as the customer goes through the shopping experience online.

Comparing a microservices architecture to a monolithic one revealed to Breuninger that some metrics don’t matter as much as they once did, while others matter more. “With multiple teams developing things so rapidly, I don’t know exactly how much total data is in play. But we are a very metrics-driven company, not just in the technical infrastructure but across the business. We know when a component is and is not working well from both a technical and business perspective, if it needs optimizing for performance, or whether it is delivering value to the business or we need to revisit that aspect of the system architecture.”

While Stemmildt couldn’t comment too much on future plans, he’s enthusiastic about MongoDB’s part in whatever they may be. “We wanted high performance, but most importantly we wanted to be able to add more features. We’re not using MongoDB’s graph database feature yet, but we may be by the end of the year. There are a lot of things we could do with text search, too.”

Other new features — such as multi-document transaction support in MongoDB 4.0 — may also be useful, but in unorthodox ways. “I don’t actually think transactions are needed anymore for our platform,” he laughs, “But there are some teams, like the customer data team, who don’t agree with me yet and won’t use MongoDB because of that. So the release of MongoDB 4.0 will help me to help them make the transition.”

While customers won’t see the nuts and bolts of Breuninger’s transformation to a data-driven enterprise, they will benefit from the company’s newly integrated omni-channel platform, which delivers an improved customer experience and more ways to get inspired.

And to anyone thinking about using MongoDB on their next project, Stemmildt has just one piece of advice: “Use it. Get a MongoDB Atlas account, create a cluster, and play with it. The way we see it, after the majority of our teams have naturally adopted MongoDB, if you can’t say why you should use another database, then you should just use MongoDB.”

The MongoDB Summer ‘18 Intern Series: Back to Back

Haley Connelly is a 5th year Computer Science Major at UT Austin, and a two time MongoDB Summer Intern.

Haley first joined MongoDB during the summer of 2017 as a rising senior. She has made her return to MongoDB for Summer 2018, as she works on her thesis in computational neuroscience. In her short time here, Haley has been able to make a giant impact, including being part of the team that helped build the storage engine used in the MongoDB Mobile product, which was released at MongoDB World 2018.

Andrea Dooley: How did you first learn about MongoDB?
Haley Connelly: I actually interviewed with MongoDB before I really knew what the company did. That year, instead of taking a summer internship I did a 6 month co-op, which is essentially a full time role, in the spring. My mentor had great things to say about the MongoDB Query Language (MQL), and even wrote a wrapper around an existing ORM to mimic MongoDB-like query capabilities. After I learned how useful MongoDB could be, and once I figured out I wanted to be a Systems Engineer, I re-interviewed with the company, which was really exciting.

AD: Can you provide insight to your first summer at MongoDB?
HC: I worked on the Storage Team, and helped engineer the mobile storage engine. I was part of the team took the design and architecture from the ground up. It was really exciting because MongoDB Mobile was announced at MongoDB World 2018, and the storage engine is now in beta. We tried to build as much functionality as possible during that summer, and it was great to see a lot of it made it into the final code.

AD: That’s a huge accomplishment! Did you know that was going to happen?
HC: At the end of the summer, all interns are required to give a presentation on their work from the summer. My partner and I received a lot of positive feedback from our presentation. We got the storage engine functionally working as a prototype, and it worked so well that another intern was actually able to use it. When i came back this summer my old mentor let me know that MongoDB Mobile was about to be released to beta testers.

AD: What are you working on this summer?
HC: This summer I’m working on the Query team, and am using Haskell, which is a functional programming language to help model the semantics of the MongoDB Query Language. The model is intended to help further optimize behavior for future versions of MQL.

AD: What made you want to come back to MongoDB for another internship?
HC: I’ll admit I was always against doing two internships at the same company. I thought it was best to try things out in order to know what you like and what you don’t like. But I figured out what I wanted in a company, and I found that in MongoDB.

AD: What in particular were you looking for in a company?
HC: MongoDB has the name recognition and is well respected in the developer community, alongside a great company culture. It’s also the perfect size company in that it’s not so large that your work seems like it doesn’t make a contribution. We are constantly promoting people to take risks and be intellectually honest, and that really shines when working with people day to day.

AD: What about the internship program itself?
HC: The fact that we get to choose what we want to work on is so awesome to me. I knew there were other teams I wanted to try out, and I knew wherever I wound up I would be able to make a genuine contribution.

AD: It sounds like you’ve gotten a lot out of your internship here. What advice would you give to someone who might be interested in interning at MongoDB?
HC: You get to work on things that matter. I’m very interested in lower level, technical work, which is hard to find as an intern. But MongoDB has provided me with the opportunity to do some real problem solving and figure things out strategically. Here you get to figure out the problems and work towards a solution, and in my opinion that’s really valuable.

To learn more about the MongoDB internship program, click here.

New to MongoDB Atlas — Global Clusters Enable Low-Latency Reads and Writes from Anywhere

The ability to replicate data across any number of cloud regions was introduced to MongoDB Atlas, the fully managed service for the database, last fall. This granted Atlas customers two key benefits. For those with geographically distributed applications, this functionality allowed them to leverage local replicas of their data to reduce read latency and provide a fast, responsive customer experience on a global scale. It also meant that an Atlas cluster could be easily configured to failover to another region during cloud infrastructure outages, providing customers with the ability to provision multi-region fault tolerance in just a few clicks.

But what about improving write latency and addressing increasingly demanding regulations, many of which have data residency requirements? In the past, users could address these challenges in a couple of ways. If they wanted to continue using a fully managed MongoDB service, they could deploy separate databases in each region. Unfortunately, this often resulted in added operational and application complexity. They could also build and manage a geographically distributed database deployment themselves and satisfy these requirements using MongoDB’s zone sharding capabilities.

Today we’re excited to introduce Global Clusters to MongoDB Atlas. This new feature makes it possible for anyone to effortlessly deploy and manage a single database that addresses all the aforementioned requirements. Global Clusters allow organizations with distributed applications to geographically partition a fully managed deployment in a few clicks, and control the distribution and placement of their data with sophisticated policies that can be easily generated and changed.

Improving app performance by reducing read and write latency

With Global Clusters, geographically distributed applications can write to (and of course, read from) local partitions of an Atlas deployment called zones. This new Global Writes capability allows you to associate and direct data to a specific zone, keeping it in close proximity to nearby application instances and end users. In its simplest configuration, an Atlas zone contains a 3-node replica set distributed across the availability zones of its preferred cloud region. This configuration can be adjusted depending on your requirements. For example, you can turn the 3-node replica set into multiple shards to address increases in local write throughput. You can also distribute the secondaries within a zone into other cloud regions to enable fast, responsive read access to that data from anywhere.

The illustration above represents a simple Global Cluster in Atlas with two zones. For simplicity’s sake, we’ve labeled them blue and red. The blue zone uses a cloud region in Virginia as the preferred region, while the red zone uses one in London. Local application instances will write to and read from the MongoDB primaries located in the respective cloud regions, ensuring low latency read and write access. Each zone also features a read-only replica of its data located in the cloud region of the other one. This ensures that users in North America will have fast, responsive read access to data generated in Europe, and vice versa.

Satisfying data residency for regulatory requirements

By allowing developers to easily direct the movement of data at the document level, Global Clusters provide a foundational building block that helps organizations achieve compliance with regulations containing data residency requirements. Data is associated with a zone and pinned to that zone unless otherwise configured.

The illustration below represents an Atlas Global Cluster with 3 zones — blue, red, and orange. The configuration of the blue and red zones are very similar to what we already covered. Local application instances read and write to nearby primaries located in the preferred regions — Virginia and London — and each zone includes a read-only replica in the preferred cloud region of every other zone for serving fast, global reads. What’s different is the orange zone, which serves Germany. Unlike data generated in North America and the UK, data generated in and around Germany is not replicated globally; instead, it remains pinned to the preferred cloud region located in Frankfurt.

Deploying your first Global Cluster

Now let’s walk through how easy it is to set up a Global Cluster with MongoDB Atlas.

In the Atlas UI, when you go to create a cluster, you’ll notice a new accordion labelled Global Cluster Configuration. If you click into this and enable “Global Writes”, you’ll find two easy-to-use and customizable templates. Global Performance provides reasonable read and write latency to the majority of the global population and Excellent Global Performance provides excellent read and write latency to majority of the global population. Both options are available across AWS, Google Cloud Platform, and Microsoft Azure.

You can also configure your own zones. Let’s walk through the setup of a Global Cluster using the Global Performance template on AWS. After selecting the Global Performance template, you’ll see that the Americas are mapped to the North Virginia region, EMEA is mapped to Frankfurt, and APAC is mapped to Singapore.

As your business requirements change over time, you are able to switch to the Excellent Global Performance template or fully customize your existing template.

Customizing your Global Cluster

Say you wanted to move your EMEA zone from Frankfurt to London. You can do so in just a few clicks. If you scroll down in the Create Cluster Dialog, you’ll see the Zone configuration component (pictured below). Select the zone you want to edit and simply update the preferred cloud region.

Once you’re happy with the configuration, you can verify your changes in the latency map and then proceed to deploy the cluster.

After your Global Cluster has been deployed, you’ll find that it looks just like any other Atlas cluster. If you click into the connect experience to find your connection string, you’ll find a simple and concise connection string that you can use in all of your geographically distributed application instances.

Configuring data for a Global Cluster

Now that your Global Cluster is deployed, let's have a look at the Atlas Data Explorer, where you can create a new database and collection. Atlas will walk you through this process, including the creation of an appropriate compound shard key — the mechanism used to determine how documents are mapped to different zones.

This shard key must contain the location field. The second field should be a well-distributed identifier, such as userId. Full details on key selection can be found in the MongoDB Atlas docs.

To help show what documents might look like in your database, we’ve added a few sample documents to a collection in the Data Explorer. As you can see above, we’ve included a field called location containing a ISO-3166-1 alpha 2 country code ("US", "DE", "IN") or a supported ISO-3166-2 subdivision code ("US-DC", "DE-BE", "IN-DL"), as well as a field called userId, which acts as our well-distributed identifier. This ensures that location affinity is baked into each document.

In the background, MongoDB Atlas will have automatically placed each of these documents in their respective zones. The document corresponding to Anna Bell will live in North Virginia and the document corresponding to John Doe will live in Singapore. Assuming we have application instances deployed in Singapore and North Virginia, both will use the same MongoDB connection string to connect to the cluster. When Anna Bell connects to our application from the US, she will automatically be working with data kept in close proximity to her. Similarly, when John Doe connects from Australia, he will be writing to the Singapore region.

Adding a zone to your Global Cluster

Now let’s say that you start to see massive adoption of your application in India and you want to improve the performance for local users. At anytime, you can return to your cluster configuration, click “Add a Zone”, and select Mumbai as the preferred cloud region for the new zone.

The global latency map will update, showing us the new zone and an updated view of the countries that map to it. When we deploy the changes, the documents that are tagged with relevant ISO country codes will gracefully be transferred across to the new zone, without downtime.

Scaling write throughput in a single zone

As we mentioned earlier in this post, it’s possible to scale out a single zone to address increases in local write throughput. Simply scroll to the “Zone Configuration”, click on “Additional Options” and increase the number of shards. By adding a second shard to a zone, you are able to double your write throughput.

Low-latency reads of data originating from other zones

We also referenced the ability to distribute read-only replicas of data from a zone into the preferred cloud regions of other zones, providing users with low-latency read access to data originating from other regions. This is easy to configure in MongoDB Atlas. In “Zone Configuration”, select “Add secondary and read-only regions”. Under “Deploy read-only replicas”, select “Add a node” and choose the region where you’d like your read-only replica to live.

For global clusters, Atlas provides a shortcut to creating read-only replicas of each zone in every other zone. Under “Zone configuration summary”, simply select the “Configure local reads in every zone” button.

MongoDB Atlas Global Clusters are very powerful, making it possible for practically any developer or organization to easily deploy, manage, and scale a distributed database layer optimized for low-latency reads and writes anywhere in the world. We're very excited to see what you build with this new functionality.

Global clusters are available today on Amazon Web Services, Google Cloud Platform, and Microsoft Azure for clusters M30 and larger.

Through the Looking Glass: Analyzing the interplay between memory, disk, and read performance.


Understanding the relationships between various internal caches and disk performance, and how those relationships affect database and application performance, can be challenging. We’ve used the YCSB benchmark, varying the working set (number of documents used for the test) and disk performance, to better show how these relate. While reviewing the results, we’ll cover some MongoDB internals to improve understanding of common database usage patterns.

Key Takeaways

  1. Knowing disk baseline performance is important for understanding overall database performance.
  2. High disk await and utilization are indicative of a disk bottleneck.
  3. WiredTiger IO is random.
  4. A query targeting a single replica set is single threaded and sequential.
  5. Disk performance and working set size are closely related.


The primary contributors to overall system performance are how the working set relates to both the storage engine cache size (the memory dedicated for storing data) and disk performance (which provides a physical limit to how quickly data can be accessed).

Using YCSB, we explore the interactions between disk performance and cache size, demonstrating how these two factors can affect performance. While YCSB was used for this testing, synthetic benchmarks are not representative of production workloads. Latency and throughput numbers obtained with these methods do not map to production performance. We utilized MongoDB 3.4.10, YCSB 0.14, and the MongoDB 3.6.0 driver for these tests. YCSB was configured with 16 threads, and the “uniform” read only workload.

We show that fitting your working set inside memory provides for optimal application performance and as with any database, exceeding this limit negatively affects latency and overall throughput.

Understanding Disk Metrics

There are four important metrics when considering disk performance:

  1. Disk throughput, or number of requests multiplied by the request size. This is usually measured in megabytes per second. Random read and write performance in the 4kb range is the most representative of standard database workloads. Note that many cloud providers limit the disk throughput or bandwidth.
  2. Disk latency. On Linux this is represented by `await`, the time in milliseconds from an application issuing a read or write before the data is written or returned to the application. For SSDs, latencies are typically under 3ms. HDDs are typically above 7ms. High latencies indicate disks have trouble keeping up with the given workload.
  3. Disk IOPS (Input/Output Operations Per Second). `iostat` reports this metric as `tps`. A given cloud provider may guarantee a certain number of IOPS for a given drive. Should you reach this threshold, any further accesses will be queued, resulting in a disk bottleneck. A high end PCIe attached NVMe device could offer 1,500,000 IOPS while a typical hard disk may only support 150 IOPS.
  4. Disk utilization. Reported by `util` in `iostat`. Linux has multiple `queues` per device for servicing IO. Utilization indicates what percentage of these queues is busy at a given time. While this number can be confusing, it is a good indicator of overall disk health.

Testing Disk Performance

While cloud providers may provide an IOPS threshold for a given volume and disk, and disk manufacturers publish expected performance numbers, the actual results on your system may vary. If the observed disk performance is in question, performing an IO test can be very helpful.

We generally test with fio, the Flexible IO Tester. We performed tests on 10GB of data, the ioengine of psync, and with reads ranging between 4kb and 32kb. While the default fio settings are not representative of the WiredTiger workload, we have found this configuration to be a good approximation of WiredTiger disk utilization.

All tests were repeated under three disk scenarios:

Scenario 1

Default disk settings provided by a AWS c5 io1 100GB volume. 5000 IOPS

  • 1144 IOPS / 5025 physical reads per second / 99.85% util
Scenario 2

Limiting the disk to 600 IOPS and introducing 7ms of latency. This should mirror the performance of a typical RAID10 SAN with hard drives

  • 134 IOPS / 150 physical reads per second / 95.72% util
Scenario 3

Further limiting the disk to 150 IOPS with 7ms latency. This should model a commodity spinning hard drive.

  • 34 IOPS / 150 physical reads per second / 98.2% utilization

How is a query serviced from disk?

The WiredTiger Storage Engine performs its own caching. By default, the WiredTiger cache is sized at 50% of system memory minus 1GB to allow adequate space for both other system processes, the filesystem cache, and internal MongoDB operations that consume additional memory such as building indexes, performing in memory sorts, deduplicating results, text scoring, connection handling, and aggregations. To prevent performance degradation from a totally full cache, WiredTiger automatically begins evicting data from the cache when the utilization grows above 80%. For our tests, this means the effective cache size is (7634MB - 1024MB) * .5 * .8, or 2644MB.

All queries are serviced from the WiredTiger cache. This means a query will cause indexes and documents to be read from disk through the filesystem cache into the WiredTiger cache before returning results. If the requested data is already in the cache, this step is skipped.

WiredTiger stores documents with the snappy compression algorithm by default. Any data read from the file system cache is first decompressed before storing in the WiredTiger cache. Indexes utilize prefix compression by default and are compressed both on disk and inside the WiredTiger cache.

The filesystem cache is an Operating System construct to store frequently accessed files in memory to facilitate faster accesses. Linux is very aggressive in caching files and will attempt to consume all free memory with the filesystem cache. If additional memory is needed, the filesystem cache is evicted to allow more memory for applications.

Here is an animated graphic, showing the disk accesses for the YCSB collection resulting from 100 YCSB read operations. Each operation is an individual find for providing the _id for a single document.

The upper left hand corner represents the first byte in the WiredTiger collection file. Disk locations increment to the right hand side and wrap around. Each row represents a 3.5MB segment of the WiredTiger collection file. The accesses are ordered by time and represented by the frame of animation. Accesses are represented in red and green boxes to highlight the current disk access.

3.5 MB vs 4KB

Here we see the data file for our collection read into memory. Because the data is stored in B+ trees, we may need to find the disk location of our document (the smaller accesses) by visiting one or more locations on disk before our document is found and read (the wider accesses).

This demonstrates the typical access patterns of a MongoDB query – documents are unlikely to be close to each other on disk. This also shows it is highly unlikely for documents, even when inserted after each other, to be in consecutive disk locations.

The WiredTiger storage engine is designed to “read completely”: it will issue a read for all of the data it needs at once. This leads to our recommendation to limit the disk read ahead for WiredTiger deployments to zero, as subsequent accesses are unlikely to take advantage of the additional data retrieved through read ahead.

Working Set Fits in Cache

For our first set of tests, we set the record count to 2 million, resulting in a total size for both data and indexes of 2.43 GB or 92% of cache.

Here we see strong scenario 1 performance of 76,113 requests per second. Checking the filesystem cache statistics, we observe a WiredTiger cache hit rate of 100% with no accesses and zero bytes read into the filesystem cache, meaning no additional IO is required throughout this test.

Unsurprisingly, in scenarios 2 and 3, changing the disk performance (adding 7ms of latency and limiting iops to either 600 or 150) affected throughput minimally (69,579.5 and 70,252 Operations per second respectively).

Our 99% response latencies for all three tests are between 0.40 and 0.44 ms.

Working Set Larger than WiredTiger Cache, but Still Fits in Filesystem Cache

Modern operating systems cache frequently accessed files to improve read performance. Because the file is already in memory, accessing cached files does not result in physical reads. The cached statistics displayed by the free Linux command details the size of the filesystem cache.

When we increase our record count from 2 million to 3 million we increase our total size of data and indexes to 3.66GB, 38% greater than can be serviced solely from the WiredTiger cache.

The metrics are clear that we are reading an average of 548 mbps into the WiredTiger cache, but we can observe a 99.9% hit rate when checking the file system cache metrics.

For this test we begin to see a reduction in performance, performing only 66,720 operations per second compared to our baseline, representing an 8% reduction compared to our previous test serviced solely from the WiredTiger cache.

As expected, reduced disk performance for this case does not significantly affect our overall throughput (64,484 and 64,229 operations respectively). In cases where the documents are more compressible, or the CPU is a limiting factor, the penalty reading from the filesystem cache would be more pronounced.

We note a 54% increase in observed p99 latency to .53 - .55ms.

Working Set Slightly Larger Than WiredTiger and FileSystem Cache

We have established the WiredTiger and file system caches work together to provide data to service our queries. However, when we grow our record count from 3 million to 4 million, we can no longer solely utilize these caches to service queries. Our data size grows to 4.8GB or 82% larger then our WiredTiger cache.

Here, we read into the WiredTiger cache at a rate of 257.4 mbps. Our filesystem cache hit rate lowers to 93-96%, meaning 4-7% of our reads result in physical reads from disk.

Varying the available IOPS and disk latency has a huge impact on performance for this test.

The 99th percentile response latencies further increase. Scenario 1: 19ms, scenario 2: 171ms, and scenario 3: 770ms an increase of 43x, 389x, and 1751x from the in cache case.

We see 75% lower performance when MongoDB is provided the full 5000 iops compared to our earlier test, which fit fully in cache. Scenarios 2 and Scenario 3 achieved 5139.5 and 737.95 Operations per second respectively, further demonstrating the IO bottleneck.

Working Set Much Larger Than WiredTiger and FileSystem Cache

Moving up to 5 million records, we grow our data and index size to 6.09GB, larger than our combined WiredTiger and file system caches. We see our throughput dip below our IOPS. In this case we are still servicing 81% of of WiredTiger reads from the file system cache, but the reads overflowing from disk are saturating our IO. We see 71, 8.3, and 1.9 Mbps read into the filesystem cache for this test.

The 99th percentile response latencies further increase. Scenario 1: 22ms, Scenario 2: 199ms, and Senario 3: 810ms, an increase of 52x, 454x, and 1841x from the in cache response latencies. Here, changing the disk IOPS significantly affects our throughput.


Through this series of tests we demonstrate two major points.

  1. If the working set fits in cache, disk performance does not greatly affect application performance.
  2. When the working set exceeds available memory, disk performance quickly becomes the limiting factor for throughput.

Understanding how MongoDB utilizes both memory and disks is an important part for both sizing a deployment and understanding performance. The inner workings of the WiredTiger storage engine attempts to use hardware to the fullest extent, but memory and disk are two critical pieces of infrastructure contributing to the overall performance characteristics of your workload.

Introducing the Best Database for Modern Applications

The announcements we made today at MongoDB World 2018 represent a significant milestone in the evolution of MongoDB, making it the database of choice for all modern applications. Broadly speaking, there are three reasons for this:

  1. The document data model – presenting you the best way to work with data.
  2. It’s distributed by design – allowing you to intelligently put data where you want it.
  3. A unified experience that gives you the freedom to run anywhere – allowing you to future-proof your work and eliminate vendor lock-in.

There is a ton of new stuff, and so I wanted to give you a summary of what I covered during my keynote, with links to key resources so you can learn more.

Best Way to Work with Data

Today we released MongoDB Server 4.0 for General Availability. The highlight of the release is multi-document ACID transactions, which we previewed back in February with a beta program that attracted thousands of members of the community, putting transactions through their paces and providing invaluable feedback to the engineering team. We’ve implemented transactions so they feel just like the transactions you are familiar with from relational databases. They enforce snapshot isolation to provide a consistent view of data, and all-or-nothing execution to maintain data integrity. And while the document model means multi-document transactions aren’t necessary for most operations, with them it’s even easier for you to address a complete range of use cases with MongoDB.

It’s no secret how much I love MongoDB’s aggregation framework. Building queries stage-by-stage, checking your output as you go is by far a better way to write your most complicated queries than dealing with a monolithic snarl of SQL. To make that workflow even better, we’ve enhanced MongoDB Compass with the aggregation pipeline builder, which provides stage-by-stage, real-time feedback on the documents flowing through your pipelines. It’s easier than ever to deploy sophisticated processing pipelines that transform, aggregate, and analyze your data, all from the simple and intuitive MongoDB Compass GUI. You can then export the pipelines, and any other queries you create in Compass, to the native code of your preferred programming language. Server 4.0 also adds type conversions to the aggregation pipeline. With the new $convert operator you can transform mixed data types into standardized, cleansed formats natively within the database, preparing it for BI and machine learning, while eliminating costly, slow, and fragile ETL processes.

Extending the tools you can use to work with data managed by the server, we announced the public beta of MongoDB Charts, which provides the fastest and easiest way to get insights into your operational data, in real time. With Charts, you can create and share visualisations of your MongoDB data, using a document-native interface, without needing to move it into other systems or leverage third-party tools.

Documents and MongoDB’s query language are the best way to work with data, and to bring that power out of the datacenter and into the hands of app developers, MongoDB Stitch, which is GA as of today, provides two of its four services: QueryAnywhere and Functions. Using the authentication and declarative access control rules of Stitch QueryAnywhere, we can end the horrid practice of implementing shadow query languages in REST on top of application servers that just turn those REST calls into real query languages. With a native SDK, developers can make use of the full power of MongoDB from mobile and JavaScript applications, while Stitch makes sure that the right permissions are observed. Stitch Functions, JavaScript functions that execute with full access to application context, let developers compose their business logic with access to Atlas and calls to external services. With these two services, it’s easy to build complete applications without standing up a single application server.

Intelligently Put Data Where you Want It

As a distributed system, MongoDB enables you to spread data out across a cluster of nodes for resilience, scalability, and workload isolation. Unlike other distributed databases that randomly spray data around a cluster, MongoDB allows you to define controls that place data on specific nodes, for example in a specific region for low latency reads and writes, and for compliance with new privacy regulations.

The new Global Clusters introduced to MongoDB Atlas allow you to deploy a geographically distributed, fully managed database that provides low latency writes and reads to users anywhere, with data placement controls for regulatory compliance. We also announced Atlas Enterprise, offering new security controls including LDAP integration, the encrypted storage engine with bring-your-own key management, and database-level auditing. Organizations can now also use databases managed by MongoDB Atlas to build HIPAA-compliant applications under an executed Business Associate Agreement (BAA) with MongoDB, Inc. With these now announcements, MongoDB Atlas is the most secure cloud database service available anywhere.

Coding in a distributed world also means that the traditional means of responding to events in a database are no longer viable. So Stitch Triggers, also GA today, makes it possible by building on the Change Streams introduced in MongoDB 3.6. When you create a trigger, Stitch manages a change stream on your behalf, providing real-time notifications to Stitch Functions, which can react in all the ways functions can, from updating analytics rollup collections, to sending email or text messages, or kicking off other external services like Kafka or Kinesis.

Freedom to Run Anywhere

Whether you want to consume your database as a service in MongoDB Atlas, or manage it yourself on your own infrastructure, the announcements today make that even easier. We deliver a data platform that runs the same everywhere, that leverages the benefits of multi-cloud strategy with no lock-in, and is available in 50+ regions across the major cloud providers.

If you want to run MongoDB yourself, then we have released our new free MongoDB monitoring cloud service. The service is available to all MongoDB users, without needing to install an agent, navigate a paywall, or complete a registration form. You will be able to see the metrics and topology about your environment from the moment free monitoring is enabled. You can enable free monitoring easily using the MongoDB shell, MongoDB Compass, or by starting the mongod process with the new db.enableFreeMonitoring() command line option, and you can opt out at any time.

We’re seeing more DevOps teams leveraging the power of containerization and technologies like Kubernetes and Red Hat OpenShift to manage containerized clusters. Today we announced beta of the new MongoDB Enterprise Operator for Kubernetes, enabling you to deploy and manage MongoDB clusters from within the Kubernetes API, without having to connect separately to Ops Manager. You can learn more by reading our Red Hat OpenShift and MongoDB blog, and checking out the repository on GitHub.

Announced today, MongoDB Mobile takes MongoDB to a new frontier. Available in beta, MongoDB Mobile extends your ability to put data where you need it, all the way out to the edge of the network on IoT assets and iOS and Android mobile devices. MongoDB Mobile provides a single database, query language, and the intuitive Stitch SDK that runs consistently for data held on mobile clients, through to the backend server.

MongoDB Mobile provides the power and flexibility of MongoDB in a compact form that is power- and performance-aware with a low disk and memory footprint. It supports 64 bit iOS and Android operating systems and is easily embedded into mobile and IoT devices for fast and reliable local storage of JSON documents. With secondary indexing, access to the full MongoDB query language and aggregations, users can query data any way they want. With local reads and writes, MongoDB Mobile lets you build the fastest, most reactive apps. And Stitch Mobile Sync, which is in private beta now, will automatically synchronize data changes between data held locally and your backend database, helping resolve any conflicts – even after the device has been offline. The beta program is open now, and you can sign up for access on the MongoDB Mobile product page.

What’s Next?

So as you can see, that’s a ton of stuff. The announcements today represent our biggest set of releases yet, and we’re incredibly excited to get it into your hands and see what amazing things you do with them. Head over to our MongoDB World 2018 announcements page for more resources on each of these new products and services.

MongoDB Multi-Document ACID Transactions are GA

With the release of 4.0, you now have multi-document ACID transactions in MongoDB.

But how have you been able to be so productive with MongoDB up until now? The database wasn’t built overnight and many applications have been running mission critical use cases demanding the highest levels of data integrity without multi-document transactions for years. How was that possible?

Well, let’s think first about why many people believe they need multi-document transactions. The first principle of relational data modeling is normalizing your data across tables. This means that many common database operations, like account creation, require atomic updates across many rows and columns.

In MongoDB, the data model is fundamentally different. The document model encourages users to store related data together in a single document. MongoDB, has always supported ACID transactions in a single document and, when leveraging the document model appropriately, many applications don’t need ACID guarantees across multiple documents.

However, transactions are not just a check box. Transactions, like every MongoDB feature, aim to make developers lives easier. ACID guarantees across documents simplify application logic needed to satisfy complex applications.

The value of MongoDB’s transactions

While MongoDB’s existing atomic single-document operations already provide transaction semantics that satisfy the majority of applications, the addition of multi-document ACID transactions makes it easier than ever for developers to address the full spectrum of use cases with MongoDB. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. For developers with a history of transactions in relational databases, MongoDB’s multi-document transactions are very familiar, making it straightforward to add them to any application that requires them.

In MongoDB 4.0, transactions work across a replica set, and MongoDB 4.2 will extend support to transactions across a sharded deployment* (see the Safe Harbor statement at the end of this blog). Our path to transactions represents a multi-year engineering effort, beginning back in early 2015 with the groundwork laid in almost every part of the server and the drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

In this blog, we will explore why MongoDB had added multi-document ACID transactions, their design goals and implementation, characteristics of transactions and best practices for developers. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

Why Multi-Document ACID Transactions

Since its first release in 2009, MongoDB has continuously innovated around a new approach to database design, freeing developers from the constraints of legacy relational databases. A design founded on rich, natural, and flexible documents accessed by idiomatic programming language APIs, enabling developers to build apps 3-5x faster. And a distributed systems architecture to handle more data, place it where users need it, and maintain always-on availability. This approach has enabled developers to create powerful and sophisticated applications in all industries, across a tremendously wide range of use cases.

Figure 1: Organizations innovating with MongoDB

With subdocuments and arrays, documents allow related data to be modeled in a single, rich and natural data structure, rather than spread across separate, related tables composed of flat rows and columns. As a result, MongoDB’s existing single document atomicity guarantees can meet the data integrity needs of most applications. In fact, when leveraging the richness and power of the document model, we estimate 80%-90% of applications don’t need multi-document transactions at all.

However, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-document transactions are a requirement for any database, irrespective of the data model they are built upon. Some are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future. And for some workloads, support for ACID transactions across multiple documents is required.

As a result, the addition of multi-document transactions makes it easier than ever for developers to address a complete range of use cases on MongoDB. For some, simply knowing that they are available will assure them that they can evolve their application as needed, and the database will support them.

Data Models and Transactions

Before looking at multi-document transactions in MongoDB, we want to explore why the data model used by a database impacts the scope of a transaction.

Relational Data Model

Relational databases model an entity’s data across multiple records and parent-child tables, and so a transaction needs to be scoped to span those records and tables. The example in Figure 2 shows a contact in our customer database, modeled in a relational schema. Data is normalized across multiple tables: customer, address, city, country, phone number, topics and associated interests.

Figure 2: Customer data modeled across separate tables in a relational database

In the event of the customer data changing in any way, for example if our contact moves to a new job, then multiple tables will need to be updated in an “all-or-nothing” transaction that has to touch multiple tables, as illustrated in Figure 3.

Figure 3: Updating customer data in a relational database

Document Data Model

Document databases are different. Rather than break related data apart and spread it across multiple parent-child tables, documents can store related data together in a rich, typed, hierarchical structure, including subdocuments and arrays, as illustrated in Figure 4.

Figure 4: Customer data modeled in a single, rich document structure

MongoDB provides existing transactional properties scoped to the level of a document. As shown in Figure 5, one or more fields may be written in a single operation, with updates to multiple subdocuments and elements of any array, including nested arrays. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. With this design, application owners get the same data integrity guarantees as those provided by older relational databases.

Figure 5: Updating customer data in a document database

Where are Multi-Document Transactions Useful

There are use cases where transactional ACID guarantees need to be applied to a set of operations that span multiple documents. Back office “System of Record” or “Line of Business” (LoB) applications are the typical class of workload where multi-document transactions are useful. Examples include:

  • Processing application events when users perform important actions -- for instance when updating the status of an account, say to delinquent, across all those users’ documents.
  • Logging custom application actions -- say when a user transfers ownership of an entity, the write should not be successful if the logging isn’t.
  • Many to many relationships where the data naturally fits into defined objects -- for example positions, calculated by an aggregate of hundreds of thousands of trades, need to be updated every time trades are added or modified.

MongoDB already serves these use cases today, but the introduction of multi-document transactions makes it easier as the database automatically handles multi-document transactions for you. Before their availability, the developer would need to programmatically implement transaction controls in their application. To ensure data integrity, they would need to ensure that all stages of the operation succeed before committing updates to the database, and if not, roll back any changes. This adds complexity that slows down the rate of application development. MongoDB customers in the financial services industry have reported they were able to cut 1,000+ lines of code from their apps by using multi-document transactions.

In addition, implementing client side transactions can impose performance overhead on the application. For example, after moving from its existing client-side transactional logic to multi-document transactions, a global enterprise data management and integration ISV experienced improved MongoDB performance in its Master Data Management solution: throughput increased by 90%, and latency was reduced by over 60% for transactions that performed six updates across two collections.

Multi-Document ACID Transactions in MongoDB

Transactions in MongoDB feel just like transactions developers are used to in relational databases. They are multi-statement, with similar syntax, making them familiar to anyone with prior transaction experience.

The following Python code snippet shows a sample of the transactions API.

    with client.start_session() as s:
            collection_one.insert_one(doc_one, session=s)
            collection_two.insert_one(doc_two, session=s)

The following snippet shows the transactions API for Java.

    try (ClientSession clientSession = client.startSession()) {
                  collection.insertOne(clientSession, docOne);
                  collection.insertOne(clientSession, docTwo);

As these examples show, transactions use regular MongoDB query language syntax, and are implemented consistently whether the transaction is executed across documents and collections in a replica set, and with MongoDB 4.2, across a sharded cluster*.

We're excited to see MongoDB offer dedicated support for ACID transactions in their data platform and that our collaboration is manifest in the Lovelace release of Spring Data MongoDB. It ships with the well known Spring annotation-driven, synchronous transaction support using the MongoTransactionManager but also bits for reactive transactions built on top of MongoDB's ReactiveStreams driver and Project Reactor datatypes exposed via the ReactiveMongoTemplate.

Pieter Humphrey - Spring Product Marketing Lead, Pivotal

The transaction block code snippets below compare the MongoDB syntax with that used by MySQL. It shows how multi-document transactions feel familiar to anyone who has used traditional relational databases in the past.


        cursor.execute(orderInsert, orderData)
        cursor.execute(stockUpdate, stockData)


        orders.insert_one(order, session=s)
        stock.update_one(item, stockUpdate, session=s)

Through snapshot isolation, transactions provide a consistent view of data, and enforce all-or-nothing execution to maintain data integrity. Transactions can apply to operations against multiple documents contained in one, or in many, collections and databases. The changes to MongoDB that enable multi-document transactions do not impact performance for workloads that don't require them.

During its execution, a transaction is able to read its own uncommitted writes, but none of its uncommitted writes will be seen by other operations outside of the transaction. Uncommitted writes are not replicated to secondary nodes until the transaction is committed to the database. Once the transaction has been committed, it is replicated and applied atomically to all secondary replicas.

An application specifies write concern in the transaction options to state how many nodes should commit the changes before the server acknowledges the success to the client. All uncommitted writes live on the primary exclusively.

Taking advantage of the transactions infrastructure introduced in MongoDB 4.0, the new snapshot read concern ensures queries and aggregations executed within a read-only transaction will operate against a single, isolated snapshot on the primary replica. As a result, a consistent view of the data is returned to the client, irrespective of whether that data is being simultaneously modified by concurrent operations. Snapshot reads are especially useful for operations that return data in batches with the getMore command.

Even before MongoDB 4.0, typical MongoDB queries leveraged WiredTiger snapshots. The distinction between typical MongoDB queries and snapshot reads in transactions is that snapshot reads use the same snapshot throughout the duration of the query. Whereas typical MongoDB queries may switch to a more current snapshot during yield points.

Snapshot Isolation and Write Conflicts

When modifying a document, a transaction locks the document from additional changes until the transaction completes. If a transaction is unable to obtain a lock on a document it is attempting to modify, likely because another transaction is already holding the lock, it will immediately abort after 5ms with a write conflict.

However, if a typical MongoDB write outside of a multi-document transaction tries to modify a document that is currently being held by a multi-document transaction, that write will block behind the transaction completing. The typical MongoDB write will be infinitely retried with backoff logic until $maxTimeMS is reached. Even prior to 4.0, all wirtes were represented as WiredTiger transactions at the storage layer and it was possible to get write conflicts. This same logic was implemented to avoid users having to react to write conflicts client-side.

Reads do not require the same locks that document modifications do. Documents that have uncommitted writes by a transaction can still be read by other operations, and of course those operations will only see the committed values, and not the uncommitted state.

Only writes trigger write conflicts within MongoDB. Reads, in line with the expected behavior of snapshot isolation, do not prevent a document from being modified by other operations. Changes will not be surfaced as write conflicts unless a write is performed on the document. Additionally, no-op updates – like setting a field to the same value that it already had – can be optimized away before reaching the storage engine, thus not triggering a write conflict. To guarantee that a write conflict is detected, perform an operation like incrementing a counter.

Retrying Transactions

MongoDB 4.0 introduces the concept of error labels. The transient transaction error label notifies listening applications that the error surfaced, ranging from network errors to write conflicts, that the error may be temporary, and that the transaction is safe to retry from the beginning. Permanent errors, like parsing errors, will not have the transient transaction error label, as rerunning the transaction will never result in a successful commit.

One of the core values of MongoDB is its highly available architecture. These error labels make it easy for applications to be resilient in cases of network blips or node failures, enabling cross document transactional guarantees without sacrificing use cases that must be always-on.

Transactions Best Practices

As noted earlier, MongoDB’s single document atomicity guarantees will meet 80-90% of an application’s transactional needs. They remain the recommended way of enforcing your app’s data integrity requirements. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long running transactions, or attempting to perform an excessive number of operations in a single ACID transaction can result in high pressure on WiredTiger’s cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

  1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.
  2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches.
  3. In MongoDB 4.0, a transaction is represented in a single oplog entry, therefore must be within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.
  4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.
  5. DDL operations, like creating an index or dropping a database, block behind active running transactions on the namespace. All transactions that try to newly access the namespace while DDL operations are pending, will not be able to obtain locks, aborting the new transactions.
  6. You can review all best practices in the MongoDB documentation for multi-document transactions.

    Transactions and Their Impact to Data Modeling in MongoDB

    Adding transactions does not make MongoDB a relational database – many developers have already experienced that the document model is superior to the relational one today.

    All best practices relating to MongoDB data modeling continue to apply when using features such as multi-document transactions, or fully expressive JOINs (via the $lookup aggregation pipeline stage). Where practical, all data relating to an entity should be stored in a single, rich document structure. Just moving data structured for relational tables into MongoDB will not allow users to take advantage of MongoDB’s natural, fast, and flexible document model, or its distributed systems architecture.

    The RDBMS to MongoDB Migration Guidedescribes the best practices for moving an application from a relational database to MongoDB.

    The Path to Transactions

    Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

    Figure 6 presents a timeline of the key engineering projects that have enabled multi-document transactions in MongoDB, with status shown as of June 2018. The key design goal underlying all of these projects is that their implementation does not sacrifice the key benefits of MongoDB – the power of the document model and the advantages of distributed systems, while imposing no performance impact to workloads that don’t require multi-document transactions.

    Figure 6: The path to transactions – multi-year engineering investment, delivered across multiple releases


    MongoDB has already established itself as the leading database for modern applications. The document data model is rich, natural, and flexible, with documents accessed by idiomatic drivers, enabling developers to build apps 3-5x faster. Its distributed systems architecture enables you to handle more data, place it where users need it, and maintain always-on availability. MongoDB’s existing atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. The addition of multi-document ACID transactions in MongoDB 4.0 makes it easier than ever for developers to address a complete range of us cases, while for many, simply knowing that they are available will provide critical peace of mind.

    Take a look at our multi-document transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

    If you want to learn more about everything that’s new in MongoDB 4.0, download our Guide.

    Transactional guarantees have been a critical feature for relational databases for decades, but have typically been absent from non-relational alternatives, which has meant that users have been forced to choose between transactions and the flexibility and versatility that non-relational databases offer. With its support for multi-document ACID transactions, MongoDB is built for customers that want to have their cake and eat it too.

    Stephen O’Grady, Principal Analyst, Redmonk

    *Safe Harbour Statement

    This blog contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

    In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.