GIANT Stories at MongoDB

MongoDB 4.0: Non-Blocking Secondary Reads

Mat Keep

Technical, MongoDB 4.0

Many MongoDB users scale read performance by distributing their query load across secondary replicas. With the MongoDB 4.0 release, reads are no longer blocked when oplog entries are applied. Here's how.

MongoDB Multi-Document ACID Transactions are GA

With the release of 4.0, you now have multi-document ACID transactions in MongoDB.

But how have you been able to be so productive with MongoDB up until now? The database wasn’t built overnight and many applications have been running mission critical use cases demanding the highest levels of data integrity without multi-document transactions for years. How was that possible?

Well, let’s think first about why many people believe they need multi-document transactions. The first principle of relational data modeling is normalizing your data across tables. This means that many common database operations, like account creation, require atomic updates across many rows and columns.

In MongoDB, the data model is fundamentally different. The document model encourages users to store related data together in a single document. MongoDB has always supported ACID transactions in a single document and, when leveraging the document model appropriately, many applications don’t need ACID guarantees across multiple documents.

However, transactions are not just a check box. Transactions, like every MongoDB feature, aim to make developers’ lives easier. ACID guarantees across documents simplify the application logic needed by complex applications.

The value of MongoDB’s transactions

While MongoDB’s existing atomic single-document operations already provide transaction semantics that satisfy the majority of applications, the addition of multi-document ACID transactions makes it easier than ever for developers to address the full spectrum of use cases with MongoDB. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. For developers with a history of transactions in relational databases, MongoDB’s multi-document transactions are very familiar, making it straightforward to add them to any application that requires them.

In MongoDB 4.0, transactions work across a replica set, and MongoDB 4.2 will extend support to transactions across a sharded deployment* (see the Safe Harbor statement at the end of this blog). Our path to transactions represents a multi-year engineering effort, beginning back in early 2015 with the groundwork laid in almost every part of the server and the drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

In this blog, we will explore why MongoDB has added multi-document ACID transactions, their design goals and implementation, the characteristics of transactions, and best practices for developers. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

Why Multi-Document ACID Transactions

Since its first release in 2009, MongoDB has continuously innovated around a new approach to database design, freeing developers from the constraints of legacy relational databases. A design founded on rich, natural, and flexible documents accessed by idiomatic programming language APIs, enabling developers to build apps 3-5x faster. And a distributed systems architecture to handle more data, place it where users need it, and maintain always-on availability. This approach has enabled developers to create powerful and sophisticated applications in all industries, across a tremendously wide range of use cases.

Figure 1: Organizations innovating with MongoDB

With subdocuments and arrays, documents allow related data to be modeled in a single, rich and natural data structure, rather than spread across separate, related tables composed of flat rows and columns. As a result, MongoDB’s existing single document atomicity guarantees can meet the data integrity needs of most applications. In fact, when leveraging the richness and power of the document model, we estimate 80%-90% of applications don’t need multi-document transactions at all.

However, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-document transactions are a requirement for any database, irrespective of the data model they are built upon. Some are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future. And for some workloads, support for ACID transactions across multiple documents is required.

As a result, the addition of multi-document transactions makes it easier than ever for developers to address a complete range of use cases on MongoDB. For some, simply knowing that they are available will assure them that they can evolve their application as needed, and the database will support them.

Data Models and Transactions

Before looking at multi-document transactions in MongoDB, we want to explore why the data model used by a database impacts the scope of a transaction.

Relational Data Model

Relational databases model an entity’s data across multiple records and parent-child tables, and so a transaction needs to be scoped to span those records and tables. The example in Figure 2 shows a contact in our customer database, modeled in a relational schema. Data is normalized across multiple tables: customer, address, city, country, phone number, topics and associated interests.

Figure 2: Customer data modeled across separate tables in a relational database

If the customer data changes in any way, for example if our contact moves to a new job, then multiple tables need to be updated in a single “all-or-nothing” transaction, as illustrated in Figure 3.

Figure 3: Updating customer data in a relational database

Document Data Model

Document databases are different. Rather than break related data apart and spread it across multiple parent-child tables, documents can store related data together in a rich, typed, hierarchical structure, including subdocuments and arrays, as illustrated in Figure 4.

Figure 4: Customer data modeled in a single, rich document structure

MongoDB provides transactional properties scoped to the level of a document. As shown in Figure 5, one or more fields may be written in a single operation, with updates to multiple subdocuments and elements of any array, including nested arrays. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. With this design, application owners get the same data integrity guarantees as those provided by older relational databases.

Figure 5: Updating customer data in a document database
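To make this concrete, here is a minimal sketch of a single-document atomic update. The collection, document id, and field names are assumptions for illustration; everything in the one update_one call succeeds or fails together.

    from pymongo import MongoClient

    customers = MongoClient().crm.customers  # hypothetical collection
    customer_id = 123                        # hypothetical document id

    # One operation atomically updates top-level fields, a subdocument,
    # and an array, with full isolation for concurrent readers.
    customers.update_one(
        {"_id": customer_id},
        {
            "$set": {"company": "NewCo", "address.city": "Palo Alto"},
            "$push": {"topics": {"name": "ACID", "interest": "high"}},
        },
    )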

Where are Multi-Document Transactions Useful

There are use cases where transactional ACID guarantees need to be applied to a set of operations that span multiple documents. Back office “System of Record” or “Line of Business” (LoB) applications are the typical class of workload where multi-document transactions are useful. Examples include:

  • Processing application events when users perform important actions -- for instance when updating the status of an account, say to delinquent, across all those users’ documents.
  • Logging custom application actions -- say when a user transfers ownership of an entity, the write should not be successful if the logging isn’t.
  • Many to many relationships where the data naturally fits into defined objects -- for example positions, calculated by an aggregate of hundreds of thousands of trades, need to be updated every time trades are added or modified.

MongoDB already serves these use cases today, but the introduction of multi-document transactions makes it easier, as the database now handles the transactional logic for you. Before their availability, the developer would need to programmatically implement transaction controls in their application. To ensure data integrity, they would need to ensure that all stages of the operation succeed before committing updates to the database, and if not, roll back any changes. This adds complexity that slows down the rate of application development. MongoDB customers in the financial services industry have reported they were able to cut 1,000+ lines of code from their apps by using multi-document transactions.

In addition, implementing client side transactions can impose performance overhead on the application. For example, after moving from its existing client-side transactional logic to multi-document transactions, a global enterprise data management and integration ISV experienced improved MongoDB performance in its Master Data Management solution: throughput increased by 90%, and latency was reduced by over 60% for transactions that performed six updates across two collections.

Multi-Document ACID Transactions in MongoDB

Transactions in MongoDB feel just like transactions developers are used to in relational databases. They are multi-statement, with similar syntax, making them familiar to anyone with prior transaction experience.

The following Python code snippet shows a sample of the transactions API.

    with client.start_session() as s:
        s.start_transaction()
        collection_one.insert_one(doc_one, session=s)
        collection_two.insert_one(doc_two, session=s)
        s.commit_transaction()

The following snippet shows the transactions API for Java.

    try (ClientSession clientSession = client.startSession()) {
        clientSession.startTransaction();
        collection.insertOne(clientSession, docOne);
        collection.insertOne(clientSession, docTwo);
        clientSession.commitTransaction();
    }

As these examples show, transactions use regular MongoDB query language syntax, and are implemented consistently whether the transaction is executed across documents and collections in a replica set or, with MongoDB 4.2, across a sharded cluster*.

We're excited to see MongoDB offer dedicated support for ACID transactions in their data platform and that our collaboration is manifest in the Lovelace release of Spring Data MongoDB. It ships with the well known Spring annotation-driven, synchronous transaction support using the MongoTransactionManager but also bits for reactive transactions built on top of MongoDB's ReactiveStreams driver and Project Reactor datatypes exposed via the ReactiveMongoTemplate.

Pieter Humphrey - Spring Product Marketing Lead, Pivotal

The transaction block code snippets below compare the MongoDB syntax with that used by MySQL. They show how multi-document transactions feel familiar to anyone who has used traditional relational databases in the past.

MySQL

    db.start_transaction()
        cursor.execute(orderInsert, orderData)
        cursor.execute(stockUpdate, stockData)
    db.commit()

MongoDB

    s.start_transaction()
        orders.insert_one(order, session=s)
        stock.update_one(item, stockUpdate, session=s)
    s.commit_transaction()

Through snapshot isolation, transactions provide a consistent view of data, and enforce all-or-nothing execution to maintain data integrity. Transactions can apply to operations against multiple documents contained in one, or in many, collections and databases. The changes to MongoDB that enable multi-document transactions do not impact performance for workloads that don't require them.

During its execution, a transaction is able to read its own uncommitted writes, but none of its uncommitted writes will be seen by other operations outside of the transaction. Uncommitted writes are not replicated to secondary nodes until the transaction is committed to the database. Once the transaction has been committed, it is replicated and applied atomically to all secondary replicas.

An application specifies write concern in the transaction options to state how many nodes should commit the changes before the server acknowledges the success to the client. All uncommitted writes live on the primary exclusively.
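As a sketch of how these options look in PyMongo (database, collection, and document are assumptions for illustration), the write concern is passed when the transaction starts:

    from pymongo import MongoClient, WriteConcern

    client = MongoClient()
    orders = client.shop.orders  # hypothetical database and collection

    with client.start_session() as s:
        # Acknowledge success only once a majority of nodes have
        # committed the transaction's writes.
        s.start_transaction(write_concern=WriteConcern(w="majority"))
        orders.insert_one({"item": "abc", "qty": 1}, session=s)
        s.commit_transaction()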

Taking advantage of the transactions infrastructure introduced in MongoDB 4.0, the new snapshot read concern ensures queries and aggregations executed within a read-only transaction will operate against a single, isolated snapshot on the primary replica. As a result, a consistent view of the data is returned to the client, irrespective of whether that data is being simultaneously modified by concurrent operations. Snapshot reads are especially useful for operations that return data in batches with the getMore command.

Even before MongoDB 4.0, typical MongoDB queries leveraged WiredTiger snapshots. The distinction is that snapshot reads in transactions use the same snapshot throughout the duration of the query, whereas typical MongoDB queries may switch to a more current snapshot at yield points.

Snapshot Isolation and Write Conflicts

When modifying a document, a transaction locks the document from additional changes until the transaction completes. If a transaction is unable to obtain a lock on a document it is attempting to modify, likely because another transaction is already holding the lock, it will abort with a write conflict after waiting up to 5ms for the lock.

However, if a typical MongoDB write outside of a multi-document transaction tries to modify a document that is currently locked by a multi-document transaction, that write will block until the transaction completes. The typical MongoDB write will be retried indefinitely with backoff logic until $maxTimeMS is reached. Even prior to 4.0, all writes were represented as WiredTiger transactions at the storage layer, so write conflicts were already possible; this same retry logic was implemented to avoid users having to react to write conflicts client-side.

Reads do not require the same locks that document modifications do. Documents that have uncommitted writes by a transaction can still be read by other operations, and of course those operations will only see the committed values, and not the uncommitted state.

Only writes trigger write conflicts within MongoDB. Reads, in line with the expected behavior of snapshot isolation, do not prevent a document from being modified by other operations. Changes will not be surfaced as write conflicts unless a write is performed on the document. Additionally, no-op updates – like setting a field to the same value that it already had – can be optimized away before reaching the storage engine, thus not triggering a write conflict. To guarantee that a write conflict is detected, perform an operation like incrementing a counter.
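A minimal sketch of that advice; the collection, database, and document id are hypothetical:

    from pymongo import MongoClient

    client = MongoClient()
    accounts = client.app.accounts  # hypothetical collection
    doc_id = 1                      # hypothetical document id

    with client.start_session() as s:
        s.start_transaction()
        # A no-op $set (writing a value a field already holds) can be
        # optimized away before reaching the storage engine, so it may not
        # surface a conflict. Incrementing a counter is a real modification
        # and is guaranteed to be conflict-checked.
        accounts.update_one(
            {"_id": doc_id},
            {"$inc": {"version": 1}},
            session=s,
        )
        s.commit_transaction()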

Retrying Transactions

MongoDB 4.0 introduces the concept of error labels. The transient transaction error label notifies listening applications that an error – anything from a network error to a write conflict – may be temporary, and that the transaction is safe to retry from the beginning. Permanent errors, like parsing errors, will not carry the transient transaction error label, as rerunning the transaction will never result in a successful commit.

One of the core values of MongoDB is its highly available architecture. These error labels make it easy for applications to be resilient in cases of network blips or node failures, enabling cross-document transactional guarantees without sacrificing use cases that must be always-on.
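Here is a minimal retry-loop sketch against that label using PyMongo's exception API; the txn_fn callback is an assumption for illustration:

    from pymongo.errors import PyMongoError

    def run_with_retry(txn_fn, session):
        # Retry the whole transaction from the beginning only when the
        # server attaches the TransientTransactionError label; permanent
        # errors (e.g. parse errors) are re-raised to the caller.
        while True:
            try:
                txn_fn(session)
                return
            except PyMongoError as exc:
                if exc.has_error_label("TransientTransactionError"):
                    continue
                raise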

Transactions Best Practices

As noted earlier, MongoDB’s single document atomicity guarantees will meet 80-90% of an application’s transactional needs. They remain the recommended way of enforcing your app’s data integrity requirements. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long running transactions, or attempting to perform an excessive number of operations in a single ACID transaction can result in high pressure on WiredTiger’s cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

  1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.
  2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches (see the sketch after this list).
  3. In MongoDB 4.0, a transaction is represented in a single oplog entry, and therefore must fit within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.
  4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.
  5. DDL operations, like creating an index or dropping a database, block behind transactions actively running on the affected namespace. Any transaction that attempts to access the namespace while a DDL operation is pending will be unable to obtain locks, and the new transaction will abort.
  6. You can review all best practices in the MongoDB documentation for multi-document transactions.
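As referenced in practice 2 above, here is a hypothetical batching sketch (the collection, field names, and selection criteria are all invented) that keeps each transaction under the 1,000-document guideline:

    from pymongo import MongoClient

    client = MongoClient()
    coll = client.app.tasks  # hypothetical collection

    # Process a large modification as a series of small transactions,
    # each touching at most 1,000 documents.
    BATCH = 1000
    ids = [d["_id"] for d in coll.find({"status": "pending"}, {"_id": 1})]
    for i in range(0, len(ids), BATCH):
        with client.start_session() as s:
            s.start_transaction()
            coll.update_many(
                {"_id": {"$in": ids[i:i + BATCH]}},
                {"$set": {"status": "processed"}},
                session=s,
            )
            s.commit_transaction()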

Transactions and Their Impact on Data Modeling in MongoDB

Adding transactions does not make MongoDB a relational database – many developers have already found that the document model is superior to the relational one today.

All best practices relating to MongoDB data modeling continue to apply when using features such as multi-document transactions, or fully expressive JOINs (via the $lookup aggregation pipeline stage). Where practical, all data relating to an entity should be stored in a single, rich document structure. Just moving data structured for relational tables into MongoDB will not allow users to take advantage of MongoDB’s natural, fast, and flexible document model, or its distributed systems architecture.

The RDBMS to MongoDB Migration Guide describes the best practices for moving an application from a relational database to MongoDB.

The Path to Transactions

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Figure 6 presents a timeline of the key engineering projects that have enabled multi-document transactions in MongoDB, with status shown as of June 2018. The key design goal underlying all of these projects is that their implementation does not sacrifice the key benefits of MongoDB – the power of the document model and the advantages of distributed systems – while imposing no performance impact to workloads that don’t require multi-document transactions.

Figure 6: The path to transactions – multi-year engineering investment, delivered across multiple releases

Conclusion

MongoDB has already established itself as the leading database for modern applications. The document data model is rich, natural, and flexible, with documents accessed by idiomatic drivers, enabling developers to build apps 3-5x faster. Its distributed systems architecture enables you to handle more data, place it where users need it, and maintain always-on availability. MongoDB’s existing atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. The addition of multi-document ACID transactions in MongoDB 4.0 makes it easier than ever for developers to address a complete range of use cases, while for many, simply knowing that they are available will provide critical peace of mind.

Take a look at our multi-document transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

If you want to learn more about everything that’s new in MongoDB 4.0, download our Guide.

Transactional guarantees have been a critical feature for relational databases for decades, but have typically been absent from non-relational alternatives, which has meant that users have been forced to choose between transactions and the flexibility and versatility that non-relational databases offer. With its support for multi-document ACID transactions, MongoDB is built for customers that want to have their cake and eat it too.

Stephen O’Grady, Principal Analyst, Redmonk

*Safe Harbor Statement

This blog contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

MongoDB 4.0 Release Candidate 0 Has Landed

MongoDB enables you to meet the demands of modern apps with a technology foundation that delivers:

  1. The document data model – presenting the best way to work with data.
  2. A distributed systems design – allowing you to intelligently put data where you want it.
  3. A unified experience that gives you the freedom to run anywhere – future-proofing your work and eliminating vendor lock-in.

Building on the foundations above, MongoDB 4.0 is a significant milestone in the evolution of MongoDB, and we’ve just shipped the first Release Candidate (RC), ready for you to test.

Why is it so significant? Let’s take a quick tour of the key new features. And remember, you can learn about all of this and much more at MongoDB World'18 (June 26-27).

Multi-Document ACID Transactions

Previewed back in February, multi-document ACID transactions are part of the 4.0 RC. With snapshot isolation and all-or-nothing execution, transactions extend MongoDB’s ACID data integrity guarantees to multiple statements and multiple documents across one or many collections. They feel just like the transactions you are familiar with from relational databases, are easy to add to any application that needs them, and don't change the way non-transactional operations are performed. With multi-document transactions it’s easier than ever for all developers to address a complete range of use cases with MongoDB, while for many of them, simply knowing that they are available will provide critical peace of mind that they can meet any requirement in the future. In MongoDB 4.0 transactions work within a replica set, and MongoDB 4.2 will support transactions across a sharded cluster*.

To give you a flavor of what multi-document transactions look like, here is a Python code snippet of the transactions API.

    with client.start_session() as s:
        s.start_transaction()
        try:
            collection.insert_one(doc1, session=s)
            collection.insert_one(doc2, session=s)
        except:
            s.abort_transaction()
            raise
        s.commit_transaction()

And now, the transactions API for Java.

    try (ClientSession clientSession = client.startSession()) {
        clientSession.startTransaction();
        try {
            collection.insertOne(clientSession, docOne);
            collection.insertOne(clientSession, docTwo);
            clientSession.commitTransaction();
        } catch (Exception e) {
            clientSession.abortTransaction();
        }
    }

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to replica sets, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Take a look at our multi-document ACID transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started.

Aggregation Pipeline Type Conversions

One of the major advantages of MongoDB over rigid tabular databases is its flexible data model. Data can be written to the database without first having to predefine its structure. This helps you to build apps faster and respond easily to rapidly evolving application changes. It is also essential in supporting initiatives such as single customer view or operational data lakes to support real-time analytics where data is ingested from multiple sources. Of course, with MongoDB’s schema validation, this flexibility is fully tunable, enabling you to enforce strict controls on data structure, type, and content when you need more control.
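As a brief, hypothetical illustration of that tunable control (the database, collection, and field names are invented), schema validation is attached to a collection with a $jsonSchema validator:

    from pymongo import MongoClient

    db = MongoClient().app  # hypothetical database

    # Documents inserted into "customers" must carry a string email;
    # everything else remains flexible.
    db.create_collection("customers", validator={
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["email"],
            "properties": {
                "email": {"bsonType": "string"},
            },
        }
    })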

So while MongoDB makes it easy to ingest data without complex cleansing of individual fields, it means working with this data can be more difficult when a consuming application expects uniform data types for specific fields across all documents. Handling different data types pushes more complexity to the application, and available ETL tools have provided only limited support for transformations. With MongoDB 4.0, you can maintain all of the advantages of a flexible data model, while prepping data within the database itself for downstream processes.

The new $convert operator enables the aggregation pipeline to transform mixed data types into standardized formats natively within the database. Ingested data can be cast into a standardized, cleansed format and exposed to multiple consuming applications – such as the MongoDB BI and Spark connectors for high-performance visualizations, advanced analytics and machine learning algorithms, or directly to a UI. Casting data into cleansed types makes it easier for your apps to process, sort, and compare data. For example, financial data inserted as a long can be converted into a decimal, enabling lossless and high precision processing. Similarly, dates inserted as strings can be transformed into the native date type.

When $convert is combined with over 100 different operators available as part of the MongoDB aggregation pipeline, you can reshape, transform, and cleanse your documents without having to incur the complexity, fragility, and latency of running data through external ETL processes.
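A short sketch of what this might look like; the database, collection, field, and output collection names are assumptions:

    from pymongo import MongoClient

    db = MongoClient().shop  # hypothetical database

    # Cast mixed string/long prices to Decimal128; documents whose price
    # cannot be converted get null rather than aborting the pipeline.
    pipeline = [
        {"$addFields": {
            "price": {"$convert": {"input": "$price", "to": "decimal",
                                   "onError": None, "onNull": None}}
        }},
        {"$out": "orders_cleansed"},
    ]
    db.orders.aggregate(pipeline)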

Non-Blocking Secondary Reads

To ensure that reads can never return data that is not in the same causal order as the primary replica, MongoDB blocks readers while oplog entries are applied in batches to the secondary. This can cause secondary reads to have variable latency, which becomes more pronounced when the cluster is serving write-intensive workloads. Why does MongoDB need to block secondary reads? When a sequence of writes is applied to a document, MongoDB is designed so that every node shows those writes in the same causal order. So if you change field "A" in a document and then change field "B", it is not possible to see that document with changed field "B" and not changed field "A". Eventually consistent systems can exhibit this anomaly, but MongoDB does not, and never has.

By taking advantage of storage engine timestamps and snapshots implemented for multi-document ACID transactions, secondary reads in MongoDB 4.0 become non-blocking. With non-blocking secondary reads, you now get predictable, low read latencies and increased throughput from the replica set, while maintaining a consistent view of data. Workloads that see the greatest benefits are those where data is batch loaded to the database, and those where distributed clients are accessing low latency local replicas that are geographically remote from the primary replica.
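A sketch of how an application might opt into those local secondary reads; the hostnames, database, and collection names are placeholders:

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
    # Route queries to a nearby secondary; under 4.0 these reads are no
    # longer blocked while replication batches are applied.
    events = client.get_database(
        "app", read_preference=ReadPreference.SECONDARY_PREFERRED
    ).events
    recent = events.find_one({"type": "login"})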

40% Faster Data Migrations

Very few of today’s workloads are static. For example, the launch of a new product or game, or seasonal reporting cycles can drive sudden spikes in load that can bring a database to its knees unless additional capacity can be quickly provisioned. If and when demand subsides, you should be able to scale your cluster back in, rightsizing for capacity and cost.

To respond to these fluctuations in demand, MongoDB enables you to elastically add and remove nodes from a sharded cluster in real time, automatically rebalancing the data across nodes in response. The sharded cluster balancer, responsible for evenly distributing data across the cluster, has been significantly improved in MongoDB 4.0. By concurrently fetching and applying documents, shards can complete chunk migrations up to 40% faster, allowing you to more quickly bring new nodes into service at just the moment they are needed, and scale back down when load returns to normal levels.

Extensions to Change Streams

Change streams, released with MongoDB 3.6, enable developers to build reactive, real-time, web, mobile, and IoT apps that can view, filter, and act on data changes as they occur in the database. Change streams enable seamless data movement across distributed database and application estates, making it simple to stream data changes and trigger actions wherever they are needed, using a fully reactive programming style.

With MongoDB 4.0, Change Streams can now be configured to track changes across an entire database or whole cluster. Additionally, change streams will now return a cluster time associated with an event, which can be used by the application to provide an associated wall clock time for the event.
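As a minimal sketch of the new scope (the database name is an assumption), a whole database can now be watched rather than a single collection:

    from pymongo import MongoClient

    client = MongoClient()
    db = client.app  # hypothetical database

    # Watch every collection in the database; each event carries a
    # clusterTime the application can map to a wall clock time.
    with db.watch() as stream:
        for change in stream:
            print(change["operationType"], change["clusterTime"])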

Getting Started with MongoDB 4.0

Hopefully this gives you a taste of what’s coming in 4.0. There’s a stack of other stuff we haven’t covered today, but you can learn about it all in the resources below.

To get started with the RC now:

  1. Head over to the MongoDB download center to pick up the latest development build.
  2. Review the 4.0 release notes.
  3. Sign up for the forthcoming MongoDB University training on 4.0.

And you can meet our engineering team and other MongoDB users at MongoDB World'18 (June 26-27).

---

* Safe Harbor Statement

This blog post contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

BookMyShow Continues to Lead Online Entertainment Ticketing in India and Scales to 25 Million Users with MongoDB

India's twin passions for cinema and tech make it a natural fit for automated ticketing. But if ever a market needs scalable solutions, this 1.4 billion-strong nation is it.

That’s a lesson Viraj Patel, VP Technology for BigTree Entertainment, learned the hard way. "We started out in ticketing distribution in 1999 using telephones," he says, "before mobile platforms and internet access were on the scene. It just didn't work. The investors pulled the plug in 2002.”

Undeterred, the company successfully pivoted to selling software to cinema chains. By 2006, Viraj and team were ready to aim for the big prize again. They just needed the right tools. With the internet and mobile data fitting into place, a trial project in online ticket aggregation looked promising enough for investors to fund the launch of BookMyShow in 2007.

“We launched with a 100 percent Microsoft stack,” says Viraj, “but soon realized that scaling with Microsoft was not an easy job.” It wasn’t the Windows platform or the developer tools that were the problem, he recalls: “It was the SQL Server database. That was the first bottleneck as we got more and more traffic, and it soaked up more and more resources and money. It wasn’t the right solution. It couldn’t scale with us.”

Spoiler: By 2018, BookMyShow sells more than 10 million tickets a month for all manner of movies and events, and serves three billion pages a month across the web and its 50 million plus installed apps. Scaling happened.

The plot changed for the better in 2010 with the discovery of MongoDB. “We were looking around for alternatives, and it was the new kid on the block.” (In fact, MongoDB 1.0 had launched just the year before, and MongoDB India was yet to come.) “We tested it internally as a straight distributed-database swap for our monolithic SQL database. Every web and mobile application we built needed a database that had performance and scalability, and MongoDB blew us away on both.”

MongoDB really won its spurs when the company added Facebook Connect to its registration process. “The registration database was the first thing we built, and it was running on SQL Server. Which was OK, until Facebook Connect came along and we added that as a registration option. Then the database really struggled. We switched to MongoDB and it was night and day. Tremendous gains. Not only did we get the ability to represent customers directly as JSON documents in the database, which made our data model much simpler, but we got all our performance back.

“We want the flexibility of upgrading the schema for future use cases, and that’s so much easier in MongoDB. The data structures we create are clear and easy to read, and it’s so much simpler to understand and extend,” Viraj adds, about their discovery of the advantages of document-model storage.

MongoDB’s second big job was also thoroughly web scale, as it took on the task of giving each of those millions of users their own bespoke, personalized view of the service. This time, the engineering team knew where to start. “About five years ago, we built our personalization engine on MongoDB,” says Viraj, “and it continues to scale with us. It stores a lot of customer information and when a customer visits, it pulls it out, personalizes it in real time and delivers it. That really improves the customer experience. We see an 18 percent increase in conversion, personalized versus non-personalized.”

Today, MongoDB is the default database for developing ideas and services in BigTree, and Viraj cheerfully admits he has long ago stopped counting how many nodes are in use. “Last time I looked, it was between 100-160,” he says.

Future plans include containerization of the databases to smooth out upgrades and ease of deployment with BigTree’s agile DevOps production pipeline and, when the time comes, sharding the customer database. That’s planned for, but not currently necessary. He explains: “We just haven’t reached the point where writes to MongoDB are the limiting factor anywhere in the service. We get a long way with MongoDB replica sets, and are safe in the knowledge that there are no limitations to scaling further when we need to.”

Viraj cares deeply about latency – “We’re a performance-sensitive company” – and much of the service is instrumented by monitoring and management platforms such as New Relic. While initial performance gains were superlative, he says, things have only continued to improve as new features and technologies have been added. “We had been using SQL tabular databases for customer booking history,” says Viraj. “We moved this to MongoDB and have seen a superb performance boost. What used to take up to 5000 ms on traditional SQL databases went down to 10-20 ms on MongoDB using the MMAP storage engine. When we moved to MongoDB’s default WiredTiger storage engine, it improved five to ten times further, to 2ms. We’re still getting this performance, even though the database now has close to 200 million documents.”

There have been other benefits from following MongoDB’s roadmap. “WiredTiger has made things much more cost-effective,” he says. “Security is better as we now encrypt data instead of storing it in plain JSON. Our customer database is five times more compact and our personalization database uses nearly eight times less storage.”

In the future, he says, they expect aggregation queries and query caching mechanisms will improve performance still more. As for reliability, “MongoDB auto-heals so well in the event of any failures in our platform we don’t even need to worry about it. That’s highly appreciated, and much better than any of the other databases we have used.”

There can be few better stories of early adoption and innovation with MongoDB than the success BigTree Entertainment has enjoyed with BookMyShow. Viraj and his engineers insist on picking the right tools for each part of the job of running India’s favourite online ticketing service, and their long experience of casting this particular actor in so many roles makes MongoDB a performer they’ve come to rely on.


Read more about what others are building with MongoDB.

What’s New in MongoDB 3.6. Part 4 – Avoid Lock-In, Run Anywhere

Mat Keep

MongoDB 3.6

Welcome to part 4 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we dived into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • In part 3 we covered what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including Cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Run Anywhere

Many organizations are turning to the cloud to accelerate the speed of application development, deployment, and data discovery. Replatforming to the cloud gives them the ability to enable self-service IT, to elastically scale resources on demand, and to align costs to actual consumption. But they are also concerned about exposing the business to deeper levels of lock-in – this time from the APIs and services of the cloud providers themselves.

Increasingly, users are demanding the freedom to run anywhere: private clouds in their own data center, in the public cloud, or in a hybrid model that combines the two. This flexibility is not available when they build on a cloud-proprietary database from a single vendor. By contrast, the platform independence provided by MongoDB gives them the ability to respond to business or regulatory changes without incurring the complexity, risk, and time that comes from expensive database migrations whenever they need or want to transition to a new platform.

MongoDB Atlas

As a fully managed database service, MongoDB Atlas is the best way to run MongoDB in the public cloud. 2017 has already seen major evolutions in the Atlas service, with key highlights including:

  • Expansion beyond Amazon Web Services (AWS) to offer Atlas on Google Cloud Platform (GCP) and Microsoft Azure.
  • Achieving SOC2 Type 1 compliance.
  • The launch of managed database clusters on a shared architecture, including the free M0 instances, and the M2s and M5s, which allow customers to jumpstart their projects for a low and predictable price.
  • A live migration facility to move data from an existing MongoDB replica set into an Atlas cluster with minimal application impact.
  • The addition of the Data Explorer and Real Time Performance Panel, now coming to Ops Manager, as discussed above.

MongoDB 3.6 is available as a fully managed service on Atlas, along with important new features to support global applications, and with automated scalability and performance optimizations.

Turnkey Global Distribution of Clusters with Cross-Region Replication

MongoDB Atlas clusters can now span multiple regions offered by a cloud provider. This enables developers to build apps that maintain continuous availability in the event of geographic outages, and improve customer experience by locating data closer to users.

When creating a cluster or modifying its configuration, two options are now available:

  • Teams can now deploy a single MongoDB database across multiple regions supported by a cloud provider for improved availability guarantees. Reads and writes will default to a “preferred region” assuming that there are no active failure or failover conditions. The nearest read preference, discussed below, can be used to route queries to local replicas in a globally distributed cluster. Replica set members in additional regions will participate in the automated election and failover process if the primary member is affected by a local outage, and can become a primary in the unlikely event that the preferred region is offline.

  • Read-only replica set members can be deployed in multiple regions, allowing teams to optimize their deployments to achieve reduced read latency for a global audience. Read preference – providing a mechanism to control how MongoDB routes read operations across members of a replica set – can be configured using the drivers. For example, the nearest read preference routes queries to replicas with the lowest network latency from the client, thus providing session locality by minimizing the effects of geographic latency. As the name suggests, read-only replica set members will not participate in the automated election and failover process, and can never become a primary.

Teams can activate both of the options outlined above in a single database to provide continuous availability and an optimal experience for their users.

Figure 1: Globally distributed MongoDB Atlas cluster, providing resilience to regional outages and lower latency experiences for global apps
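As a sketch of the client side of this (the connection string is a placeholder), the nearest read preference is set when constructing the driver client:

    from pymongo import MongoClient

    # Queries are routed to the replica with the lowest network latency
    # from this client, wherever in the world it runs.
    client = MongoClient(
        "mongodb+srv://cluster0.example.mongodb.net/",
        readPreference="nearest",
    )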

Auto-Scaling Storage and Performance Optimization

MongoDB Atlas now supports automatic scaling for the storage associated with a cluster, making it easier for you to manage capacity. Enabled by default, auto-scaling for storage detects when your disks hit 90% utilization and provisions additional storage such that your cluster reaches a disk utilization of 70% on AWS & GCP, or a maximum of 70% utilization on Azure. This automated process occurs without impact to your database or application availability.

In addition to auto-storage scaling, the new Performance Advisor discussed earlier for Ops Manager is also available in MongoDB Atlas, providing you with always-on, data-driven insights into query behavior and index recommendations.

A Cloud Database Platform for Development & Testing

New enhancements to MongoDB Atlas make it the optimal cloud database for spinning up and running test and development environments efficiently.

  • You can now pause your MongoDB Atlas cluster, perfect for use cases where only intermittent access to your data is required, such as development during business hours or temporary testing. While your database instances are stopped, you are charged for provisioned storage and backup storage, but not for instance hours. You can restart your MongoDB Atlas cluster at any time on demand; your cluster configuration will be the same as when you stopped it and public DNS hostnames are retained so no modifications to your connection string are required. MongoDB Atlas clusters can be stopped for up to 7 days. If you do not start your cluster after 7 days, Atlas will automatically start your cluster. Pausing and restarting your MongoDB clusters can be triggered in the MongoDB Atlas UI or via the REST API.
  • Cross-project restores, introduced with Ops Manager 3.6, are also available in MongoDB Atlas, allowing users to restore to different MongoDB Atlas projects than the backup snapshot source.

Next Steps

That wraps up the final part of our what’s new blog series. I hope I’ve helped demonstrate how MongoDB 3.6 helps you move at the speed of your data. It enables new digital initiatives and modernized applications to be delivered to market faster, running reliably and securely at scale, and unlocking insights and intelligence ahead of your competitors.

  • Change streams, retryable writes, causal consistency, greater query and update expressivity, and Compass Community help developers move faster.
  • Ops Manager, schema validation, enhanced security, end to end compression, and user session management help operations teams scale faster.
  • The MongoDB aggregation pipeline, Connector for BI, and the recommended R driver help analysts and data scientists unlock insights faster.

And you have the freedom to run MongoDB anywhere – on-premises, public cloud, and as a service with MongoDB Atlas available on AWS, Azure, and GCP.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

What’s New in MongoDB 3.6. Part 3 – Speed to Insight

Mat Keep

MongoDB 3.6

Welcome to part 3 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation.
  • In part 2, we dived into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression.
  • In today’s part 3 we’ll cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R.
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including Cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Speed to Insight

How quickly an organization can unlock and act on insights from data generated by new applications has become a material source of competitive advantage. Collecting data in operational systems and then relying on batch ETL (Extract, Transform, Load) processes to update an expensive data warehouse or complex and ungoverned data lake is no longer sufficient. Speed to insight is critical, and so analytics performed against live data to drive operational intelligence is fast becoming a necessity, without having to employ armies of highly skilled and scarce data engineers and scientists.

MongoDB 3.6 delivers a number of new features and capabilities that allow organizations to enable real-time analytics and action.

MongoDB Connector for BI: Faster and Simpler

MongoDB 3.6 brings a number of performance and ease-of-use enhancements to the BI Connector, enabling faster time to insight using SQL-based BI and Analytics platforms.

Faster: The connector takes advantage of enhancements to the aggregation pipeline – discussed later in this post – to deliver higher performance, with more operations pushed natively to the database. Prior to MongoDB 3.6, only left outer equijoins could be pushed down to the database – all other JOIN types had to be executed within the BI Connector layer, which first required all matching data to be extracted from the database. With MongoDB 3.6, support is being extended to non-equijoins and the equivalent of SQL subqueries. These enhancements will reduce the amount of data that needs to be moved and computed in the BI layer, providing faster time to insight.

In addition, performance metrics are now observable via the Show Status function, enabling deeper performance insights and optimizations.

Simpler: To support easier configuration, the mongosqld process now samples and maps the MongoDB schema, caching the results internally and eliminating the need to install the separate mongodrdl component. Additionally, users can simplify lifecycle management by configuring, deploying, and monitoring the BI Connector directly from Ops Manager.

To simplify the enforcement of access controls, BI Connector users can now be authenticated directly against MongoDB using new client-side plugins, eliminating the need to manage TLS certificates. Review the documentation for the C and JDBC authentication plugins to learn more. Authentication via Kerberos is also now supported.

Richer Aggregation Pipeline

Developers and data scientists rely on the MongoDB aggregation pipeline for its power and flexibility in enabling sophisticated data processing and manipulation demanded by real-time analytics and data transformations. Enhancements in the aggregation pipeline unlock new use cases.

A more powerful $lookup operator extends MongoDB’s JOIN capability to support the equivalent of SQL subqueries and non-equijoins. As a result, developers and analysts can write more expressive queries combining data from multiple collections, all executed natively in the database for higher performance, and with less application-side code.
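A brief sketch of the extended $lookup (the database, collections, and field names are invented): the sub-pipeline form expresses a correlated subquery with a non-equijoin condition, executed entirely in the database.

    from pymongo import MongoClient

    db = MongoClient().shop  # hypothetical database

    # For each customer, join only their orders over 100 – a query that
    # previously required application-side code or BI-layer processing.
    pipeline = [
        {"$lookup": {
            "from": "orders",
            "let": {"cust_id": "$_id"},
            "pipeline": [
                {"$match": {"$expr": {"$and": [
                    {"$eq": ["$customer_id", "$$cust_id"]},
                    {"$gt": ["$amount", 100]},
                ]}}},
            ],
            "as": "large_orders",
        }},
    ]
    results = db.customers.aggregate(pipeline)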

In addition to $lookup, the aggregation pipeline offers additional enhancements:

  • Support for timezone-aware aggregations. Before timezone awareness, reporting that spanned regions and date boundaries was not possible within the aggregation pipeline. Now business analysts can group data for multi-region analysis that takes account of variances in working hours and working days across different geographic regions.
  • New expressions allow richer data transformations within the aggregation pipeline, including the ability to convert objects to arrays of key-value pairs, and arrays of key-value pairs to be converted to objects. The mergeObjects expression is useful for setting missing fields into default values, while the REMOVE variable allows the conditional exclusion of fields from projections based on evaluation criteria. You can learn more about the enhancements from the documentation.
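A compact, hypothetical sketch touching both enhancements above (the database, collection, and field names are invented): a timezone-aware grouping followed by an object-to-array conversion.

    from pymongo import MongoClient

    db = MongoClient().app  # hypothetical database

    pipeline = [
        # Group events by local calendar day in India, not by UTC day.
        {"$group": {
            "_id": {"$dateToString": {"date": "$created",
                                      "format": "%Y-%m-%d",
                                      "timezone": "Asia/Kolkata"}},
            "count": {"$sum": 1},
        }},
        # Reshape each result document into an array of key-value pairs.
        {"$project": {"kv": {"$objectToArray": "$$ROOT"}}},
    ]
    daily = db.events.aggregate(pipeline)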

More Expressive Query Language

MongoDB 3.6 exposes the ability to use aggregation expressions within the query language to enable richer queries with less client-side code. This enhancement allows the referencing of other fields in the same document when executing comparison queries, as well as powerful expressions such as multiple JOIN conditions and uncorrelated subqueries. The addition of the new $expr operator allows the equivalent of SELECT * FROM T1 WHERE a > b in SQL syntax. Learn more from the $expr documentation.
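For instance (the database, collection, and field names are assumptions), comparing two fields of the same document now takes a single find call:

    from pymongo import MongoClient

    db = MongoClient().finance  # hypothetical database

    # Equivalent of SELECT * FROM monthly_budget WHERE spent > budget
    over_budget = db.monthly_budget.find(
        {"$expr": {"$gt": ["$spent", "$budget"]}}
    )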

R Driver for MongoDB

A recommended R driver for MongoDB is now available, enabling developers, data scientists, and statisticians to get the same first-class experience with MongoDB as that offered by the other MongoDB drivers – providing idiomatic, native language access to the database. The driver supports advanced MongoDB functionality, including:

  • Read and write concerns to control data consistency and durability.
  • Enterprise authentication mechanisms, such as LDAP and Kerberos, to enforce security controls against the database.
  • Support for advanced BSON data types such as Decimal128 to support high precision scientific and financial analysis.

Next Steps

That wraps up the third part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

What’s New in MongoDB 3.6. Part 2 – Speed to Scale

Mat Keep

MongoDB 3.6

Welcome to part 2 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we’ll dive into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • Part 3 will cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Speed to Scale

Unlike the traditional scale-up systems of the past, distributed systems enable applications to scale further and faster while maintaining continuous availability in the face of outages and maintenance. However, they can impose more complexity on the ops team, potentially slowing down the pace of delivering, scaling, and securing apps in production.

MongoDB 3.6 takes another important step in making it easier for operations teams to deploy and run massively scalable, always-on global applications that benefit from the power of a distributed systems architecture.

Ops Manager

MongoDB Ops Manager is the best way to run MongoDB on your own infrastructure, making operations staff 10x-20x more productive. Advanced management and administration delivered with Ops Manager 3.6 allow operations teams to manage, optimize, and backup distributed MongoDB clusters faster and at higher scale than ever before. Deeper operational visibility allows proactive database management, while streamlined backups reduce the costs and time of data protection.

Simplified Monitoring and Management

It is now easier than ever for administrators to synthesize schema design against real-time database telemetry and receive prescriptive recommendations to optimize database performance and utilization – all from a single pane of glass.


Figure 1: Ops Manager performance telemetry and prescriptive recommendations speed time to scale

  • The Data Explorer allows operations teams to examine the database’s schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics, directly within the Ops Manager UI.
  • The Real Time Performance Panel provides insight from live server telemetry, enabling issues to be immediately identified and diagnosed. The panel displays all operations in flight, network I/O, memory consumption, the hottest collections, and slowest queries. Administrators also have the power to kill long running operations from the UI.
  • The new Performance Advisor, available for both Ops Manager and MongoDB Atlas, continuously highlights slow-running queries and provides intelligent index recommendations to improve performance. Using Ops Manager automation, the administrator can then roll out the recommended indexes automatically, without incurring any application downtime.

Ops Manager Organizations

To simplify management of global MongoDB estates, Ops Manager now provides a new Organizations and Projects hierarchy. Previously, Projects – formerly called “groups” – were managed as individual entities. Now multiple Projects can be placed under a single organization, allowing operations teams to centrally view and administer all Projects under the organization hierarchy. Projects can be assigned tags, such as a “production” tag, against which global alerting policies can be configured.

Faster, Cheaper and Queryable Backups

Ops Manager continuously maintains backups of your data, so if an application issue, infrastructure failure, or user error compromises your data, the most recent backup is only moments behind, minimizing exposure to data loss. Ops Manager offers point-in-time backups of replica sets, and cluster-wide snapshots of sharded clusters, guaranteeing consistency and no data loss. You can restore to precisely the moment you need, quickly and safely. Ops Manager backups are enhanced with a range of new features:

  • Queryable Backups, first introduced in MongoDB Atlas, allow partial restores of selected data, and the ability to query a backup file in-place, without having to restore it. Now users can query the historical state of the database to track data and schema modifications – a common demand of regulatory reporting. Directly querying backups also enables administrators to identify the best point in time to restore a system by comparing data from multiple snapshots, thereby improving both RTO and RPO. No other non-relational database offers the ability to query backups in place.
  • The Ops Manager 3.6 backup agent has been updated to use a faster and more robust initial sync process. Now, transient network errors will not cause the initial sync to restart from the beginning of the backup process, but rather resume from the point the error occurred. In addition, refactoring of the agent will speed data transfer from MongoDB to the backup repository, with the performance gain dependent on document size and complexity.
  • Point-in-Time snapshots are now created at the destination node for the restore operation, rather than at the backup server – reducing network hops, cutting backup storage overhead by the size of your logical production data, and further improving speed to recovery. The restore process transfers backup snapshots directly to the destination node and then applies the oplog locally, rather than applying it at the daemon server first and pushing the complete restore image across the network. Note that this enhancement does not apply to restores via SCP.
  • Extending beyond support for the AWS S3 object store, backups can now be routed to on-premises object stores such as EMC ECS or IBM Cleversafe. MongoDB’s backup integration provides administrators with greater choice in selecting the backup storage architecture that best meets specific organizational requirements for data protection, enabling them to take advantage of the cheap, durable, and quickly growing object storage used within the enterprise. Most other databases limit backups to filesystems or S3 only, failing to match the storage flexibility offered by MongoDB.
  • With cross-project restores, users can now perform restores into a different Ops Manager Project than the backup snapshot source. This allows DevOps teams to easily execute tasks such as creating multiple staging or test environments that match recent production data, while configured with different user access privileges or running in different regions.

Review the Ops Manager documentation to learn more.

Schema Validation

MongoDB 3.6 introduces Schema Validation via syntax derived from the proposed IETF JSON Schema standard. This new schema governance feature extends the capabilities of document validation, originally introduced in MongoDB 3.2.

While MongoDB’s flexible schema is a powerful feature for many users, there are situations where strict guarantees on data structure and content are required. MongoDB’s existing document validation controls can be used to require that any documents inserted or updated follow a set of validation rules, expressed using MongoDB query syntax. While this allows for the definition of required content for each document, it provides no mechanism to restrict users from adding documents containing fields beyond those specified in the validation rules, and no way for administrators to specify and enforce control over the complete structure of documents, including data nested inside arrays.

Using schema validation, DevOps and DBA teams can now define a prescribed document structure for each collection, which can reject any documents that do not conform to it. With schema validation, MongoDB enforces controls over JSON data that are unmatched by any other database (a short example follows the list):

  • Complete schema governance. Administrators can define when additional fields are allowed to be added to a document, and specify a schema on array elements including nested arrays.
  • Tunable controls. Administrators have the flexibility to tune schema validation according to use case – for example, if a document fails to comply with the defined structure, it can either be rejected, or still written to the collection while logging a warning message. Structure can be imposed on just a subset of fields – for example, requiring a valid customer name and address, while other fields can be freeform, such as social media handle and cellphone number. And of course, validation can be turned off entirely, allowing complete schema flexibility, which is especially useful during the development phase of the application.
  • Queryable. The schema definition can be used by any query to inspect document structure and content. For example, DBAs can identify all documents that do not conform to a prescribed schema.
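As a minimal sketch of what such a validator might look like – collection and field names here are illustrative, not from the post – a $jsonSchema definition can be attached when creating a collection with PyMongo:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.crm  # database, collection, and field names are illustrative

# Every customer document must carry a string name and an address
# sub-document; additionalProperties: false rejects undeclared fields.
db.create_collection("customers", validator={
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "address"],
        "additionalProperties": False,
        "properties": {
            "_id": {},
            "name": {"bsonType": "string"},
            "address": {"bsonType": "object"},
        },
    },
})

db.customers.insert_one({"name": "Ada", "address": {"city": "London"}})  # ok
# db.customers.insert_one({"name": 42})  # raises WriteError: fails validation
```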

With schema validation, developers and operations teams have complete control over balancing the agility and flexibility that comes from a dynamic schema, with strict data governance controls enforced across entire collections. As a result, they spend less time defining data quality controls in their applications, and instead delegate these tasks to the database. Specific benefits of schema validation include:

  1. Simplified application logic. Guarantees on the presence, content, and data types of fields eliminate the need to implement extensive error handling in the application. In addition, the need to enforce a schema through application code, or via a middleware layer such as an Object Document Mapper, is removed.
  2. Enforces control. Database clients can no longer compromise the integrity of a collection by inserting or updating data with incorrect field names or data types, or adding new attributes that have not been previously approved.
  3. Supports compliance. In some regulated industries and applications, it is required that Data Protection Officers demonstrate that data is stored in a specific format, and that no additional attributes have been added. For example, the EU’s General Data Protection Regulation (GDPR) requires an impact assessment against all Personally Identifiable Information (PII), prior to any processing taking place.

Extending Security Controls

MongoDB offers among the most extensive and mature security capabilities of any modern database, providing robust access controls, end-to-end data encryption, and complete database auditing. MongoDB 3.6 continues to build out security protection with two new enhancements that specifically reduce the risk of unsecured MongoDB instances being unintentionally deployed into production.

From the MongoDB 2.6 release onwards, the binaries from the official MongoDB RPM and DEB packages bind to localhost by default. With MongoDB 3.6, this default behavior is extended to all MongoDB packages across all platforms. As a result, all networked connections to the database will be denied unless explicitly configured by an administrator. Review the documentation to learn more about the changes introduced by localhost binding. Combined with new IP whitelisting, administrators can configure MongoDB to only accept external connections from approved IP addresses or CIDR ranges that have been explicitly added to the whitelist.

End-to-End Compression

Adding to intra-cluster network compression released in MongoDB 3.4, the new 3.6 release adds wire protocol compression to network traffic between the client and the database.

Figure 2: Creating highly efficient distributed systems with end-to-end compression

Wire protocol compression can be configured with the snappy or zlib algorithms, allowing up to 80% savings in network bandwidth. This reduction brings major performance gains to busy network environments and reduces connectivity costs, especially in public cloud environments, or when connecting remote assets such as IoT devices and gateways.
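On the client side, enabling this is a one-option change. A sketch with PyMongo – note this assumes a driver release with compression support (PyMongo 3.7 or later) and a server started with a matching networkMessageCompressors setting:

```python
from pymongo import MongoClient

# Negotiate snappy compression for client<->server traffic. If the server
# was not started with a matching --networkMessageCompressors value,
# messages simply fall back to being sent uncompressed.
client = MongoClient("mongodb://localhost:27017", compressors="snappy")
```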

With compression configurable across the stack – for client traffic, intra-cluster communications, indexes, and disk storage – MongoDB offers greater network, memory, and storage efficiency than almost any other database.

Enhanced Operational Management in Multi-Tenant Environments

Many MongoDB customers have built out their database clusters to serve multiple applications and tenants. MongoDB 3.6 introduces two new features that simplify management and enhance scalability:

Operational session management enables operations teams to more easily inspect, monitor, and control each user session running in the database. They can view, group, and search user sessions across every node in the cluster, and respond to performance issues in real time. For example, if a user or developer error is causing runaway queries, administrators now have the fine-grained operational oversight to view and terminate that session by removing all associated session state across a sharded cluster in a single operation. This is especially useful for multi-tenant MongoDB clusters running diverse workloads, providing a much simpler interface for identifying active operations in the database cluster, recovering from cluster overloads, and monitoring active users on a system. Review the sessions commands documentation to learn more.
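As a rough sketch of the underlying session model with PyMongo – privileges to kill other users’ sessions are assumed:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# Every logical session is identified by a server-assigned UUID...
session = client.start_session()
print(session.session_id)  # e.g. {'id': UUID('...')}

# ...which an administrator can use to terminate the session, and all of
# its associated state, across the cluster in a single command.
client.admin.command("killSessions", [session.session_id])
```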

Improved scalability with the WiredTiger storage engine to better support common MongoDB use cases that create hundreds of thousands of collections per database, for example:

  • Multi-tenant SaaS-based services that create a collection for each user.
  • IoT applications that write all sensor data ingested over an hour or a day into a unique collection.

As the collection count increased, MongoDB performance could, in extreme cases, degrade as the WiredTiger session cache managing a cursor’s access to collections and indexes became oversubscribed. MongoDB 3.6 refactors the session cache from a list to a hash table, with improved cache eviction policies and checkpointing algorithms, along with higher concurrency by replacing mutexes with read/write locks. As a result of this refactoring, a single MongoDB instance running with the WiredTiger storage engine can support over 1 million collections. Michael Cahill, Director of Storage Engineering, presented a session on the development work at the MongoDB World ’17 customer conference. Review the session slides to learn more.

Next Steps

That wraps up the second part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

What’s New in MongoDB 3.6. Part 1 – Speed to Develop

Mat Keep

Company, MongoDB 3.6

MongoDB 3.6 is now Generally Available (GA), and ready for production deployment. In this short blog series, I’ll be taking you on a whirlwind tour of what’s new in this latest release:

  • Today, we’ll take a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we’ll dive into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • Part 3 will cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Developer-First

MongoDB has always been a developer-first technology. Its document data model maps naturally to objects in application code, making it simple for developers to learn and use. A document’s schema can be dynamically created and modified without downtime, making it fast to build and evolve applications. Native, idiomatic drivers are provided for 10+ languages – and the community has built dozens more – enabling ad-hoc queries, real-time aggregation and rich indexing to provide powerful programmatic ways to access and analyze data of any structure.

MongoDB 3.6 builds upon these core capabilities to allow developers to create rich apps and customer experiences, all with less code.

Change Streams

Change streams enable developers to build reactive, real-time web, mobile, and IoT apps that can view, filter, and act on data changes as they occur in the database. They enable seamless data movement across distributed database and application estates, making it simple to stream data changes and trigger actions wherever they are needed, using a fully reactive programming style.

Implemented as an API on top of MongoDB’s operation log (oplog), consumers can open change streams against collections and filter on relevant events using the $match, $project, and $redact aggregation pipeline stages. The application can register for notifications whenever a document or collection is modified, enabling downstream applications and consumers to act on new data in real time, without constantly querying the entire collection to identify changes. Applications can consume change streams directly, via a message queue, or through a backend service such as MongoDB Stitch (coming soon).
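A minimal PyMongo sketch, assuming a replica set named rs0 and an illustrative orders collection:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
orders = client.shop.orders  # names are illustrative

# Open a change stream, filtered with a $match stage to insert events only.
# The loop blocks until new changes arrive -- no polling required.
with orders.watch([{"$match": {"operationType": "insert"}}]) as stream:
    for change in stream:
        print(change["fullDocument"])  # the newly inserted order
```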

Use cases enabled by MongoDB change streams include:

  • Powering trading applications that need to be updated in real time as stock prices rise and fall.
  • Synchronizing updates across serverless and microservices architectures by triggering an API call when a document is inserted or modified. For example, new customer orders written to the database may automatically trigger functions to generate invoices and delivery schedules.
  • Updating dashboards, analytics systems, and search engines as operational data changes.
  • Creating powerful IoT data pipelines that can react whenever the state of physical objects changes. For example, generating alarms whenever a connected vehicle moves outside of a geo-fenced area.
  • Pushing new credit card transactions into machine learning training models to re-score fraud classifications.
  • Refreshing scoreboards in multiplayer games.

Figure 1: MongoDB change streams enable consumers to react to data changes in real time

Some MongoDB users requiring real-time notifications have built their own change data capture processes that “tail” the oplog. By migrating to change streams, these users can reduce development and operational overhead, improve usability, and increase data reliability. When compared to both oplog tailing and change notifications implemented by alternative databases, MongoDB change streams offer a number of advantages:

  • Change streams are flexible – users can register to receive just the individual deltas from changes to a document, or receive a copy of the full document.
  • Change streams are consistent – by utilizing a global logical clock, change streams ensure a total ordering of event notifications across shards. As a result, MongoDB guarantees that the order of changes is preserved, and that they can be safely processed by the consuming application in the order received from the stream.
  • Change streams are secure – users are able to create change streams only on collections to which they have been granted read access.
  • Change streams are reliable – notifications are only sent on majority-committed write operations, and are durable across node and network failures.
  • Change streams are resumable – when nodes recover after a failure, change streams can be automatically resumed, assuming that the last event received by the application has not rolled off the oplog.
  • Change streams are familiar – the API syntax takes advantage of the established MongoDB drivers and query language, and is independent of the underlying oplog format.
  • Change streams are highly concurrent – up to 1,000 change streams can be opened against each MongoDB instance with minimal performance degradation.

Review the MongoDB change streams documentation to learn more.

Retryable Writes

The addition of retryable writes to MongoDB moves the complexity of handling temporary system failures from the application to the database. Now, rather than the developer having to implement custom, client-side code, the MongoDB driver can automatically retry writes in the event of transient network failures or a primary replica election, while the MongoDB server enforces exactly-once processing semantics.

The driver assigns a unique transaction identifier to each write operation and re-sends that ID on retry, enabling the server to evaluate the success of the previous write attempt, or to retry the write operation as needed. This implementation of retryable writes offers a number of benefits over approaches taken by other databases:

  • Retryable writes are not limited to idempotent operations only. They can also be applied to operations such as incrementing or decrementing a counter, or processing orders against stock inventory.
  • Retryable writes are safe for operations that failed to acknowledge success back to the application due to timeout exceptions, for example due to a transient network failure.
  • Retryable writes do not require developers to add any extra code to their applications, such as retry logic or savepoints.

Applications that cannot afford any loss of write availability, such as e-commerce applications, trading exchanges, and IoT sensor data ingestion, immediately benefit from retryable writes. When coupled with self-healing node recovery – typically within 2 seconds or less – MongoDB’s retryable writes enable developers to deliver always-on, global availability of write operations, without the risks of data loss and stale reads imposed by eventually consistent, multi-master systems.
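From the application’s point of view, enabling this is a one-flag change. A sketch with PyMongo – the retryWrites option assumes a 3.6-compatible driver, and the names are illustrative:

```python
from pymongo import MongoClient

# With retryWrites=True, the driver transparently retries eligible writes
# once after a transient network error or primary election.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0",
                     retryWrites=True)

counters = client.demo.counters  # names are illustrative
# Even a non-idempotent write such as $inc is safe: the server uses the
# operation's unique transaction ID to apply it exactly once.
counters.update_one({"_id": "page_views"}, {"$inc": {"count": 1}}, upsert=True)
```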

Tunable Consistency

With tunable consistency, MongoDB affords developers precise control over routing queries across a distributed cluster, balancing data consistency guarantees with performance requirements. MongoDB 3.4 added linearizable reads, which were central to MongoDB passing Jepsen – some of the most stringent data safety and correctness tests in the database industry.

Now the MongoDB 3.6 release introduces support for causal consistency – guaranteeing that every read operation within a client session will always see the previous write operation, regardless of which replica is serving the request. By enforcing strict, causal ordering of operations within a session, causal consistency ensures every read is always logically consistent, enabling monotonic reads from a distributed system – guarantees that cannot be met by most multi-node databases.

Causal consistency allows developers to maintain the benefits of strict data consistency enforced by legacy single node relational databases, while modernizing their infrastructure to take advantage of the scalability and availability benefits of modern distributed data platforms.
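A short PyMongo sketch of a causally consistent session – reading your own write from a secondary – with illustrative names; majority read and write concerns are used here, since causal guarantees apply to majority-committed data:

```python
from pymongo import MongoClient
from pymongo.read_concern import ReadConcern
from pymongo.read_preferences import ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
items = client.demo.items  # names are illustrative

with client.start_session(causal_consistency=True) as session:
    # Write with majority durability...
    items.with_options(write_concern=WriteConcern("majority")).insert_one(
        {"_id": 1, "status": "new"}, session=session
    )
    # ...then read from a secondary: the session guarantees this read
    # observes the write above, regardless of which replica serves it.
    doc = items.with_options(
        read_preference=ReadPreference.SECONDARY,
        read_concern=ReadConcern("majority"),
    ).find_one({"_id": 1}, session=session)
```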

Developer Tooling: MongoDB Compass

As the GUI for MongoDB, Compass has become an indispensable tool for developers and DBAs, enabling graphical schema discovery and query optimization. Compass now offers several new features:

  • Auto-complete: Enables developers to simplify query development with Compass providing suggestions for field names and MongoDB operators, in addition to matching braces and quotes as they code.
  • Query History: Allows developers to re-run their most recently executed queries, and save common queries to run on-demand.
  • Table View: Now developers can view documents as conventional tables, as well as JSON documents.

MongoDB Compass is not just a single tool – it’s a framework built to allow for the addition of modular components. Compass now exposes this as the Compass Plugin Framework, making Compass extensible by any user with the same methods used by MongoDB’s software engineers. Using the plugin API, users can build plugins to add new features to Compass. Examples include a GridFS viewer, a sample data generator, a hardware stats viewer, a log collector/analyzer, and more.

You can learn more about these new features in the MongoDB Compass documentation.

MongoDB Compass Community

With the MongoDB 3.6 release, the Compass family has expanded to include the new, no-cost Compass Community edition.

Compass Community provides developers an intuitive visual interface to use alongside the MongoDB shell. It includes the core features of Compass, enabling users to review the hierarchy and size of databases and collections, inspect documents, and insert / update / delete documents. Developers can use the GUI to build queries, examine how they’re executed, and add or drop indexes to improve performance. Compass Community also supports the latest Compass functionality available with MongoDB 3.6, making developers even more productive.

Figure 2: MongoDB Compass Community, the new no-cost GUI for MongoDB developers

MongoDB Compass Community is available from the MongoDB download center.

Fully Expressive Array Updates

Arrays are a powerful construct in MongoDB’s document data model, allowing developers to represent complex objects in a single document that can be efficiently retrieved in one call to the database. Before MongoDB 3.6, however, it was only possible to atomically update the first matching array element in a single update command.

With fully expressive array updates, developers can now perform complex array manipulations against matching elements of an array – including elements embedded in nested arrays – all in a single atomic update operation. MongoDB 3.6 adds a new arrayFilters option, allowing the update to specify which elements to modify in the array field. This enhancement allows even more flexibility in data modeling. It also delivers higher performance than alternative databases supporting JSON data, as entire documents do not need to be rewritten when only selective array elements are updated.
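A short sketch with PyMongo, using an illustrative students collection:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
students = client.school.students  # names are illustrative

# Raise every grade below 60 to 60 -- but only the matching elements --
# in one atomic update. "elem" is bound by the array_filters entry below.
students.update_one(
    {"_id": 1},
    {"$set": {"grades.$[elem]": 60}},
    array_filters=[{"elem": {"$lt": 60}}],
)
```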

Learn more from the array update documentation.

Next Steps

That wraps up the first part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

eBay: Building Mission-Critical Multi-Data Center Applications with MongoDB

Mat Keep

Customer Stories

As a top 10 global retail brand with 170+ million active buyers and 1 billion live listings across 190 markets around the world, eBay cannot afford systems downtime. This is why the company relies on MongoDB as one of its core enterprise data platform standards, powering multiple customer-facing applications that run ebay.com.

At this year’s MongoDB World conference, Feng Qu, eBay’s lead NoSQL DBA, presented Practical Design Patterns for Resilient Applications – a set of architectural blueprints his team has developed to support enterprise-class MongoDB deployments.

Mr. Qu began his session discussing how the concept of availability has changed over the years. In the past, it was acceptable for sites to take scheduled downtime for weekly maintenance events. With the global nature of today’s services, neither users nor the business is quite so accepting! In addition, most organizations now build out their services on commodity hardware platforms, rather than the exotic Sun Solaris / SPARC servers of yesteryear. While commodity hardware is much less costly, it also fails much more regularly. Both of these factors radically alter how engineering teams consider availability, and have led eBay to create its “Resiliency Design Patterns” to institute database best practices that maximize Mean Time To Failure (MTTF) and minimize Mean Time To Recovery (MTTR).

To build their apps, eBay developers can choose from five corporate-approved database standards. Alongside MongoDB, teams also have the option to use Oracle or MySQL relational databases, and two other NoSQL options. Mr. Qu’s DBA team provides guidance on the appropriate database choice, qualifying the selection against the application’s data access patterns, user load, data types, and more.

eBay currently runs over 3,000 non-relational database instances powering a range of applications, managing multiple petabytes of data between them. In the past, Oracle was the System of Record, while the non-relational databases handled transient data used in “systems of engagement”. However, the non-relational database landscape has matured. With consistent, point-in-time backup and recovery, MongoDB now also serves System of Record use cases at eBay.

While all of eBay’s non-relational database choices offer built-in resilience to failure, they make different design tradeoffs that can impact application behavior. The DBA team assesses these differences across six dimensions: availability, consistency, durability, recoverability, scalability, and performance. For example, those NoSQL databases using peer-to-peer, masterless designs have expensive data repair and rebalancing processes that must be initiated following a node failure. This rebalancing process impacts both application throughput and latency, and can cause connection stacking as clients wait for recovery, which can lead to application downtime. To mitigate these effects, eBay has had to layer an application-level sharding solution, originally developed for its Oracle estate, on top of those masterless databases. This approach enables the DBA team to divide larger clusters into a series of sub-clusters, which isolates rebalancing overhead to a smaller set of nodes, impacting just a subset of queries. It is against these different types of database behaviors that the eBay DBA team builds its Resiliency Design Patterns.

Mr. Qu presented eBay’s standard “MongoDB Resilience Design Pattern”, as shown in Figure 1 below.

Figure 1: eBay design pattern for its MongoDB Resilience Architecture. (Image courtesy of eBay’s MongoDB World presentation).

In this design pattern, a 7-node MongoDB replica set is distributed across eBay’s three US data centers. This pattern ensures that in the event of the primary data center failing, the database cluster can maintain availability by establishing a quorum between remaining data centers. MongoDB’s replica set members can be assigned election priorities that control which secondary members are considered as candidates for promotion in the event of a primary failure. For example, the nodes local to DC1 are prioritized for election if the primary replica set member fails. Only if the entire DC1 suffers an outage are the replica set members in DC2 considered for election, with the new primary member selected on the basis of which node has committed the most recent write operations. This design pattern can be extended by using MongoDB’s majority write concern to enable writes that are durable across data centers.
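A configuration sketch in the same spirit – the hostnames, set name, and priority values below are illustrative, not taken from eBay’s deployment:

```python
from pymongo import MongoClient

# Seven members across three data centers: DC1 members carry the highest
# election priority, DC2 is next in line, and DC3 holds a low-priority member.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "dc1-node1:27017", "priority": 3},
        {"_id": 1, "host": "dc1-node2:27017", "priority": 3},
        {"_id": 2, "host": "dc1-node3:27017", "priority": 3},
        {"_id": 3, "host": "dc2-node1:27017", "priority": 2},
        {"_id": 4, "host": "dc2-node2:27017", "priority": 2},
        {"_id": 5, "host": "dc2-node3:27017", "priority": 2},
        {"_id": 6, "host": "dc3-node1:27017", "priority": 1},
    ],
}

client = MongoClient("mongodb://dc1-node1:27017")
client.admin.command("replSetInitiate", config)
```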

The standard MongoDB design pattern is used as the basis for eBay’s “Read Intensive / Highly Available Read Pattern” discussed in the presentation, which is used to power the eBay product catalog. For the catalog workload, the MongoDB replica set is scaled out to 50 members, providing massive data distribution for both read scalability and resilience.

For more write-intensive workloads, eBay has developed its “Extreme High Read / Write Pattern”, which distributes a sharded MongoDB cluster across its US data centers.

Figure 2: eBay design pattern for the MongoDB Extreme High Read / Write Pattern. (Image courtesy of eBay’s MongoDB World presentation).

Again, eBay developers can configure this design pattern with specific MongoDB write and read concerns to tune the levels of durability and consistency that best meet the needs of different applications.

Mr. Qu noted that with recent product enhancements, MongoDB is being deployed to serve a greater range of application needs:

  • The addition of zone sharding to MongoDB 3.4 now enables eBay to serve applications that demand distributed, always-on write availability across multiple data centers.
  • Retryable writes, targeted for the forthcoming MongoDB 3.6 release, will allow eBay to reduce application-side exception handling code.

Review the recording of Feng Qu’s presentation at MongoDB World to learn more about eBay’s Design Patterns.

Download the MongoDB Multi-Data Center Deployments guide to get deeper insight into enabling active/active data center deployments and global data distribution with MongoDB.

GDPR: Impact to Your Data Management Landscape: Part 4

Mat Keep

Business

Welcome to the final installment of our 4-part blog series.

  • In part 1, we provided a primer into the GDPR – covering its rationale, and key measures
  • In part 2, we explored what the GDPR means for your data platform
  • In part 3, we discussed how MongoDB’s products and services can support you in your path to compliance
  • Finally, in this part 4, we’re going to examine how the GDPR can help in customer experience, and provide a couple of case studies.

If you can’t wait for all 4 parts of the series, but would rather get started now, download the complete GDPR: Impact to Your Data Management Landscape whitepaper today.

Using the GDPR for Customer Experience Transformation

As discussed in parts 1 and 2 of this blog series, to comply with the GDPR, organizations will need to identify all personal data within their systems. Forward-looking companies can leverage the personal data discovery processes required by the regulation to transform interactions with their customers.

Marketing and sales groups have long seen the value in aggregating data from multiple, disconnected systems into a single, holistic, real-time representation of their customer. This single view can help in enhancing customer insight and intelligence – with the ability to better understand and predict customer preferences, behaviors, and needs.

However, for many organizations, delivering a single view of the customer to the business has been elusive. Technology has been one limitation – for example, the rigid, tabular data model imposed by traditional relational databases inhibits the schema flexibility necessary to accommodate the diverse customer data sets contained in multiple source systems. But limitations extend beyond just the technology to include the business processes needed to deliver and maintain a single view.

MongoDB has been used in many single view projects across enterprises of all sizes and industries. Through the best practices observed and institutionalized over the years, MongoDB has developed a repeatable, 10-step methodology for successfully delivering a single view.

Figure 1: 10-step methodology to building a single customer view with MongoDB

You can learn more by downloading the MongoDB single view whitepaper, covering:

  • The 10-step methodology to delivering a single view
  • The required technology capabilities and tools to accelerate project delivery
  • Case studies from customers who have built transformational single view applications on MongoDB

Case Studies

MongoDB has been downloaded over 30 million times and counts 50% of the Fortune 100 as commercial customers of MongoDB’s products and services. Among the Fortune 500 and Global 500, MongoDB customers include:

  • 40 of the top financial services institutions
  • 15 of the top retailers
  • 15 of the top telcos
  • 15 of the top healthcare companies
  • 10 of the top media and entertainment companies

MongoDB is used by enterprises of all sizes, and from all industries to build modern applications, often as part of digital transformation initiatives in the cloud. An example of such a company is Estates Gazette, the UK’s leading commercial property data service.

Estates Gazette (EG)

The company’s business was built on print media, with the Estates Gazette journal serving as the authoritative source on commercial property across the UK for well over a century. Back in the 1990s, the company was quick to identify the disruptive potential of the Internet, embracing it as a new channel for information distribution. Pairing its rich catalog of property data and market intelligence with new data sources from mobile and location services – and the ability to run sophisticated analytics across all of it in the cloud – the company is now accelerating its move into enriched market insights, complemented with decision support systems.

To power its digital transformation, Estates Gazette migrated from legacy relational databases to MongoDB, running an event-driven architecture and microservices, deployed to the Amazon Web Services (AWS) cloud. The company is also using MongoDB Enterprise Advanced with the Encrypted storage engine to extend its security profile, and prepare for the EU GDPR.

You can learn more by reading the Estates Gazette case study.

Leading European Retailer

As part of its ongoing digital transformation, which extends customer engagement beyond brick-and-mortar stores to mobile channels, the retailer – with over 50,000 employees and €4.5bn in annual sales – was building a new mobile app offering opt-in marketing services to collect customer data, storing it in MongoDB.

As part of its GDPR readiness, the retailer employed MongoDB Global Consulting Services to advise on data protection best practices, taking advantage of the MongoDB Enterprise Advanced access controls, encryption, and auditing framework. By using MongoDB consultants in the design phase of the project, the retailer has been able to adopt a “security by design and by default” approach, while enhancing its security posture.

Wrapping Up our 4-Part Blog Series

That wraps up the final part of our 4-part blog series. If you want to read the entire series in one place, download the complete GDPR: Impact to Your Data Management Landscape whitepaper today.

It’s worth remembering that it takes much more than security controls of a database to achieve GDPR compliance. However, MongoDB is offering a holistic vision of how database customers can accelerate a path to meeting regulations scheduled for enforcement from May 2018.

Using the advanced security features available in MongoDB Enterprise Advanced and the MongoDB Atlas managed database service, organizations have extensive capabilities to implement the data discovery, defense, and detection requirements demanded by the GDPR. Methodologies used in successfully delivering customer single view projects can be used to support data discovery, and used to innovate in delivering a differentiated customer experience.

Disclaimer
For a full description of the GDPR’s regulations, roles, and responsibilities, it is recommended that readers refer to the text of the GDPR (Regulation (EU) 2016/679), available from the Official Journal of the European Union, and refer to legal counsel for the interpretation of how the regulations apply to their organization. Further, in order to effectively achieve the functionality described in this blog series, it is critical to ensure that the database is implemented according to the specifications and instructions detailed in the MongoDB security documentation. Readers should consider engaging MongoDB Global Consulting Services to assist with implementation.