GIANT Stories at MongoDB

MongoDB: The Most Wanted Database by Developers for the 4th Consecutive Year

Since 2011, Stack Overflow has taken the pulse of the developer community, revealing the top trends, technologies, and work priorities for software engineers around the world. The survey always provides fascinating insight, and 2019 was no exception, with nearly 90,000 developers completing it. Now the results are in, and it's another good year for MongoDB.

Official MongoDB Go Driver Now GA and Ready For Production

Mat Keep

Releases, Developer, Go

Today, we are making the official MongoDB Go driver Generally Available (GA). We welcome everyone to the first day of a new generation of production-ready Go and MongoDB applications.

The MongoDB Go driver entered beta testing in December 2018. Since then, it's been downloaded thousands of times, generating valuable feedback from the community that has enabled our engineers to move development forward to today's GA release. Take a look at the beta announcement if you want to learn more about why we invested in building an official Go driver, and how we use Go at MongoDB.

Announcing the MongoDB Distributed Transactions Beta Program

It was just over 12 months ago that we announced we were bringing multi-document ACID transactions to MongoDB. We shipped the first beta code a couple of weeks later, and then, after several thousand of you put transactions through their paces, we went to General Availability (GA) as part of the MongoDB 4.0 release in June 2018.

We're now really excited to announce the next phase of this development with the introduction of Distributed Transactions, extending our multi-document ACID guarantees from replica sets to sharded clusters. As a result, it will now be even easier for you to address a complete range of use cases by enforcing transactional guarantees across high-scale, globally distributed apps.

Official MongoDB Go Driver Now Available for Beta Testing

Mat Keep

Releases, Developer, Go

We’re pleased to announce that the official MongoDB Go driver is moving into beta, ready for the wider Go and MongoDB community to put it to the test – we think you’ll really like it.

In this blog, we will discuss:

  1. The growing importance of Go
  2. How we use it today at MongoDB
  3. Our rationale for building a new driver
  4. Resources to get you started with it

MongoDB 4.0: Non-Blocking Secondary Reads

Mat Keep

Technical, MongoDB 4.0

Many MongoDB users scale read performance by distributing their query load across secondary replicas. With the MongoDB 4.0 release, reads are no longer blocked when oplog entries are applied. Here's how.

MongoDB Multi-Document ACID Transactions are GA

With the release of 4.0, you now have multi-document ACID transactions in MongoDB.

But how have you been able to be so productive with MongoDB up until now? The database wasn’t built overnight, and many applications have been running mission-critical use cases demanding the highest levels of data integrity without multi-document transactions for years. How was that possible?

Well, let’s think first about why many people believe they need multi-document transactions. The first principle of relational data modeling is normalizing your data across tables. This means that many common database operations, like account creation, require atomic updates across many rows and columns.

In MongoDB, the data model is fundamentally different. The document model encourages users to store related data together in a single document. MongoDB has always supported ACID transactions in a single document and, when leveraging the document model appropriately, many applications don’t need ACID guarantees across multiple documents.

However, transactions are not just a check box. Transactions, like every MongoDB feature, aim to make developers’ lives easier. ACID guarantees across documents simplify the application logic needed by complex applications.

The value of MongoDB’s transactions

While MongoDB’s existing atomic single-document operations already provide transaction semantics that satisfy the majority of applications, the addition of multi-document ACID transactions makes it easier than ever for developers to address the full spectrum of use cases with MongoDB. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. For developers with a history of transactions in relational databases, MongoDB’s multi-document transactions are very familiar, making it straightforward to add them to any application that requires them.

In MongoDB 4.0, transactions work across a replica set, and MongoDB 4.2 will extend support to transactions across a sharded deployment* (see the Safe Harbor statement at the end of this blog). Our path to transactions represents a multi-year engineering effort, beginning back in early 2015 with the groundwork laid in almost every part of the server and the drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

In this blog, we will explore why MongoDB has added multi-document ACID transactions, their design goals and implementation, the characteristics of transactions, and best practices for developers. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

Why Multi-Document ACID Transactions

Since its first release in 2009, MongoDB has continuously innovated around a new approach to database design, freeing developers from the constraints of legacy relational databases. A design founded on rich, natural, and flexible documents accessed by idiomatic programming language APIs, enabling developers to build apps 3-5x faster. And a distributed systems architecture to handle more data, place it where users need it, and maintain always-on availability. This approach has enabled developers to create powerful and sophisticated applications in all industries, across a tremendously wide range of use cases.

Figure 1: Organizations innovating with MongoDB

With subdocuments and arrays, documents allow related data to be modeled in a single, rich and natural data structure, rather than spread across separate, related tables composed of flat rows and columns. As a result, MongoDB’s existing single document atomicity guarantees can meet the data integrity needs of most applications. In fact, when leveraging the richness and power of the document model, we estimate 80%-90% of applications don’t need multi-document transactions at all.

However, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-document transactions are a requirement for any database, irrespective of the data model they are built upon. Some are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future. And for some workloads, support for ACID transactions across multiple documents is required.

As a result, the addition of multi-document transactions makes it easier than ever for developers to address a complete range of use cases on MongoDB. For some, simply knowing that they are available will assure them that they can evolve their application as needed, and the database will support them.

Data Models and Transactions

Before looking at multi-document transactions in MongoDB, we want to explore why the data model used by a database impacts the scope of a transaction.

Relational Data Model

Relational databases model an entity’s data across multiple records and parent-child tables, and so a transaction needs to be scoped to span those records and tables. The example in Figure 2 shows a contact in our customer database, modeled in a relational schema. Data is normalized across multiple tables: customer, address, city, country, phone number, topics and associated interests.

Figure 2: Customer data modeled across separate tables in a relational database

In the event of the customer data changing in any way, for example if our contact moves to a new job, then multiple tables will need to be updated in a single “all-or-nothing” transaction, as illustrated in Figure 3.

Figure 3: Updating customer data in a relational database

Document Data Model

Document databases are different. Rather than break related data apart and spread it across multiple parent-child tables, documents can store related data together in a rich, typed, hierarchical structure, including subdocuments and arrays, as illustrated in Figure 4.

Figure 4: Customer data modeled in a single, rich document structure

MongoDB provides existing transactional properties scoped to the level of a document. As shown in Figure 5, one or more fields may be written in a single operation, with updates to multiple subdocuments and elements of any array, including nested arrays. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. With this design, application owners get the same data integrity guarantees as those provided by older relational databases.

Figure 5: Updating customer data in a document database
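
As a concrete illustration, here is a minimal sketch (assuming a pymongo client; the collection and field names are invented for this example, not taken from the figures) of a single update_one call that atomically modifies top-level fields, a subdocument, and an array element:

    from pymongo import MongoClient

    client = MongoClient()  # assumes a locally running MongoDB deployment
    customers = client.crm.customers  # hypothetical database and collection

    # One atomic operation touching several parts of the document: if any
    # part fails, none of the changes are applied.
    customers.update_one(
        {"_id": 123},
        {
            "$set": {
                "company": "NewCo",                # top-level field
                "address.city": "Palo Alto",       # field inside a subdocument
                "phones.0.number": "555-0100",     # element of a nested array
            },
            "$push": {"interests": "databases"},   # append to an array
        },
    )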

Where are Multi-Document Transactions Useful?

There are use cases where transactional ACID guarantees need to be applied to a set of operations that span multiple documents. Back office “System of Record” or “Line of Business” (LoB) applications are the typical class of workload where multi-document transactions are useful. Examples include:

  • Processing application events when users perform important actions -- for instance when updating the status of an account, say to delinquent, across all those users’ documents.
  • Logging custom application actions -- say when a user transfers ownership of an entity, the write should not be successful if the logging isn’t.
  • Many to many relationships where the data naturally fits into defined objects -- for example positions, calculated by an aggregate of hundreds of thousands of trades, need to be updated every time trades are added or modified.

MongoDB already serves these use cases today, but the introduction of multi-document transactions makes it easier as the database automatically handles multi-document transactions for you. Before their availability, the developer would need to programmatically implement transaction controls in their application. To ensure data integrity, they would need to ensure that all stages of the operation succeed before committing updates to the database, and if not, roll back any changes. This adds complexity that slows down the rate of application development. MongoDB customers in the financial services industry have reported they were able to cut 1,000+ lines of code from their apps by using multi-document transactions.

In addition, implementing client side transactions can impose performance overhead on the application. For example, after moving from its existing client-side transactional logic to multi-document transactions, a global enterprise data management and integration ISV experienced improved MongoDB performance in its Master Data Management solution: throughput increased by 90%, and latency was reduced by over 60% for transactions that performed six updates across two collections.

Multi-Document ACID Transactions in MongoDB

Transactions in MongoDB feel just like transactions developers are used to in relational databases. They are multi-statement, with similar syntax, making them familiar to anyone with prior transaction experience.

The following Python code snippet shows a sample of the transactions API.

    with client.start_session() as s:
        s.start_transaction()
        collection_one.insert_one(doc_one, session=s)
        collection_two.insert_one(doc_two, session=s)
        s.commit_transaction()

The following snippet shows the transactions API for Java.

    try (ClientSession clientSession = client.startSession()) {
        clientSession.startTransaction();
        collection.insertOne(clientSession, docOne);
        collection.insertOne(clientSession, docTwo);
        clientSession.commitTransaction();
    }

As these examples show, transactions use regular MongoDB query language syntax, and are implemented consistently whether the transaction is executed across documents and collections in a replica set or, with MongoDB 4.2, across a sharded cluster*.

We're excited to see MongoDB offer dedicated support for ACID transactions in their data platform and that our collaboration is manifest in the Lovelace release of Spring Data MongoDB. It ships with the well known Spring annotation-driven, synchronous transaction support using the MongoTransactionManager but also bits for reactive transactions built on top of MongoDB's ReactiveStreams driver and Project Reactor datatypes exposed via the ReactiveMongoTemplate.

Pieter Humphrey - Spring Product Marketing Lead, Pivotal

The transaction block code snippets below compare the MongoDB syntax with that used by MySQL. It shows how multi-document transactions feel familiar to anyone who has used traditional relational databases in the past.

MySQL

    db.start_transaction()
        cursor.execute(orderInsert, orderData)
        cursor.execute(stockUpdate, stockData)
    db.commit()

MongoDB

    s.start_transaction()
        orders.insert_one(order, session=s)
        stock.update_one(item, stockUpdate, session=s)
    s.commit_transaction()

Through snapshot isolation, transactions provide a consistent view of data, and enforce all-or-nothing execution to maintain data integrity. Transactions can apply to operations against multiple documents contained in one, or in many, collections and databases. The changes to MongoDB that enable multi-document transactions do not impact performance for workloads that don't require them.

During its execution, a transaction is able to read its own uncommitted writes, but none of its uncommitted writes will be seen by other operations outside of the transaction. Uncommitted writes are not replicated to secondary nodes until the transaction is committed to the database. Once the transaction has been committed, it is replicated and applied atomically to all secondary replicas.

An application specifies write concern in the transaction options to state how many nodes should commit the changes before the server acknowledges the success to the client. All uncommitted writes live on the primary exclusively.
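
For illustration, a minimal sketch of passing these options when starting a transaction (assuming pymongo 3.7+; the database and collection names are placeholders):

    from pymongo import MongoClient, WriteConcern
    from pymongo.read_concern import ReadConcern

    client = MongoClient()
    with client.start_session() as s:
        # "snapshot" read concern and "majority" write concern: the server
        # acknowledges the commit once a majority of nodes have applied it.
        s.start_transaction(read_concern=ReadConcern("snapshot"),
                            write_concern=WriteConcern("majority"))
        client.db.collection_one.insert_one({"a": 1}, session=s)
        s.commit_transaction()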

Taking advantage of the transactions infrastructure introduced in MongoDB 4.0, the new snapshot read concern ensures queries and aggregations executed within a read-only transaction will operate against a single, isolated snapshot on the primary replica. As a result, a consistent view of the data is returned to the client, irrespective of whether that data is being simultaneously modified by concurrent operations. Snapshot reads are especially useful for operations that return data in batches with the getMore command.

Even before MongoDB 4.0, typical MongoDB queries leveraged WiredTiger snapshots. The distinction is that snapshot reads in transactions use the same snapshot throughout the duration of the query, whereas typical MongoDB queries may switch to a more current snapshot during yield points.

Snapshot Isolation and Write Conflicts

When modifying a document, a transaction locks the document from additional changes until the transaction completes. If a transaction is unable to obtain a lock on a document it is attempting to modify, likely because another transaction is already holding the lock, it will wait up to 5 ms for the lock and then abort with a write conflict.

However, if a typical MongoDB write outside of a multi-document transaction tries to modify a document that is currently locked by a multi-document transaction, that write blocks until the transaction completes. The write is retried internally with backoff logic until $maxTimeMS is reached. Even prior to 4.0, all writes were represented as WiredTiger transactions at the storage layer and it was possible to get write conflicts. This same logic was implemented to avoid users having to react to write conflicts client-side.

Reads do not require the same locks that document modifications do. Documents that have uncommitted writes by a transaction can still be read by other operations, and of course those operations will only see the committed values, and not the uncommitted state.

Only writes trigger write conflicts within MongoDB. Reads, in line with the expected behavior of snapshot isolation, do not prevent a document from being modified by other operations. Changes will not be surfaced as write conflicts unless a write is performed on the document. Additionally, no-op updates – like setting a field to the same value that it already had – can be optimized away before reaching the storage engine, thus not triggering a write conflict. To guarantee that a write conflict is detected, perform an operation like incrementing a counter.
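
For instance, a brief sketch (client, collection, and field names are hypothetical) of an update that always reaches the storage engine, so concurrent modification reliably surfaces as a write conflict:

    from pymongo import MongoClient

    client = MongoClient()
    orders = client.shop.orders  # hypothetical collection

    with client.start_session() as s:
        s.start_transaction()
        # $inc always changes the document, unlike a no-op $set that may be
        # optimized away, so a competing writer is guaranteed to conflict.
        orders.update_one({"_id": 1}, {"$inc": {"version": 1}}, session=s)
        s.commit_transaction()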

Retrying Transactions

MongoDB 4.0 introduces the concept of error labels. The transient transaction error label notifies listening applications that a surfaced error, ranging from a network error to a write conflict, may be temporary and that the transaction is safe to retry from the beginning. Permanent errors, like parsing errors, will not carry the transient transaction error label, as rerunning the transaction will never result in a successful commit.

One of the core values of MongoDB is its highly available architecture. These error labels make it easy for applications to be resilient in cases of network blips or node failures, enabling cross-document transactional guarantees without sacrificing use cases that must be always-on.
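
A sketch of the retry pattern these labels enable (assuming pymongo 3.7+; the transaction body is illustrative only):

    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient()

    def run_transaction(s):
        s.start_transaction()
        client.db.orders.insert_one({"sku": "abc"}, session=s)
        client.db.stock.update_one({"sku": "abc"}, {"$inc": {"qty": -1}},
                                   session=s)
        s.commit_transaction()

    with client.start_session() as s:
        while True:
            try:
                run_transaction(s)
                break
            except PyMongoError as exc:
                if exc.has_error_label("TransientTransactionError"):
                    continue  # safe to rerun the whole transaction
                raise         # permanent error, e.g. a parse failure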

Transactions Best Practices

As noted earlier, MongoDB’s single document atomicity guarantees will meet 80-90% of an application’s transactional needs. They remain the recommended way of enforcing your app’s data integrity requirements. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long running transactions, or attempting to perform an excessive number of operations in a single ACID transaction can result in high pressure on WiredTiger’s cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

  1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.
  2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches.
  3. In MongoDB 4.0, a transaction is represented in a single oplog entry, and must therefore fit within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.
  4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.
  5. DDL operations, like creating an index or dropping a database, block behind active transactions running on the namespace. Any transaction that tries to access the namespace while a DDL operation is pending will be unable to obtain a lock and will be aborted.
  6. You can review all best practices in the MongoDB documentation for multi-document transactions.

Transactions and Their Impact on Data Modeling in MongoDB

Adding transactions does not make MongoDB a relational database – many developers have already found that the document model is superior to the relational one today.

All best practices relating to MongoDB data modeling continue to apply when using features such as multi-document transactions, or fully expressive JOINs (via the $lookup aggregation pipeline stage). Where practical, all data relating to an entity should be stored in a single, rich document structure. Just moving data structured for relational tables into MongoDB will not allow users to take advantage of MongoDB’s natural, fast, and flexible document model, or its distributed systems architecture.

The RDBMS to MongoDB Migration Guide describes the best practices for moving an application from a relational database to MongoDB.

The Path to Transactions

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Figure 6 presents a timeline of the key engineering projects that have enabled multi-document transactions in MongoDB, with status shown as of June 2018. The key design goal underlying all of these projects is that their implementation does not sacrifice the key benefits of MongoDB – the power of the document model and the advantages of distributed systems – while imposing no performance impact on workloads that don’t require multi-document transactions.

Figure 6: The path to transactions – multi-year engineering investment, delivered across multiple releases

Conclusion

MongoDB has already established itself as the leading database for modern applications. The document data model is rich, natural, and flexible, with documents accessed by idiomatic drivers, enabling developers to build apps 3-5x faster. Its distributed systems architecture enables you to handle more data, place it where users need it, and maintain always-on availability. MongoDB’s existing atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. The addition of multi-document ACID transactions in MongoDB 4.0 makes it easier than ever for developers to address a complete range of use cases, while for many, simply knowing that they are available will provide critical peace of mind.

Take a look at our multi-document transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

If you want to learn more about everything that’s new in MongoDB 4.0, download our Guide.

Transactional guarantees have been a critical feature for relational databases for decades, but have typically been absent from non-relational alternatives, which has meant that users have been forced to choose between transactions and the flexibility and versatility that non-relational databases offer. With its support for multi-document ACID transactions, MongoDB is built for customers that want to have their cake and eat it too.

Stephen O’Grady, Principal Analyst, Redmonk

*Safe Harbor Statement

This blog contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

MongoDB 4.0 Release Candidate 0 Has Landed

MongoDB enables you to meet the demands of modern apps with a technology foundation that delivers:

  1. The document data model – presenting you the best way to work with data.
  2. A distributed systems design – allowing you to intelligently put data where you want it.
  3. A unified experience that gives you the freedom to run anywhere – future-proofing your work and eliminating vendor lock-in.

Building on the foundations above, MongoDB 4.0 is a significant milestone in the evolution of MongoDB, and we’ve just shipped the first Release Candidate (RC), ready for you to test.

Why is it so significant? Let’s take a quick tour of the key new features. And remember, you can learn about all of this and much more at MongoDB World'18 (June 26-27).

Multi-Document ACID Transactions

Previewed back in February, multi-document ACID transactions are part of the 4.0 RC. With snapshot isolation and all-or-nothing execution, transactions extend MongoDB ACID data integrity guarantees to multiple statements and multiple documents across one or many collections. They feel just like the transactions you are familiar with from relational databases, are easy to add to any application that needs them, and don't change the way non-transactional operations are performed. With multi-document transactions it’s easier than ever for all developers to address a complete range of use cases with MongoDB, while for many of them, simply knowing that they are available will provide critical peace of mind that they can meet any requirement in the future. In MongoDB 4.0 transactions work within a replica set, and MongoDB 4.2 will support transactions across a sharded cluster*.

To give you a flavor of what multi-document transactions look like, here is a Python code snippet of the transactions API.

with client.start_session() as s:
    s.start_transaction()
    try:
        collection.insert_one(doc1, session=s)
        collection.insert_one(doc2, session=s)
    except Exception:
        s.abort_transaction()
        raise
    s.commit_transaction()

And now, the transactions API for Java.

try (ClientSession clientSession = client.startSession()) {
    clientSession.startTransaction();
    try {
        collection.insertOne(clientSession, docOne);
        collection.insertOne(clientSession, docTwo);
        clientSession.commitTransaction();
    } catch (Exception e) {
        clientSession.abortTransaction();
    }
}

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to replica sets, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Take a look at our multi-document ACID transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started.

Aggregation Pipeline Type Conversions

One of the major advantages of MongoDB over rigid tabular databases is its flexible data model. Data can be written to the database without first having to predefine its structure. This helps you to build apps faster and respond easily to rapidly evolving application changes. It is also essential in supporting initiatives such as single customer view or operational data lakes to support real-time analytics where data is ingested from multiple sources. Of course, with MongoDB’s schema validation, this flexibility is fully tunable, enabling you to enforce strict controls on data structure, type, and content when you need more control.

So while MongoDB makes it easy to ingest data without complex cleansing of individual fields, it means working with this data can be more difficult when a consuming application expects uniform data types for specific fields across all documents. Handling different data types pushes more complexity to the application, and available ETL tools have provided only limited support for transformations. With MongoDB 4.0, you can maintain all of the advantages of a flexible data model, while prepping data within the database itself for downstream processes.

The new $convert operator enables the aggregation pipeline to transform mixed data types into standardized formats natively within the database. Ingested data can be cast into a standardized, cleansed format and exposed to multiple consuming applications – such as the MongoDB BI and Spark connectors for high-performance visualizations, advanced analytics and machine learning algorithms, or directly to a UI. Casting data into cleansed types makes it easier for your apps to process, sort, and compare data. For example, financial data inserted as a long can be converted into a decimal, enabling lossless and high-precision processing. Similarly, dates inserted as strings can be transformed into the native date type.

When $convert is combined with over 100 different operators available as part of the MongoDB aggregation pipeline, you can reshape, transform, and cleanse your documents without having to incur the complexity, fragility, and latency of running data through external ETL processes.
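
As a hedged sketch of what this can look like (collection and field names are invented), a pipeline that casts a string price to a high-precision decimal and a string timestamp to a native date:

    from pymongo import MongoClient

    client = MongoClient()
    pipeline = [
        {"$addFields": {
            # onError/onNull provide fallbacks for unparseable or missing values
            "price": {"$convert": {"input": "$price", "to": "decimal",
                                   "onError": None}},
            "created": {"$convert": {"input": "$created", "to": "date",
                                     "onError": None}},
        }},
        {"$out": "orders_clean"},  # materialize the cleansed collection
    ]
    client.etl.orders_raw.aggregate(pipeline)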

Non-Blocking Secondary Reads

To ensure that reads can never return data that is not in the same causal order as on the primary replica, MongoDB blocks readers while oplog entries are applied in batches to the secondary. This can cause secondary reads to have variable latency, which becomes more pronounced when the cluster is serving write-intensive workloads. Why does MongoDB need to block secondary reads? When a sequence of writes is applied to a document, MongoDB is designed so that every node must show those writes in the same causal order. So if you change field "A" in a document and then change field "B", it is not possible to see that document with field "B" changed but field "A" unchanged. Eventually consistent systems suffer from this behavior, but MongoDB does not, and never has.

By taking advantage of storage engine timestamps and snapshots implemented for multi-document ACID transactions, secondary reads in MongoDB 4.0 become non-blocking. With non-blocking secondary reads, you now get predictable, low read latencies and increased throughput from the replica set, while maintaining a consistent view of data. Workloads that see the greatest benefits are those where data is batch loaded to the database, and those where distributed clients are accessing low latency local replicas that are geographically remote from the primary replica.
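
For example, a brief sketch (hosts and names are placeholders) of distributing queries to secondaries with pymongo; with 4.0, such reads no longer stall while replication batches are applied:

    from pymongo import MongoClient, ReadPreference

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
    events = client.analytics.get_collection(
        "events", read_preference=ReadPreference.SECONDARY_PREFERRED)
    # Served by a secondary when one is available, with a consistent
    # causal ordering of the data it returns.
    recent = list(events.find({"type": "click"}).limit(100))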

40% Faster Data Migrations

Very few of today’s workloads are static. For example, the launch of a new product or game, or seasonal reporting cycles can drive sudden spikes in load that can bring a database to its knees unless additional capacity can be quickly provisioned. If and when demand subsides, you should be able to scale your cluster back in, rightsizing for capacity and cost.

To respond to these fluctuations in demand, MongoDB enables you to elastically add and remove nodes from a sharded cluster in real time, automatically rebalancing the data across nodes in response. The sharded cluster balancer, responsible for evenly distributing data across the cluster, has been significantly improved in MongoDB 4.0. By concurrently fetching and applying documents, shards can complete chunk migrations up to 40% faster, allowing you to more quickly bring new nodes into service at just the moment they are needed, and scale back down when load returns to normal levels.

Extensions to Change Streams

Change streams, released with MongoDB 3.6, enable developers to build reactive, real-time, web, mobile, and IoT apps that can view, filter, and act on data changes as they occur in the database. Change streams enable seamless data movement across distributed database and application estates, making it simple to stream data changes and trigger actions wherever they are needed, using a fully reactive programming style.

With MongoDB 4.0, change streams can now be configured to track changes across an entire database or a whole cluster. Additionally, change stream events now include the cluster time at which they occurred, which the application can use to derive a wall-clock time for each event.
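
A minimal sketch (assuming pymongo 3.7+; names are illustrative): watch() opened on the client tracks the whole cluster, while db.watch() scopes to a single database:

    from pymongo import MongoClient

    client = MongoClient()
    with client.watch() as stream:  # cluster-wide; client.mydb.watch() scopes to one database
        for change in stream:
            # each event carries a clusterTime the app can map to wall-clock time
            print(change["operationType"], change.get("clusterTime"))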

Getting Started with MongoDB 4.0

Hopefully this gives you a taste of what’s coming in 4.0. There’s a stack of other stuff we haven’t covered today, but you can learn about it all in the resources below.

To get started with the RC now:

  1. Head over to the MongoDB download center to pick up the latest development build.
  2. Review the 4.0 release notes.
  3. Sign up for the forthcoming MongoDB University training on 4.0.

And you can meet our engineering team and other MongoDB users at MongoDB World'18 (June 26-27).

---

* Safe Harbor Statement

This blog post contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

BookMyShow Continues to Lead Online Entertainment Ticketing in India and Scales to 25 Million Users with MongoDB

India's twin passions for cinema and tech make it a natural fit for automated ticketing. But if ever a market needs scalable solutions, this 1.4 billion-strong nation is it.

That’s a lesson Viraj Patel, VP Technology for BigTree Entertainment, learned the hard way. “We started out in ticketing distribution in 1999 using telephones,” he says, “before mobile platforms and internet access were on the scene. It just didn’t work. The investors pulled the plug in 2002.”

Undeterred, the company successfully pivoted to selling software to cinema chains. By 2006, Viraj and team were ready to aim for the big prize again. They just needed the right tools. With the internet and mobile data fitting into place, a trial project in online ticket aggregation looked promising enough for investors to fund the launch of BookMyShow in 2007.

“We launched with a 100 percent Microsoft stack,” says Viraj, “but soon realized that scaling with Microsoft was not an easy job.” It wasn’t the Windows platform or the developer tools that were the problem, he recalls: “It was the SQL Server database. That was the first bottleneck as we got more and more traffic, and it soaked up more and more resources and money. It wasn’t the right solution. It couldn’t scale with us.”

Spoiler: by 2018, BookMyShow sells more than 10 million tickets a month for all manner of movies and events, and serves three billion pages a month across the web and its 50 million plus installed apps. Scaling happened.

The plot changed for the better in 2010 with the discovery of MongoDB. “We were looking around for alternatives, and it was the new kid on the block.” (In fact, MongoDB 1.0 had launched just the year before, and MongoDB India was yet to come.) “We tested it internally as a straight swap for a monolithic SQL database. Every web and mobile application we built needed a database that had performance and scalability, and MongoDB blew us away on both.”

MongoDB really won its spurs when the company added Facebook Connect to its registration process. “The registration database was the first thing we built, and it was running on SQL Server. Which was OK, until Facebook Connect came along and we added that as a registration option. Then the database really struggled. We switched to MongoDB and it was night and day. Tremendous gains. Not only did we get the ability to represent customers directly as JSON documents in the database, which made our data model much simpler, but we got all our performance back.

“We want the flexibility of upgrading the schema for future use cases, and that’s so much easier in MongoDB. The data structures we create are clear and easy to read, and it’s so much simpler to understand and extend,” Viraj adds, about their discovery of the advantages of document-model storage.

MongoDB’s second big job was also thoroughly web scale, as it took on the task of giving each of those millions of users their own bespoke, personalized view of the service. This time, the engineering team knew where to start. “About five years ago, we built our personalization engine on MongoDB,” says Viraj, “and it continues to scale with us. It stores a lot of customer information and when a customer visits, it pulls it out, personalizes it in real time and delivers it. That really improves the customer experience. We see an 18 percent increase in conversion, personalized versus non-personalized.”

Today, MongoDB is the default database for developing ideas and services in BigTree, and Viraj cheerfully admits he has long ago stopped counting how many nodes are in use. “Last time I looked, it was between 100-160,” he says.

Future plans include containerization of the databases to smooth out upgrades and ease of deployment with BigTree’s agile DevOps production pipeline and, when the time comes, sharding the customer database. That’s planned for, but not currently necessary. He explains: “We just haven’t reached the point where writes to MongoDB are the limiting factor anywhere in the service. We get a long way with MongoDB replica sets, and are safe in the knowledge that there are no limitations to scaling further when we need to.”

Viraj cares deeply about latency – “We’re a performance-sensitive company” – and much of the service is instrumented by monitoring and management platforms such as New Relic. While initial performance gains were superlative, he says, things have only continued to improve as new features and technologies have been added. “We had been using SQL tabular databases for customer booking history,” says Viraj. “We moved this to MongoDB and have seen a superb performance boost. What used to take up to 5000 ms on traditional SQL databases went down to 10-20 ms on MongoDB using the MMAP storage engine. When we moved to MongoDB’s default WiredTiger storage engine, it improved five to ten times further, to 2 ms. We’re still getting this performance, even though the database now has close to 200 million documents.”

There have been other benefits from following MongoDB’s roadmap. “WiredTiger has made things much more cost-effective,” he says. “Security is better as we now encrypt data instead of storing it in plain JSON. Our customer database is five times more compact and our personalization database uses nearly eight times less storage.”

In the future, he says, they expect aggregation queries and query caching mechanisms will improve performance still more. As for reliability, “MongoDB auto-heals so well in the event of any failures in our platform we don’t even need to worry about it. That’s highly appreciated, and much better than any of the other databases we have used.”

There can be few better stories of early adoption and innovation with MongoDB than the success BigTree Entertainment has enjoyed with BookMyShow. Viraj and his engineers insist on picking the right tools for each part of the job of running India’s favourite online ticketing service, and their long experience of casting this particular actor in so many roles makes MongoDB a performer they’ve come to rely on.


Read more about what others are building with MongoDB.

What’s New in MongoDB 3.6. Part 4 – Avoid Lock-In, Run Anywhere

Mat Keep

Welcome to part 4 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation.
  • In part 2, we dived into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression.
  • In part 3 we covered what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R.
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Run Anywhere

Many organizations are turning to the cloud to accelerate the speed of application development, deployment, and data discovery. Replatforming to the cloud gives them the ability to enable self-service IT, to elastically scale resources on demand, and to align costs to actual consumption. But they are also concerned about exposing the business to deeper levels of lock-in – this time from the APIs and services of the cloud providers themselves.

Increasingly, users are demanding the freedom to run anywhere: private clouds in their own data center, in the public cloud, or in a hybrid model that combines the two. This flexibility is not available when they build on a cloud-proprietary database from a single vendor. Alternatively, the platform independence provided by MongoDB gives them the ability to respond to business or regulatory changes without incurring the complexity, risk, and time that comes from expensive database migrations whenever they need or want to transition to a new platform.

MongoDB Atlas

As a fully managed database service, MongoDB Atlas is the best way to run MongoDB in the public cloud. 2017 has already seen major evolutions in the Atlas service, with key highlights including:

  • Expansion beyond Amazon Web Services (AWS) to offer Atlas on Google Cloud Platform (GCP) and Microsoft Azure.
  • Achieving SOC2 Type 1 compliance.
  • The launch of managed database clusters on a shared architecture, including the free M0 instances, and the M2s and M5s, which allow customers to jumpstart their projects for a low and predictable price.
  • A live migration facility to move data from an existing MongoDB replica set into an Atlas cluster with minimal application impact.
  • The addition of the Data Explorer and Real Time Performance Panel, now coming to Ops Manager, as discussed above.

MongoDB 3.6 is available as a fully managed service on Atlas, along with important new features to support global applications, and with automated scalability and performance optimizations.

Turnkey Global Distribution of Clusters with Cross-Region Replication

MongoDB Atlas clusters can now span multiple regions offered by a cloud provider. This enables developers to build apps that maintain continuous availability in the event of geographic outages, and improve customer experience by locating data closer to users.

When creating a cluster or modifying its configuration, two options are now available:

  • Teams can now deploy a single MongoDB database across multiple regions supported by a cloud provider for improved availability guarantees. Reads and writes will default to a “preferred region” assuming that there are no active failure or failover conditions. The nearest read preference, discussed below, can be used to route queries to local replicas in a globally distributed cluster. Replica set members in additional regions will participate in the automated election and failover process if the primary member is affected by a local outage, and can become a primary in the unlikely event that the preferred region is offline.

  • Read-only replica set members can be deployed in multiple regions, allowing teams to optimize their deployments to achieve reduced read latency for a global audience. Read preference – providing a mechanism to control how MongoDB routes read operations across members of a replica set – can be configured using the drivers. For example, the nearest read preference routes queries to replicas with the lowest network latency from the client, thus providing session locality by minimizing the effects of geographic latency. As the name suggests, read-only replica set members will not participate in the automated election and failover process, and can never become a primary.

Teams can activate both of the options outlined above in a single database to provide continuous availability and an optimal experience for their users.

Figure 1: Globally distributed MongoDB Atlas cluster, providing resilience to regional outages and lower latency experiences for global apps
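
As a hedged sketch (the connection string is a placeholder, not a real cluster), enabling the nearest read preference from a driver:

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb+srv://cluster0.example.mongodb.net/app",  # hypothetical Atlas URI
        readPreference="nearest",  # route each query to the lowest-latency member
    )
    profile = client.app.users.find_one({"_id": "user-123"})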

Auto-Scaling Storage and Performance Optimization

MongoDB Atlas now supports automatic scaling for the storage associated with a cluster, making it easier for you to manage capacity. Enabled by default, auto-scaling for storage detects when your disks hit 90% utilization and provisions additional storage so that your cluster reaches a disk utilization of 70% on AWS and GCP, or a maximum of 70% utilization on Azure. This automated process occurs without impact to your database or application availability.

In addition to auto-storage scaling, the new Performance Advisor discussed earlier for Ops Manager is also available in MongoDB Atlas, providing you with always-on, data-driven insights into query behavior and index recommendations.

A Cloud Database Platform for Development & Testing

New enhancements to MongoDB Atlas make it the optimal cloud database for spinning up and running test and development environments efficiently.

  • You can now pause your MongoDB Atlas cluster, perfect for use cases where only intermittent access to your data is required, such as development during business hours or temporary testing. While your database instances are paused, you are charged for provisioned storage and backup storage, but not for instance hours. You can resume your MongoDB Atlas cluster at any time on demand; your cluster configuration will be the same as when you paused it, and public DNS hostnames are retained so no modifications to your connection string are required. MongoDB Atlas clusters can be paused for up to 7 days; if you do not resume a cluster within 7 days, Atlas will automatically resume it. Pausing and resuming your MongoDB clusters can be triggered in the MongoDB Atlas UI or via the REST API.
  • Cross-project restores, introduced with Ops Manager 3.6, are also available in MongoDB Atlas, allowing users to restore to different MongoDB Atlas projects than the backup snapshot source.

Next Steps

That wraps up the final part of our what’s new blog series. I hope I’ve helped demonstrate how MongoDB 3.6 helps you move at the speed of your data. It enables new digital initiatives and modernized applications to be delivered to market faster, running reliably and securely at scale, and unlocking insights and intelligence ahead of your competitors.

  • Change streams, retryable writes, causal consistency, greater query and update expressivity, and Compass Community help developers move faster.
  • Ops Manager, schema validation, enhanced security, end to end compression, and user session management help operations teams scale faster.
  • The MongoDB aggregation pipeline, Connector for BI, and the recommended R driver help analysts and data scientists unlock insights faster.

And you have the freedom to run MongoDB anywhere – on-premises, public cloud, and as a service with MongoDB Atlas available on AWS, Azure, and GCP.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then:

What’s New in MongoDB 3.6. Part 3 – Speed to Insight

Mat Keep

Welcome to part 3 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation.
  • In part 2, we dived into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression.
  • In today’s part 3 we’ll cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R.
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What's New in MongoDB 3.6.

Speed to Insight

How quickly an organization can unlock and act on insights from data generated by new applications has become a material source of competitive advantage. Collecting data in operational systems and then relying on batch ETL (Extract, Transform, Load) processes to update an expensive data warehouse or complex and ungoverned data lake is no longer sufficient. Speed to insight is critical, and so analytics performed against live data to drive operational intelligence is fast becoming a necessity, without having to employ armies of highly skilled and scarce data engineers and scientists.

MongoDB 3.6 delivers a number of new features and capabilities that allow organizations to enable real-time analytics and action.

MongoDB Connector for BI: Faster and Simpler

MongoDB 3.6 brings a number of performance and ease-of-use enhancements to the BI Connector, enabling faster time to insight using SQL-based BI and Analytics platforms.

Faster: The connector takes advantage of enhancements to the aggregation pipeline – discussed later in this post – to deliver higher performance, with more operations pushed natively to the database. Prior to MongoDB 3.6, only left outer equijoins could be pushed down to the database – all other JOIN types had to be executed within the BI connector layer, which first required all matching data to be extracted from the database. With MongoDB 3.6, support is being extended to non-equijoins and the equivalent of SQL subqueries. These enhancements reduce the amount of data that needs to be moved and computed in the BI layer, providing faster time to insight.

In addition, performance metrics are now observable via the Show Status function, enabling deeper performance insights and optimizations.

Simpler: To support easier configuration, the mongosqld process now samples and maps the MongoDB schema, caching the results internally and eliminating the need to install the separate mongodrdl component. Additionally, users can simplify lifecycle management by configuring, deploying, and monitoring the BI connector directly from Ops Manager.

To simplify the enforcement of access controls, BI Connector users can now be authenticated directly against MongoDB using new client-side plugins, eliminating the need to manage TLS certificates. Review the documentation for the C and JDBC authentication plugins to learn more. Authentication via Kerberos is also now supported.

Richer Aggregation Pipeline

Developers and data scientists rely on the MongoDB aggregation pipeline for its power and flexibility in enabling sophisticated data processing and manipulation demanded by real-time analytics and data transformations. Enhancements in the aggregation pipeline unlock new use cases.

A more powerful $lookup operator extends MongoDB’s JOIN capability to support the equivalent of SQL subqueries and non-equijoins. As a result, developers and analysts can write more expressive queries combining data from multiple collections, all executed natively in the database for higher performance, and with less application-side code.

In addition to $lookup, the aggregation pipeline offers additional enhancements:

  • Support for timezone-aware aggregations. Before timezone awareness, reporting that spanned regions and date boundaries was not possible within the aggregation pipeline. Now business analysts can group data for multi-region analysis that takes account of variances in working hours and working days across different geographic regions.
  • New expressions allow richer data transformations within the aggregation pipeline, including the ability to convert objects to arrays of key-value pairs, and arrays of key-value pairs to be converted to objects. The mergeObjects expression is useful for setting missing fields into default values, while the REMOVE variable allows the conditional exclusion of fields from projections based on evaluation criteria. You can learn more about the enhancements from the documentation.
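
A brief sketch of two of these additions (collection and field names are invented): $dateToString grouping events by local calendar day via an explicit timezone, and $objectToArray exposing a document's dynamic fields as key-value pairs:

    from pymongo import MongoClient

    client = MongoClient()
    pipeline = [
        {"$project": {
            # bucket events by local calendar day rather than UTC
            "local_day": {"$dateToString": {"date": "$ts",
                                            "format": "%Y-%m-%d",
                                            "timezone": "Asia/Kolkata"}},
            # expose dynamic attributes as an array of {k, v} pairs
            "attrs": {"$objectToArray": "$attributes"},
        }}
    ]
    results = client.app.events.aggregate(pipeline)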

More Expressive Query Language

MongoDB 3.6 exposes the ability to use aggregation expressions within the query language to enable richer queries with less client-side code. This enhancement allows the referencing of other fields in the same document when executing comparison queries, as well as powerful expressions such as multiple JOIN conditions and uncorrelated subqueries. The addition of the new $expr operator allows the equivalent of SELECT * FROM T1 WHERE a > b in SQL syntax. Learn more from the $expr documentation.
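
A one-line sketch of that example in pymongo, using the same T1, a, and b names as the SQL snippet:

    from pymongo import MongoClient

    client = MongoClient()
    # Equivalent of SELECT * FROM T1 WHERE a > b: compare two fields of the
    # same document inside a find() filter.
    matches = list(client.db.T1.find({"$expr": {"$gt": ["$a", "$b"]}}))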

R Driver for MongoDB

A recommended R driver for MongoDB is now available, enabling developers, data scientists, and statisticians to get the same first-class experience with MongoDB as that offered by the other MongoDB drivers – providing idiomatic, native language access to the database. The driver supports advanced MongoDB functionality, including:

  • Read and write concerns to control data consistency and durability.
  • Enterprise authentication mechanisms, such as LDAP and Kerberos, to enforce security controls against the database.
  • Support for advanced BSON data types, such as Decimal128, enabling high-precision scientific and financial analysis.

Next Steps

That wraps up the third part of our what’s new blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then: