GIANT Stories at MongoDB

Introducing the Aggregation Pipeline Builder in MongoDB Compass

Building MongoDB aggregations has never been so easy.

The most efficient place to analyze your data is where it already lives. That’s why MongoDB has a built-in aggregation framework. Have you tried it yet? If so, you know that it’s one of the most powerful tools MongoDB puts at your disposal. If not, you’re missing out on the ability to query your data in remarkably flexible ways. In fact, we like to say that “aggregate is the new find”. Built on the concept of data processing pipelines (like in Unix or PowerShell), the aggregation framework lets you “funnel” your documents through a multi-stage pipeline that filters, transforms, sorts, computes, and aggregates them, and more. It enables you to perform extensive analytics and statistical analysis in real time and to generate pre-aggregated reports for dashboarding.
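To make the pipeline idea concrete, here is a toy, pure-Python sketch (not MongoDB code) in which each stage is a function that consumes a stream of documents and produces a new one. The sample `orders` data and the helper names `match`, `group_sum`, `sort_by`, and `run_pipeline` are hypothetical; the stages only loosely mirror `$match`, `$group`, and `$sort`.

```python
# Toy in-memory documents standing in for a collection (illustrative data only).
orders = [
    {"customer": "ann", "status": "shipped", "total": 30},
    {"customer": "bob", "status": "pending", "total": 15},
    {"customer": "ann", "status": "shipped", "total": 20},
    {"customer": "cid", "status": "shipped", "total": 45},
]

def match(pred):
    # Filter stage: keep only documents for which pred(doc) is true.
    return lambda docs: (d for d in docs if pred(d))

def group_sum(key, field):
    # Group stage: sum `field` for each distinct value of `key`.
    def stage(docs):
        totals = {}
        for d in docs:
            totals[d[key]] = totals.get(d[key], 0) + d[field]
        return ({"_id": k, "total": v} for k, v in totals.items())
    return stage

def sort_by(field, descending=False):
    # Sort stage: must consume the whole stream before emitting anything.
    return lambda docs: iter(sorted(docs, key=lambda d: d[field], reverse=descending))

def run_pipeline(docs, stages):
    # Funnel the documents through each stage in order, like a Unix pipe.
    for stage in stages:
        docs = stage(docs)
    return list(docs)

# Roughly analogous MongoDB pipeline:
#   [{"$match": {"status": "shipped"}},
#    {"$group": {"_id": "$customer", "total": {"$sum": "$total"}}},
#    {"$sort": {"total": -1}}]
result = run_pipeline(orders, [
    match(lambda d: d["status"] == "shipped"),
    group_sum("customer", "total"),
    sort_by("total", descending=True),
])
print(result)  # [{'_id': 'ann', 'total': 50}, {'_id': 'cid', 'total': 45}]
```

As with `$sort` in MongoDB, the sort stage has to see every incoming document before it can emit its first result, whereas the filter stage can stream documents through one at a time.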

There is no limit on the number of stages an aggregation pipeline can have: pipelines can be as simple or as complex as you wish. When it comes to deciding how to aggregate data, the real limit is your imagination. We’ve seen some very comprehensive pipelines!

With a rich library of over 25 stages and 100 operators (and growing with every release), the aggregation framework is an amazingly versatile tool. To help you be even more successful with it, we built a user interface for constructing aggregations. The new Aggregation Pipeline Builder is available for beta testing in the latest release of Compass, under the Aggregations tab.

The screenshot below depicts a sample pipeline on a movies collection that lists the title, year, and rating of all movies in English and Japanese that are rated either PG or G, excluding crime and horror, with the most recent first and sorted alphabetically within each year. Each stage was added incrementally, and we could preview the result of the aggregation at every step.
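A pipeline like the one described could be written as follows, as the list of stage documents that a Python driver such as PyMongo accepts. The field names (`genres`, `languages`, `rated`, `title`, `year`) are assumptions about the sample movies collection, and “in English and Japanese” is read here as requiring both languages (`$all`); swap in `$in` if either language should qualify.

```python
# Sketch of the movies pipeline. Field names are assumptions about the sample
# collection, not taken from the original screenshot.
pipeline = [
    # Keep English-and-Japanese movies rated PG or G, excluding crime and horror.
    {"$match": {
        "genres": {"$nin": ["Crime", "Horror"]},
        "languages": {"$all": ["English", "Japanese"]},
        "rated": {"$in": ["PG", "G"]},
    }},
    # Most recent year first, then alphabetical by title within each year.
    # Key order is significant; Python 3.7+ dicts preserve insertion order.
    {"$sort": {"year": -1, "title": 1}},
    # Output only the fields shown in the listing.
    {"$project": {"_id": 0, "title": 1, "year": 1, "rated": 1}},
]

# With PyMongo and a running server, this would be executed as
#   db.movies.aggregate(pipeline)
# (not run here).
```

Placing `$match` first lets the server discard non-matching documents, and potentially use an index, before the more expensive sort.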

This easy-to-use UI lets you build your aggregation queries faster than ever before. Thanks to its intuitive drag-and-drop experience and code skeletons, there’s no need to worry about bracket matching, reordering stages, or remembering operator syntax. You also get auto-completion for aggregation operators, query operators, and even document field names.

If you need help understanding a particular operator, click on the info icon next to it and you’ll be taken directly to the appropriate guidance.

As you build your pipeline, you can easily preview your results. This, combined with the ability to rearrange stages and toggle them on and off, makes it easy to troubleshoot your pipelines. When you are satisfied with the results, you can copy the constructed pipeline to the clipboard for easy pasting into your code, or simply save it to your favorites list for re-use later!

The aggregation authoring experience just got even more incredible with the new Compass aggregation pipeline builder. Why not check it out today?

  • Download the latest beta version of Compass
  • See the documentation for the aggregation pipeline builder in Compass
  • See the aggregation framework quick reference
  • To learn or brush up on your aggregation framework skills, take M121 from MongoDB University – it’s well worth it!

Also, please remember to send us your feedback by filing JIRA tickets or emailing it to:

MongoDB’s Drive to Multi-Document Transactions

Transactions are important. Any database needs to offer transactional guarantees to enforce data integrity. But they don’t all do it in the same way – different database technologies take different approaches:

  • Relational databases model an entity’s data across multiple rows and parent-child tables, and so transactions need to span those rows and tables.
  • With subdocuments and arrays, document databases allow related data to be unified hierarchically inside a single data structure. The document can be updated with an atomic operation, giving it the same data integrity guarantees as a multi-table transaction in a relational database.
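As a sketch of what this means in practice, consider a hypothetical `orders` document that embeds its line items. Adding an item and updating the order total, which in a relational schema touches a child table and a parent row, becomes a single update with two modifiers on one document. The `apply_update` helper is a toy local stand-in for the server (it supports only top-level `$push` and `$inc`) so that the effect can be shown without a database.

```python
import copy

# Hypothetical order document: line items are embedded as an array, so the
# order and its items live in one document instead of parent/child tables.
order = {
    "_id": 1001,
    "customer": "ann",
    "items": [{"sku": "A1", "qty": 2, "price": 5}],
    "total": 10,
}

# One update document carrying both changes. MongoDB applies all modifiers of a
# single-document update atomically; with PyMongo this would be sent as
#   db.orders.update_one({"_id": 1001}, update_spec)
update_spec = {
    "$push": {"items": {"sku": "B7", "qty": 1, "price": 12}},
    "$inc": {"total": 12},
}

def apply_update(doc, update):
    """Toy stand-in for the server: supports top-level $push and $inc only.

    Works on a copy and returns it, so the caller never observes a
    half-applied update (mimicking all-or-nothing at the document level).
    """
    new_doc = copy.deepcopy(doc)
    for field, value in update.get("$push", {}).items():
        new_doc.setdefault(field, []).append(value)
    for field, amount in update.get("$inc", {}).items():
        new_doc[field] = new_doc.get(field, 0) + amount
    return new_doc

updated = apply_update(order, update_spec)
```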

Because of this fundamental difference in data modeling, MongoDB’s existing atomicity guarantees are able to meet the data integrity needs of most applications. In fact, we estimate 80%-90% of applications don’t need multi-document transactions at all. However, there are some legitimate use cases and workloads where transactions across multiple documents are needed. In those cases, without transactions, a developer would have to implement complex logic on their own in the application layer. Also, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-table/document transactions are a requirement for any database, irrespective of the data model they are built upon. Others are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future and they don’t want to outgrow their database.

And so, the addition of multi-document ACID transactions makes it easier than ever for developers to address the full range of use cases on MongoDB.

As one can imagine, multi-document transactions are much more complex to build in a distributed database than in a monolithic, scale-up database. In fact, bringing multi-document transactions to MongoDB has been part of a massive multi-year engineering investment. We have made enhancements to practically every part of the system: the storage layer itself, our replication consensus protocol, the sharding architecture, consistency and durability guarantees, the introduction of a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers.

The figure below represents the evolution of these enhancements as well as the work in progress to enable multi-document transactions. As you can see, we are nearly done.

In MongoDB 4.0, coming in summer 2018*, multi-document transactions will work across a replica set. We will extend support for transactions across a sharded deployment in the following release.

Importantly, the green boxes highlight all of the critical dependencies for transactions that have already been delivered over the past 3 years. And, frankly, that was the hardest part of the project: balancing the stepping stones we needed to get to transactions against delivering useful features to our users straightaway, improving their development experience along this journey. Wherever we could, we built components that served both goals. For example, the global logical clock and timestamps in the storage layer enforce consistent time across every operation in a distributed cluster. These enhancements are needed for transactions in order to provide snapshot isolation, but they also allowed us to implement change stream resumability and causal consistency in MongoDB 3.6, both of which are immediately valuable on their own. Change streams enable developers to build reactive applications that can view, filter, and act on data changes as they occur in the database in real time, and recover from transient failures. Causal consistency allows developers to maintain the benefits of strong data consistency with “read your own write” guarantees, while taking advantage of the scalability and availability of our intelligent distributed data platform.
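The “read your own write” guarantee can be illustrated with a deliberately simplified, in-memory model (hypothetical classes, not MongoDB internals). A logical clock stamps each write, the session remembers the highest operation time it has seen, and a lagging secondary may answer a read only once it has applied writes up to that time. This loosely mirrors how causally consistent sessions attach an `afterClusterTime` to their reads, except that a real server waits for the secondary to catch up, whereas this sketch simply returns `None`.

```python
class Node:
    """A replica that has applied writes up to `applied_time` (toy model)."""

    def __init__(self):
        self.data = {}
        self.applied_time = 0

    def apply(self, time, key, value):
        self.data[key] = value
        self.applied_time = max(self.applied_time, time)


class ToyCluster:
    def __init__(self):
        self.clock = 0          # logical clock: only ever moves forward
        self.primary = Node()
        self.secondary = Node()
        self.pending = []       # writes not yet replicated to the secondary

    def write(self, key, value):
        self.clock += 1
        self.primary.apply(self.clock, key, value)
        self.pending.append((self.clock, key, value))
        return self.clock       # the write's operation time

    def replicate(self):
        for entry in self.pending:
            self.secondary.apply(*entry)
        self.pending.clear()


class CausalSession:
    def __init__(self, cluster):
        self.cluster = cluster
        self.operation_time = 0  # highest operation time this session has seen

    def write(self, key, value):
        op_time = self.cluster.write(key, value)
        self.operation_time = max(self.operation_time, op_time)

    def read_from_secondary(self, key):
        node = self.cluster.secondary
        # afterClusterTime-style check: a node that has not caught up must not
        # answer, or the session could miss its own earlier write.
        if node.applied_time < self.operation_time:
            return None  # toy simplification: a real server would wait
        return node.data.get(key)


cluster = ToyCluster()
session = CausalSession(cluster)
session.write("x", 1)
before = session.read_from_secondary("x")  # secondary is still behind
cluster.replicate()
after = session.read_from_secondary("x")   # now it has caught up
```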

The global logical clock is just one example. A selection of other key enhancements along the way illustrates how our engineering team deliberately laid the groundwork for transactions in such a way that we consistently surfaced additional benefits to our users:

  • The acquisition of WiredTiger Inc. and the integration of its storage engine back in MongoDB 3.0 brought massive scalability gains to MongoDB through document-level concurrency control and compression. And with MVCC support, it also laid the storage-layer foundations for the transactions coming in MongoDB 4.0.
  • In MongoDB 3.2, the enhanced consensus protocol allowed for faster and more deterministic recovery from the failure or network partition of the primary replica set member, along with stricter durability guarantees for writes. These enhancements were immediately useful to MongoDB users then, and they are also essential capabilities for transactions.
  • The introduction of readConcern in 3.2 allowed applications to specify the read isolation level on a per-operation basis, providing powerful and granular consistency controls.
  • Logical sessions in MongoDB 3.6 gave our users causal consistency and retryable writes, but as a foundation for transactions, they provide MongoDB the ability to coordinate client and server operations across the nodes of a distributed cluster, managing the execution context for each statement in a transaction.
  • Similarly, retryable writes, implemented in MongoDB 3.6, simplify the development of applications in the face of elections (or other transient failures) while the server enforces at-most-once processing semantics.
  • Replica set point-in-time reads in 4.0 are essential for transactional consistency, but they are also highly valuable for regular read operations that don’t need to be executed in a transaction. With this feature, a read will only show a view of the data that is consistent as of the point the find() operation starts, irrespective of which replica serves the read or what data has been modified by concurrent operations.
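Snapshot (point-in-time) reads rest on multi-version concurrency control: writers add new versions rather than overwriting in place, and a reader picks, for each key, the newest version committed at or before its read timestamp. The toy `VersionedStore` below is a hypothetical, in-memory class (WiredTiger’s real implementation is far more involved); it shows why a read that starts at one timestamp is unaffected by writes that commit afterwards.

```python
class VersionedStore:
    """Toy MVCC store: every write appends a (commit_ts, value) version."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # monotonically increasing commit timestamp

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def current_timestamp(self):
        # A new reader takes the latest committed timestamp as its snapshot.
        return self.clock

    def read(self, key, read_ts):
        # Newest version committed at or before read_ts; later ones are invisible.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= read_ts:
                return value
        return None


store = VersionedStore()
store.write("a", 1)
store.write("b", 1)
snapshot_ts = store.current_timestamp()  # a find() starts here
store.write("a", 2)                      # a concurrent write commits afterwards
snapshot_view = {k: store.read(k, snapshot_ts) for k in ("a", "b")}
latest_a = store.read("a", store.current_timestamp())
```

The reader holding `snapshot_ts` keeps seeing `a == 1` even though a newer version exists, while a read started later sees `a == 2`.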

The number of remaining pieces on the roadmap to transactions is small. Once complete, multi-document distributed transactions will provide a globally consistent view of data (both in replica set and sharded deployments) through snapshot isolation and maintain all-or-nothing guarantees in cases of node failures. This will greatly simplify your application code. After all, MongoDB’s job is to take hard problems and solve them for as many developers as possible, so that you can focus on adding value to your applications and not dealing with the underlying plumbing.
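The all-or-nothing half of that promise can be sketched with a toy context manager: writes are buffered and applied together only at commit, and an exception discards them all. `ToyTransaction` is hypothetical, in-memory, and single-threaded, and it does not provide the snapshot isolation a real transaction does. In PyMongo, the corresponding real API is a session’s `start_transaction()` method, used as a context manager inside `client.start_session()`, which requires MongoDB 4.0 and a replica set.

```python
class ToyTransaction:
    """Buffers writes; commits all of them on success, none on exception."""

    def __init__(self, store):
        self.store = store   # a plain dict standing in for the database
        self.pending = {}    # writes staged but not yet visible

    def set(self, key, value):
        self.pending[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: every staged write becomes visible together.
            self.store.update(self.pending)
        # On an exception the staged writes are simply dropped (abort).
        self.pending = {}
        return False  # do not suppress the exception


accounts = {"ann": 100, "bob": 0}

# A successful transfer: both balances change.
with ToyTransaction(accounts) as txn:
    txn.set("ann", accounts["ann"] - 30)
    txn.set("bob", accounts["bob"] + 30)

# A transfer interrupted midway (simulated failure): neither balance changes.
try:
    with ToyTransaction(accounts) as txn:
        txn.set("ann", accounts["ann"] - 50)
        raise RuntimeError("simulated node failure")
except RuntimeError:
    pass
```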

We’re really excited about the release of multi-document transactions, and what they will allow you to build with MongoDB going forward. You should view our multi-document transactions page to learn more, and we invite you to sign up for the beta program so that you can start to put all of the work we’ve done through its paces.

* Safe Harbour Statement

This post contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.