GIANT Stories at MongoDB

Working with the Stitch CLI – Editing Apps Outside of Stitch

Andrew Morgan
August 08, 2018
Technical

Introduction

This post introduces the MongoDB Stitch CLI, demonstrating how the Command Line Interface (CLI) can be used to export an existing app, continue to develop it locally, and then merge the changes back into Stitch. This workflow allows automation, enables collaboration, and increases productivity.

Introducing MongoDB Stitch

MongoDB Stitch is a serverless platform which accelerates application development with simple, secure access to data and services from the client – getting your apps to market faster while reducing operational costs and effort.

The Stitch serverless platform addresses the challenges faced by the developers of modern applications by providing four services:

  • Stitch QueryAnywhere: Exposes the full power of working with documents in MongoDB and the MongoDB query language, directly from your web and mobile application frontend code. A powerful rules engine lets developers declare fine-grained security policies.
  • Stitch Functions: Allows developers to run simple JavaScript functions in Stitch’s serverless environment, making it easy to create secure APIs or to build integrations with microservices and server-side logic. Enables integration with popular cloud services such as Slack and Twilio, enriching your apps with a single Stitch method call.
  • Stitch Triggers: Real-time notifications that launch functions in response to changes in the database. The functions can make further database changes, push data to other places, or interact with users – such as through push notifications, text messages, or emails.
  • Stitch Mobile Sync: Automatically synchronizes data between documents held locally in MongoDB Mobile and the backend database. MongoDB Mobile allows mobile developers to use the full power of MongoDB locally. Stitch Mobile Sync ensures that data is kept up to date across phones and all other clients in real time.

The Stitch CLI

Stitch has an intuitive UI to create and maintain your apps, but you also have the option of using the Stitch CLI. The stitch-cli is the key to automating workflows with Stitch (it also lets you develop code inside your favorite editor). The CLI provides functions for authenticating Stitch users, importing local application directories, and exporting applications from the MongoDB Stitch service.

With the import & export commands in the Stitch CLI, you can:

  • Develop Stitch applications using your favorite desktop editor and import them into Stitch
  • Export existing applications to a local directory. From there you can:
    • Add the files to source control (e.g., GitHub)
    • Make changes to the application code and configuration files, before merging those changes back into your Stitch app
    • Let other developers review and contribute code
  • Easily share custom services and functions between Stitch apps
  • Push code from development environments, to QA, and onto production.

stitch-cli currently includes 5 commands:

  • login: Log in using an Atlas API Key
  • logout: Deauthenticate as an administrator
  • whoami: Display Current User Info
  • export: Export a stitch application to a local directory
  • import: Import and deploy a stitch application from a local directory

While MongoDB Stitch lets you create complex applications, it also simplifies development by mapping your apps to standard JSON and JavaScript files and structures in a way that's easy to navigate.

An exported application is made up of a hierarchy of directories and files:

  • *.json: Files containing configuration data
  • *.js: Files containing JavaScript code

This is an example hierarchy for a Stitch IoT service that works with the backend MongoDB Atlas database and the REST APIs of 2 other cloud services – IFTTT and DarkSky.net:

impTemp
├── auth_providers
│   ├── anon-user.json
│   └── api-key.json
├── functions
│   ├── Imp_Write
│   │   ├── config.json
│   │   └── source.js
│   ├── TempTimeline
│   │   ├── config.json
│   │   └── source.js
│   └── controlHumidity
│       ├── config.json
│       └── source.js
├── services
│   ├── IFTTT
│   │   ├── config.json
│   │   └── rules
│   │       └── IFTTT.json
│   ├── darksky
│   │   ├── config.json
│   │   └── rules
│   │       └── darkSkyGET.json
│   └── mongodb-atlas
│       ├── config.json
│       └── rules
│           └── Imp.TempData.json
├── stitch.json
└── values
    ├── DarkSkyKey.json
    ├── DeviceLocation.json
    ├── MakerIFTTKey.json
    ├── maxHumidity.json
    └── minTemp.json

Installing stitch-cli

The simplest way to install stitch-cli is as an NPM module:

npm install -g mongodb-stitch-cli 

Logging into the Atlas API

Before doing anything else, you need to create an Atlas API key & whitelist your local machine's IP Address, as shown here:

Creating an Atlas API key

You can now log into the admin API using stitch-cli, and confirm that you're logged in:

stitch-cli login --username=andrew.morgan --api-key=XXXXXXXX-XXX-4010-97a2-ad7c39d8cfb5

stitch-cli whoami
andrew.morgan [API Key: ********-****-****-****-ad7c39d8cfb5]

Export app, make local edits, and merge changes back into the same Stitch app

I noticed that I'm missing some error handling code in my functions. Rather than editing them all through the Stitch UI, I export the app so that I can work on the code in my favorite editor:

Export the Stitch app to the local file system:

stitch-cli export --app-id=imptemp-sobpa

I then add some extra code to handle cases where the MongoDB insert fails:

exports = function(data){

  //Get the current time
  var now = new Date();

  var darksky = context.services.get("darksky");
  var mongodb = context.services.get("mongodb-atlas");
  var TempData = mongodb.db("Imp").collection("TempData");

  // Fetch the current weather from darksky.net

  darksky.get({"url": "https://api.darksky.net/forecast/" + 
    context.values.get("DarkSkyKey") + '/' + 
    context.values.get("DeviceLocation") +
    "?exclude=minutely,hourly,daily,alerts,flags&units=auto"
  }).then(response => {
    var darkskyJSON = EJSON.parse(response.body.text()).currently;

    var status =
      {
        "Timestamp": now.getTime(),
        "Date": now,
        "Readings": data,
        "External": darkskyJSON,
      };
    status.Readings.light = (100*(data.light/65536));
    context.functions.execute("controlHumidity", data.temp, data.humid);
    TempData.insertOne(status).then(
      results => {
        console.log("Successfully wrote document to TempData");
      },
      error => {
        console.log("Error writing to TempData colletion: " + error);
      });
  });
};

I then apply the updates to the Stitch app (the CLI conveniently shows me the delta before I commit the merge):

cd impTemp
stitch-cli import --app-id=imptemp-sobpa --strategy=merge

--- functions/Imp_Write/source.js
+++ functions/Imp_Write/source.js
@@ -28,6 +28,9 @@
     TempData.insertOne(status).then(
       results => {
         console.log("Successfully wrote document to TempData");
+      },
+      error => {
+        console.log("Error writing to TempData collection: " + error);
       });
   });
 };

Please confirm the changes shown above: [y/n]: y
Successfully imported 'imptemp-sobpa'

You can then validate the changes through the UI:

Updated Stitch function, viewed through the UI

Summary

The Stitch CLI makes it easier than ever to work with Stitch code – regardless of whether you’re an advanced user or just beginning. If you've created your first Stitch app, try the Stitch CLI to start extending it – all from the comfort of your favorite editor. If not, try importing this example dashboard application.

Sign up for the Stitch free tier and see what you can create!

Using Power BI to Gain Insight Into your MongoDB Data

Seth Payne
July 11, 2018
Technical

We are happy to announce updates to the MongoDB Connector for Business Intelligence for use with Microsoft’s Power BI Desktop. Now, it is simpler than ever for Power BI users to access data stored in MongoDB and use Power BI’s powerful analytical and visualization tools to gain insights into the data, and then effectively share these insights with colleagues.

Within just a few minutes, you can expose MongoDB data directly to Power BI to begin creating meaningful charts, dashboards, and reports.

MongoDB as a Data Platform for Business Intelligence

As both MongoDB’s popularity and adoption continue to grow rapidly, organizations are choosing MongoDB as the data platform supporting a variety of applications where tabular, or relational, database systems had historically been used. MongoDB’s document data model presents the best way to work with data for business critical applications, while its distributed systems design allows users to intelligently put data where they want it and gives them the freedom to run anywhere – on premises or in the cloud.

But data platform environments are rarely, if ever, homogenous. And, given both the longevity and large install base of enterprise RDBMS, it is common for MongoDB to be part of a larger ecosystem that includes a variety of data sources, many of which are tabular in nature. Because of the need to manage data from multiple systems, administrators are looking for ways to expose all their data to their non-technical business users in a consistent and friendly way, no matter where that data is physically stored.

The MongoDB BI Connector enables Power BI users to easily query, analyze and visualize data stored in MongoDB in the same manner as with other Power BI data sources. No knowledge of MongoDB or the MongoDB Query Language (MQL) required!

Exposing MongoDB Data To Power BI Desktop

One of the benefits of using MongoDB as a platform for BI is that it eliminates the need for complex ETL operations. BI workloads can be isolated from operational workloads by running analytics on separate replica set nodes, all operating as part of the same cluster. The data can also be exposed directly to end users without the need to modify and transfer data before it becomes available to analysts.

Power BI can import MongoDB data through a direct connection to the MongoDB BI Connector or via ODBC. Once a data connection has been defined, simply select the data you want to work with and import it.

After the import is complete, you can begin working with the data in Power BI Desktop as you would with any data source. If you need to refresh your data, you can easily do so at any time.

Data that resides right in your app is a gold mine when it comes to understanding your customers, but who has time to wait for nightly ETL workflows? With the MongoDB Connector for BI you can take control of your data and get to insights faster. See just how quickly you can discover something!

Through the Looking Glass: Analyzing the interplay between memory, disk, and read performance.

Introduction

Understanding the relationships between various internal caches and disk performance, and how those relationships affect database and application performance, can be challenging. We’ve used the YCSB benchmark, varying the working set (number of documents used for the test) and disk performance, to better show how these relate. While reviewing the results, we’ll cover some MongoDB internals to improve understanding of common database usage patterns.

Key Takeaways

  1. Knowing disk baseline performance is important for understanding overall database performance.
  2. High disk await and utilization are indicative of a disk bottleneck.
  3. WiredTiger IO is random.
  4. A query targeting a single replica set is single threaded and sequential.
  5. Disk performance and working set size are closely related.

Summary

The primary contributors to overall system performance are how the working set relates to both the storage engine cache size (the memory dedicated for storing data) and disk performance (which provides a physical limit to how quickly data can be accessed).

Using YCSB, we explore the interactions between disk performance and cache size, demonstrating how these two factors can affect performance. While YCSB was used for this testing, synthetic benchmarks are not representative of production workloads. Latency and throughput numbers obtained with these methods do not map to production performance. We utilized MongoDB 3.4.10, YCSB 0.14, and the MongoDB 3.6.0 driver for these tests. YCSB was configured with 16 threads, and the “uniform” read only workload.

We show that fitting your working set inside memory provides optimal application performance and that, as with any database, exceeding this limit negatively affects latency and overall throughput.

Understanding Disk Metrics

There are four important metrics when considering disk performance:

  1. Disk throughput, or number of requests multiplied by the request size. This is usually measured in megabytes per second. Random read and write performance in the 4kb range is the most representative of standard database workloads. Note that many cloud providers limit the disk throughput or bandwidth.
  2. Disk latency. On Linux this is represented by `await`, the time in milliseconds from when an application issues a read or write until the data is returned or written. For SSDs, latencies are typically under 3ms. HDDs are typically above 7ms. High latencies indicate disks have trouble keeping up with the given workload.
  3. Disk IOPS (Input/Output Operations Per Second). `iostat` reports this metric as `tps`. A given cloud provider may guarantee a certain number of IOPS for a given drive. Should you reach this threshold, any further accesses will be queued, resulting in a disk bottleneck. A high end PCIe attached NVMe device could offer 1,500,000 IOPS while a typical hard disk may only support 150 IOPS.
  4. Disk utilization. Reported by `util` in `iostat`. Linux has multiple `queues` per device for servicing IO. Utilization indicates what percentage of these queues is busy at a given time. While this number can be confusing, it is a good indicator of overall disk health. All four metrics can be inspected with `iostat`, as sketched below.
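
As a quick illustration (not specific to the tests in this post), the following `iostat` invocations report these metrics; exact column names vary slightly between sysstat versions:

    # per-device transfers per second (tps) and MB read/written per second
    iostat -m 1

    # extended statistics, including await (latency) and %util (utilization)
    iostat -xm 1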

Testing Disk Performance

While cloud providers may provide an IOPS threshold for a given volume and disk, and disk manufacturers publish expected performance numbers, the actual results on your system may vary. If the observed disk performance is in question, performing an IO test can be very helpful.

We generally test with fio, the Flexible IO Tester. We performed tests on 10GB of data, using the psync ioengine, with reads ranging between 4kb and 32kb. While the default fio settings are not representative of the WiredTiger workload, we have found this configuration to be a good approximation of WiredTiger disk utilization.
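
As a rough sketch of that kind of job (the runtime below is an illustrative assumption, not the exact parameter used for these tests), an fio job file for 4kb random reads over 10GB with the psync ioengine might look like:

    [randread-4k]
    ioengine=psync
    direct=1
    rw=randread
    bs=4k
    size=10g
    runtime=120
    time_based

Running `fio randread-4k.fio` from a directory on the volume under test reports IOPS, bandwidth, and latency figures that can then be compared with the numbers your cloud provider or disk manufacturer advertises.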

All tests were repeated under three disk scenarios:

Scenario 1

Default disk settings provided by an AWS c5 instance with a 100GB io1 volume provisioned at 5000 IOPS.

  • 1144 IOPS / 5025 physical reads per second / 99.85% util

Scenario 2

Limiting the disk to 600 IOPS and introducing 7ms of latency. This should mirror the performance of a typical RAID10 SAN with hard drives.

  • 134 IOPS / 150 physical reads per second / 95.72% util

Scenario 3

Further limiting the disk to 150 IOPS with 7ms latency. This should model a commodity spinning hard drive.

  • 34 IOPS / 150 physical reads per second / 98.2% util

How is a query serviced from disk?

The WiredTiger storage engine performs its own caching. By default, the WiredTiger cache is sized at 50% of system memory minus 1GB to leave adequate space for other system processes, the filesystem cache, and internal MongoDB operations that consume additional memory, such as building indexes, performing in-memory sorts, deduplicating results, text scoring, connection handling, and aggregations. To prevent performance degradation from a totally full cache, WiredTiger automatically begins evicting data from the cache when utilization grows above 80%. For our tests, this means the effective cache size is (7634MB - 1024MB) * 0.5 * 0.8, or 2644MB.
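
If you need to size the cache explicitly rather than rely on the default formula, it can be set when starting the server; a minimal sketch, where the 2.5GB figure is purely illustrative:

    mongod --dbpath /data/db --wiredTigerCacheSizeGB 2.5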

All queries are serviced from the WiredTiger cache. This means a query will cause indexes and documents to be read from disk through the filesystem cache into the WiredTiger cache before returning results. If the requested data is already in the cache, this step is skipped.

WiredTiger stores documents with the snappy compression algorithm by default. Any data read from the filesystem cache is first decompressed before being stored in the WiredTiger cache. Indexes utilize prefix compression by default and are compressed both on disk and inside the WiredTiger cache.

The filesystem cache is an Operating System construct to store frequently accessed files in memory to facilitate faster accesses. Linux is very aggressive in caching files and will attempt to consume all free memory with the filesystem cache. If additional memory is needed, the filesystem cache is evicted to allow more memory for applications.

Here is an animated graphic showing the disk accesses for the YCSB collection resulting from 100 YCSB read operations. Each operation is an individual find, providing the _id of a single document.

The upper left hand corner represents the first byte in the WiredTiger collection file. Disk locations increment to the right hand side and wrap around. Each row represents a 3.5MB segment of the WiredTiger collection file. The accesses are ordered by time and represented by the frame of animation. Accesses are represented in red and green boxes to highlight the current disk access.

3.5 MB vs 4KB

Here we see the data file for our collection read into memory. Because the data is stored in B+ trees, we may need to find the disk location of our document (the smaller accesses) by visiting one or more locations on disk before our document is found and read (the wider accesses).

This demonstrates the typical access patterns of a MongoDB query – documents are unlikely to be close to each other on disk. This also shows it is highly unlikely for documents, even when inserted after each other, to be in consecutive disk locations.

The WiredTiger storage engine is designed to “read completely”: it will issue a read for all of the data it needs at once. This leads to our recommendation to limit the disk read ahead for WiredTiger deployments to zero, as subsequent accesses are unlikely to take advantage of the additional data retrieved through read ahead.
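
On Linux, read ahead is set per block device; a minimal sketch using blockdev (the device path is an assumption for your environment, and the setting does not persist across reboots unless applied via udev or an init script):

    # current read ahead, reported in 512-byte sectors
    blockdev --getra /dev/xvdf

    # disable read ahead for the volume holding the MongoDB data files
    blockdev --setra 0 /dev/xvdf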

Working Set Fits in Cache

For our first set of tests, we set the record count to 2 million, resulting in a total size for both data and indexes of 2.43 GB or 92% of cache.

Here we see strong scenario 1 performance of 76,113 requests per second. Checking the filesystem cache statistics, we observe a WiredTiger cache hit rate of 100% with no accesses and zero bytes read into the filesystem cache, meaning no additional IO is required throughout this test.

Unsurprisingly, in Scenarios 2 and 3, changing the disk performance (adding 7ms of latency and limiting IOPS to either 600 or 150) affected throughput minimally (69,579.5 and 70,252 operations per second respectively).

Our 99% response latencies for all three tests are between 0.40 and 0.44 ms.

Working Set Larger than WiredTiger Cache, but Still Fits in Filesystem Cache

Modern operating systems cache frequently accessed files to improve read performance. Because the file is already in memory, accessing cached files does not result in physical reads. The cached statistics displayed by the Linux `free` command detail the size of the filesystem cache.

When we increase our record count from 2 million to 3 million we increase our total size of data and indexes to 3.66GB, 38% greater than can be serviced solely from the WiredTiger cache.

The metrics clearly show that we are reading an average of 548 mbps into the WiredTiger cache, yet we observe a 99.9% hit rate when checking the filesystem cache metrics.

For this test we begin to see a reduction in performance, with only 66,720 operations per second – an 8% reduction compared to our previous test, which was serviced solely from the WiredTiger cache.

As expected, reduced disk performance for this case does not significantly affect our overall throughput (64,484 and 64,229 operations respectively). In cases where the documents are more compressible, or the CPU is a limiting factor, the penalty reading from the filesystem cache would be more pronounced.

We note a 54% increase in observed p99 latency to 0.53 - 0.55ms.

Working Set Slightly Larger Than WiredTiger and FileSystem Cache

We have established that the WiredTiger and filesystem caches work together to provide data to service our queries. However, when we grow our record count from 3 million to 4 million, we can no longer rely solely on these caches to service queries. Our data size grows to 4.8GB, or 82% larger than our WiredTiger cache.

Here, we read into the WiredTiger cache at a rate of 257.4 mbps. Our filesystem cache hit rate lowers to 93-96%, meaning 4-7% of our reads result in physical reads from disk.

Varying the available IOPS and disk latency has a huge impact on performance for this test.

The 99th percentile response latencies further increase. Scenario 1: 19ms, Scenario 2: 171ms, and Scenario 3: 770ms – an increase of 43x, 389x, and 1751x over the in-cache case.

We see 75% lower performance when MongoDB is provided the full 5000 IOPS compared to our earlier test, which fit fully in cache. Scenarios 2 and 3 achieved 5139.5 and 737.95 operations per second respectively, further demonstrating the IO bottleneck.

Working Set Much Larger Than WiredTiger and FileSystem Cache

Moving up to 5 million records, we grow our data and index size to 6.09GB, larger than our combined WiredTiger and filesystem caches. We see our throughput dip below our IOPS. In this case we are still servicing 81% of WiredTiger reads from the filesystem cache, but the reads that spill over to disk are saturating our IO. We see 71, 8.3, and 1.9 Mbps read into the filesystem cache for this test.

The 99th percentile response latencies further increase. Scenario 1: 22ms, Scenario 2: 199ms, and Scenario 3: 810ms – an increase of 52x, 454x, and 1841x over the in-cache response latencies. Here, changing the disk IOPS significantly affects our throughput.

Summary

Through this series of tests we demonstrate two major points.

  1. If the working set fits in cache, disk performance does not greatly affect application performance.
  2. When the working set exceeds available memory, disk performance quickly becomes the limiting factor for throughput.

Understanding how MongoDB utilizes both memory and disks is an important part of both sizing a deployment and understanding its performance. The inner workings of the WiredTiger storage engine attempt to use hardware to the fullest extent, but memory and disk are two critical pieces of infrastructure contributing to the overall performance characteristics of your workload.

Introducing the Best Database for Modern Applications

The announcements we made today at MongoDB World 2018 represent a significant milestone in the evolution of MongoDB, making it the database of choice for all modern applications. Broadly speaking, there are three reasons for this:

  1. The document data model – presenting you the best way to work with data.
  2. It’s distributed by design – allowing you to intelligently put data where you want it.
  3. A unified experience that gives you the freedom to run anywhere – allowing you to future-proof your work and eliminate vendor lock-in.

There is a ton of new stuff, and so I wanted to give you a summary of what I covered during my keynote, with links to key resources so you can learn more.

Best Way to Work with Data

Today we released MongoDB Server 4.0 for General Availability. The highlight of the release is multi-document ACID transactions, which we previewed back in February with a beta program that attracted thousands of members of the community, putting transactions through their paces and providing invaluable feedback to the engineering team. We’ve implemented transactions so they feel just like the transactions you are familiar with from relational databases. They enforce snapshot isolation to provide a consistent view of data, and all-or-nothing execution to maintain data integrity. And while the document model means multi-document transactions aren’t necessary for most operations, with them it’s even easier for you to address a complete range of use cases with MongoDB.

It’s no secret how much I love MongoDB’s aggregation framework. Building queries stage-by-stage, checking your output as you go is by far a better way to write your most complicated queries than dealing with a monolithic snarl of SQL. To make that workflow even better, we’ve enhanced MongoDB Compass with the aggregation pipeline builder, which provides stage-by-stage, real-time feedback on the documents flowing through your pipelines. It’s easier than ever to deploy sophisticated processing pipelines that transform, aggregate, and analyze your data, all from the simple and intuitive MongoDB Compass GUI. You can then export the pipelines, and any other queries you create in Compass, to the native code of your preferred programming language. Server 4.0 also adds type conversions to the aggregation pipeline. With the new $convert operator you can transform mixed data types into standardized, cleansed formats natively within the database, preparing it for BI and machine learning, while eliminating costly, slow, and fragile ETL processes.
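
As a brief, hypothetical illustration of $convert (the orders collection and price field are our own example names, not part of the announcement), here is a PyMongo aggregation sketch that normalizes a string price field to the decimal type, routing unparseable or missing values to null:

    # Assumes `db` is a PyMongo Database handle; names are illustrative only.
    pipeline = [
        {"$addFields": {
            "price": {
                "$convert": {
                    "input": "$price",
                    "to": "decimal",
                    "onError": None,  # value present but not convertible
                    "onNull": None    # field missing or null
                }
            }
        }}
    ]
    for doc in db.orders.aggregate(pipeline):
        print(doc)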

Extending the tools you can use to work with data managed by the server, we announced the public beta of MongoDB Charts, which provides the fastest and easiest way to get insights into your operational data, in real time. With Charts, you can create and share visualisations of your MongoDB data, using a document-native interface, without needing to move it into other systems or leverage third-party tools.

Documents and MongoDB’s query language are the best way to work with data, and to bring that power out of the datacenter and into the hands of app developers, MongoDB Stitch, which is GA as of today, provides two of its four services: QueryAnywhere and Functions. Using the authentication and declarative access control rules of Stitch QueryAnywhere, we can end the horrid practice of implementing shadow query languages in REST on top of application servers that just turn those REST calls into real query languages. With a native SDK, developers can make use of the full power of MongoDB from mobile and JavaScript applications, while Stitch makes sure that the right permissions are observed. Stitch Functions, JavaScript functions that execute with full access to application context, let developers compose their business logic with access to Atlas and calls to external services. With these two services, it’s easy to build complete applications without standing up a single application server.

Intelligently Put Data Where you Want It

As a distributed system, MongoDB enables you to spread data out across a cluster of nodes for resilience, scalability, and workload isolation. Unlike other distributed databases that randomly spray data around a cluster, MongoDB allows you to define controls that place data on specific nodes, for example in a specific region for low latency reads and writes, and for compliance with new privacy regulations.

The new Global Clusters introduced to MongoDB Atlas allow you to deploy a geographically distributed, fully managed database that provides low latency writes and reads to users anywhere, with data placement controls for regulatory compliance. We also announced Atlas Enterprise, offering new security controls including LDAP integration, the encrypted storage engine with bring-your-own key management, and database-level auditing. Organizations can now also use databases managed by MongoDB Atlas to build HIPAA-compliant applications under an executed Business Associate Agreement (BAA) with MongoDB, Inc. With these new announcements, MongoDB Atlas is the most secure cloud database service available anywhere.

Coding in a distributed world also means that the traditional means of responding to events in a database are no longer viable. Stitch Triggers, also GA today, make this possible by building on the Change Streams introduced in MongoDB 3.6. When you create a trigger, Stitch manages a change stream on your behalf, providing real-time notifications to Stitch Functions, which can react in all the ways functions can – from updating analytics rollup collections, to sending email or text messages, to kicking off other external services like Kafka or Kinesis.
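
As a rough sketch of the shape of a trigger function (the database, collection, and field names below are hypothetical, and the exact change event fields are described in the Stitch documentation), a function that rolls each newly inserted order into a daily analytics document might look like:

    // Hypothetical Stitch trigger function, invoked with the change event
    // produced by the change stream Stitch manages for the trigger.
    exports = function(changeEvent) {
      const order = changeEvent.fullDocument;
      const atlas = context.services.get("mongodb-atlas");
      const totals = atlas.db("analytics").collection("dailyTotals");

      // Upsert a per-day rollup document with the new order's value.
      return totals.updateOne(
        {"date": new Date().toDateString()},
        {"$inc": {"orderCount": 1, "orderValue": order.total}},
        {"upsert": true}
      );
    };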

Freedom to Run Anywhere

Whether you want to consume your database as a service in MongoDB Atlas, or manage it yourself on your own infrastructure, the announcements today make that even easier. We deliver a data platform that runs the same everywhere, that leverages the benefits of multi-cloud strategy with no lock-in, and is available in 50+ regions across the major cloud providers.

If you want to run MongoDB yourself, then we have released our new free MongoDB monitoring cloud service. The service is available to all MongoDB users, without needing to install an agent, navigate a paywall, or complete a registration form. You will be able to see the metrics and topology of your environment from the moment free monitoring is enabled. You can enable free monitoring easily with the db.enableFreeMonitoring() command in the MongoDB shell, through MongoDB Compass, or when starting the mongod process, and you can opt out at any time.

We’re seeing more DevOps teams leveraging the power of containerization and technologies like Kubernetes and Red Hat OpenShift to manage containerized clusters. Today we announced the beta of the new MongoDB Enterprise Operator for Kubernetes, enabling you to deploy and manage MongoDB clusters from within the Kubernetes API, without having to connect separately to Ops Manager. You can learn more by reading our Red Hat OpenShift and MongoDB blog, and checking out the repository on GitHub.

Announced today, MongoDB Mobile takes MongoDB to a new frontier. Available in beta, MongoDB Mobile extends your ability to put data where you need it, all the way out to the edge of the network on IoT assets and iOS and Android mobile devices. MongoDB Mobile provides a single database, query language, and the intuitive Stitch SDK that runs consistently for data held on mobile clients, through to the backend server.

MongoDB Mobile provides the power and flexibility of MongoDB in a compact form that is power- and performance-aware with a low disk and memory footprint. It supports 64 bit iOS and Android operating systems and is easily embedded into mobile and IoT devices for fast and reliable local storage of JSON documents. With secondary indexing, access to the full MongoDB query language and aggregations, users can query data any way they want. With local reads and writes, MongoDB Mobile lets you build the fastest, most reactive apps. And Stitch Mobile Sync, which is in private beta now, will automatically synchronize data changes between data held locally and your backend database, helping resolve any conflicts – even after the device has been offline. The beta program is open now, and you can sign up for access on the MongoDB Mobile product page.

What’s Next?

So as you can see, that’s a ton of stuff. The announcements today represent our biggest set of releases yet, and we’re incredibly excited to get them into your hands and see what amazing things you do with them. Head over to our MongoDB World 2018 announcements page for more resources on each of these new products and services.

MongoDB Multi-Document ACID Transactions are GA

With the release of 4.0, you now have multi-document ACID transactions in MongoDB.

But how have you been able to be so productive with MongoDB up until now? The database wasn’t built overnight and many applications have been running mission critical use cases demanding the highest levels of data integrity without multi-document transactions for years. How was that possible?

Well, let’s think first about why many people believe they need multi-document transactions. The first principle of relational data modeling is normalizing your data across tables. This means that many common database operations, like account creation, require atomic updates across many rows and columns.

In MongoDB, the data model is fundamentally different. The document model encourages users to store related data together in a single document. MongoDB has always supported ACID transactions in a single document and, when leveraging the document model appropriately, many applications don’t need ACID guarantees across multiple documents.

However, transactions are not just a check box. Transactions, like every MongoDB feature, aim to make developers’ lives easier. ACID guarantees across documents simplify the application logic needed by complex applications.

The value of MongoDB’s transactions

While MongoDB’s existing atomic single-document operations already provide transaction semantics that satisfy the majority of applications, the addition of multi-document ACID transactions makes it easier than ever for developers to address the full spectrum of use cases with MongoDB. Through snapshot isolation, transactions provide a consistent view of data and enforce all-or-nothing execution to maintain data integrity. For developers with a history of transactions in relational databases, MongoDB’s multi-document transactions are very familiar, making it straightforward to add them to any application that requires them.

In MongoDB 4.0, transactions work across a replica set, and MongoDB 4.2 will extend support to transactions across a sharded deployment* (see the Safe Harbor statement at the end of this blog). Our path to transactions represents a multi-year engineering effort, beginning back in early 2015 with the groundwork laid in almost every part of the server and the drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

In this blog, we will explore why MongoDB has added multi-document ACID transactions, their design goals and implementation, the characteristics of transactions, and best practices for developers. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

Why Multi-Document ACID Transactions

Since its first release in 2009, MongoDB has continuously innovated around a new approach to database design, freeing developers from the constraints of legacy relational databases. A design founded on rich, natural, and flexible documents accessed by idiomatic programming language APIs, enabling developers to build apps 3-5x faster. And a distributed systems architecture to handle more data, place it where users need it, and maintain always-on availability. This approach has enabled developers to create powerful and sophisticated applications in all industries, across a tremendously wide range of use cases.

Figure 1: Organizations innovating with MongoDB

With subdocuments and arrays, documents allow related data to be modeled in a single, rich and natural data structure, rather than spread across separate, related tables composed of flat rows and columns. As a result, MongoDB’s existing single document atomicity guarantees can meet the data integrity needs of most applications. In fact, when leveraging the richness and power of the document model, we estimate 80%-90% of applications don’t need multi-document transactions at all.

However, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-document transactions are a requirement for any database, irrespective of the data model they are built upon. Some are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future. And for some workloads, support for ACID transactions across multiple documents is required.

As a result, the addition of multi-document transactions makes it easier than ever for developers to address a complete range of use cases on MongoDB. For some, simply knowing that they are available will assure them that they can evolve their application as needed, and the database will support them.

Data Models and Transactions

Before looking at multi-document transactions in MongoDB, we want to explore why the data model used by a database impacts the scope of a transaction.

Relational Data Model

Relational databases model an entity’s data across multiple records and parent-child tables, and so a transaction needs to be scoped to span those records and tables. The example in Figure 2 shows a contact in our customer database, modeled in a relational schema. Data is normalized across multiple tables: customer, address, city, country, phone number, topics and associated interests.

Figure 2: Customer data modeled across separate tables in a relational database

In the event of the customer data changing in any way, for example if our contact moves to a new job, then multiple tables will need to be updated in an “all-or-nothing” transaction that has to touch multiple tables, as illustrated in Figure 3.

Figure 3: Updating customer data in a relational database

Document Data Model

Document databases are different. Rather than break related data apart and spread it across multiple parent-child tables, documents can store related data together in a rich, typed, hierarchical structure, including subdocuments and arrays, as illustrated in Figure 4.

Figure 4: Customer data modeled in a single, rich document structure

MongoDB provides existing transactional properties scoped to the level of a document. As shown in Figure 5, one or more fields may be written in a single operation, with updates to multiple subdocuments and elements of any array, including nested arrays. The guarantees provided by MongoDB ensure complete isolation as a document is updated; any errors cause the operation to roll back so that clients receive a consistent view of the document. With this design, application owners get the same data integrity guarantees as those provided by older relational databases.

Figure 5: Updating customer data in a document database

Where are Multi-Document Transactions Useful?

There are use cases where transactional ACID guarantees need to be applied to a set of operations that span multiple documents. Back office “System of Record” or “Line of Business” (LoB) applications are the typical class of workload where multi-document transactions are useful. Examples include:

  • Processing application events when users perform important actions -- for instance when updating the status of an account, say to delinquent, across all those users’ documents.
  • Logging custom application actions -- say when a user transfers ownership of an entity, the write should not be successful if the logging isn’t.
  • Many to many relationships where the data naturally fits into defined objects -- for example positions, calculated by an aggregate of hundreds of thousands of trades, need to be updated every time trades are added or modified.

MongoDB already serves these use cases today, but the introduction of multi-document transactions makes it easier as the database automatically handles multi-document transactions for you. Before their availability, the developer would need to programmatically implement transaction controls in their application. To ensure data integrity, they would need to ensure that all stages of the operation succeed before committing updates to the database, and if not, roll back any changes. This adds complexity that slows down the rate of application development. MongoDB customers in the financial services industry have reported they were able to cut 1,000+ lines of code from their apps by using multi-document transactions.

In addition, implementing client side transactions can impose performance overhead on the application. For example, after moving from its existing client-side transactional logic to multi-document transactions, a global enterprise data management and integration ISV experienced improved MongoDB performance in its Master Data Management solution: throughput increased by 90%, and latency was reduced by over 60% for transactions that performed six updates across two collections.

Multi-Document ACID Transactions in MongoDB

Transactions in MongoDB feel just like transactions developers are used to in relational databases. They are multi-statement, with similar syntax, making them familiar to anyone with prior transaction experience.

The following Python code snippet shows a sample of the transactions API.

    with client.start_session() as s:
        s.start_transaction()
        collection_one.insert_one(doc_one, session=s)
        collection_two.insert_one(doc_two, session=s)
        s.commit_transaction()

The following snippet shows the transactions API for Java.

    try (ClientSession clientSession = client.startSession()) {
        clientSession.startTransaction();
        collection.insertOne(clientSession, docOne);
        collection.insertOne(clientSession, docTwo);
        clientSession.commitTransaction();
    }

As these examples show, transactions use regular MongoDB query language syntax, and are implemented consistently whether the transaction is executed across documents and collections in a replica set or, with MongoDB 4.2, across a sharded cluster*.

We're excited to see MongoDB offer dedicated support for ACID transactions in their data platform and that our collaboration is manifest in the Lovelace release of Spring Data MongoDB. It ships with the well known Spring annotation-driven, synchronous transaction support using the MongoTransactionManager but also bits for reactive transactions built on top of MongoDB's ReactiveStreams driver and Project Reactor datatypes exposed via the ReactiveMongoTemplate.

Pieter Humphrey - Spring Product Marketing Lead, Pivotal

The transaction block code snippets below compare the MongoDB syntax with that used by MySQL. It shows how multi-document transactions feel familiar to anyone who has used traditional relational databases in the past.

MySQL

    db.start_transaction()
        cursor.execute(orderInsert, orderData)
        cursor.execute(stockUpdate, stockData)
    db.commit()

MongoDB

    s.start_transaction()
        orders.insert_one(order, session=s)
        stock.update_one(item, stockUpdate, session=s)
    s.commit_transaction()

Through snapshot isolation, transactions provide a consistent view of data, and enforce all-or-nothing execution to maintain data integrity. Transactions can apply to operations against multiple documents contained in one, or in many, collections and databases. The changes to MongoDB that enable multi-document transactions do not impact performance for workloads that don't require them.

During its execution, a transaction is able to read its own uncommitted writes, but none of its uncommitted writes will be seen by other operations outside of the transaction. Uncommitted writes are not replicated to secondary nodes until the transaction is committed to the database. Once the transaction has been committed, it is replicated and applied atomically to all secondary replicas.

An application specifies write concern in the transaction options to state how many nodes should commit the changes before the server acknowledges the success to the client. All uncommitted writes live on the primary exclusively.
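
In PyMongo, for example, these options can be passed when the transaction is started; a minimal sketch, reusing the client, session, orders collection, and order document from the snippets above, and assuming a 3.7+ driver:

    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    with client.start_session() as s:
        # Require a majority of replica set members to acknowledge the commit.
        s.start_transaction(
            read_concern=ReadConcern("snapshot"),
            write_concern=WriteConcern(w="majority")
        )
        orders.insert_one(order, session=s)
        s.commit_transaction()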

Taking advantage of the transactions infrastructure introduced in MongoDB 4.0, the new snapshot read concern ensures queries and aggregations executed within a read-only transaction will operate against a single, isolated snapshot on the primary replica. As a result, a consistent view of the data is returned to the client, irrespective of whether that data is being simultaneously modified by concurrent operations. Snapshot reads are especially useful for operations that return data in batches with the getMore command.

Even before MongoDB 4.0, typical MongoDB queries leveraged WiredTiger snapshots. The distinction between typical MongoDB queries and snapshot reads in transactions is that snapshot reads use the same snapshot throughout the duration of the query, whereas typical MongoDB queries may switch to a more current snapshot at yield points.

Snapshot Isolation and Write Conflicts

When modifying a document, a transaction locks the document from additional changes until the transaction completes. If a transaction is unable to obtain a lock on a document it is attempting to modify, likely because another transaction is already holding the lock, it will immediately abort after 5ms with a write conflict.

However, if a typical MongoDB write outside of a multi-document transaction tries to modify a document that is currently locked by a multi-document transaction, that write will block behind the transaction completing. The typical MongoDB write will be infinitely retried with backoff logic until $maxTimeMS is reached. Even prior to 4.0, all writes were represented as WiredTiger transactions at the storage layer and it was possible to get write conflicts. This same logic was implemented to avoid users having to react to write conflicts client-side.

Reads do not require the same locks that document modifications do. Documents that have uncommitted writes by a transaction can still be read by other operations, and of course those operations will only see the committed values, and not the uncommitted state.

Only writes trigger write conflicts within MongoDB. Reads, in line with the expected behavior of snapshot isolation, do not prevent a document from being modified by other operations. Changes will not be surfaced as write conflicts unless a write is performed on the document. Additionally, no-op updates – like setting a field to the same value that it already had – can be optimized away before reaching the storage engine, thus not triggering a write conflict. To guarantee that a write conflict is detected, perform an operation like incrementing a counter.
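
For example, a minimal PyMongo sketch (the accounts collection, account_id, and version field are hypothetical names): incrementing a counter inside the transaction forces a genuine write, so any concurrent transaction touching the same document will surface a write conflict.

    # Hypothetical: bump a version counter so the write cannot be
    # optimized away, guaranteeing conflict detection.
    accounts.update_one(
        {"_id": account_id},
        {"$inc": {"version": 1}},
        session=s
    )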

Retrying Transactions

MongoDB 4.0 introduces the concept of error labels. The transient transaction error label notifies listening applications that the surfaced error – which can range from a network error to a write conflict – may be temporary, and that the transaction is safe to retry from the beginning. Permanent errors, like parsing errors, will not have the transient transaction error label, as rerunning the transaction will never result in a successful commit.

One of the core values of MongoDB is its highly available architecture. These error labels make it easy for applications to be resilient in cases of network blips or node failures, enabling cross-document transactional guarantees without sacrificing use cases that must be always-on.
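
A common pattern is to wrap the transaction body in a retry loop that retries only while the transient label is present; a minimal sketch using PyMongo's error-label helper (the helper function name here is our own):

    from pymongo.errors import PyMongoError

    def run_transaction_with_retry(txn_func, session):
        # txn_func performs the reads/writes and commits the transaction.
        while True:
            try:
                txn_func(session)
                break
            except PyMongoError as exc:
                # Retry only errors the server labels as transient;
                # permanent errors (e.g., parse errors) are re-raised.
                if exc.has_error_label("TransientTransactionError"):
                    continue
                raise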

Transactions Best Practices

As noted earlier, MongoDB’s single document atomicity guarantees will meet 80-90% of an application’s transactional needs. They remain the recommended way of enforcing your app’s data integrity requirements. For those operations that do require multi-document transactions, there are several best practices that developers should observe.

Creating long running transactions, or attempting to perform an excessive number of operations in a single ACID transaction can result in high pressure on WiredTiger’s cache. This is because the cache must maintain state for all subsequent writes since the oldest snapshot was created. As a transaction uses the same snapshot while it is running, new writes accumulate in the cache throughout the duration of the transaction. These writes cannot be flushed until transactions currently running on old snapshots commit or abort, at which time the transactions release their locks and WiredTiger can evict the snapshot. To maintain predictable levels of database performance, developers should therefore consider the following:

  1. By default, MongoDB will automatically abort any multi-document transaction that runs for more than 60 seconds. Note that if write volumes to the server are low, you have the flexibility to tune your transactions for a longer execution time. To address timeouts, the transaction should be broken into smaller parts that allow execution within the configured time limit. You should also ensure your query patterns are properly optimized with the appropriate index coverage to allow fast data access within the transaction.
  2. There are no hard limits to the number of documents that can be read within a transaction. As a best practice, no more than 1,000 documents should be modified within a transaction. For operations that need to modify more than 1,000 documents, developers should break the transaction into separate parts that process documents in batches.
  3. In MongoDB 4.0, a transaction is represented in a single oplog entry, therefore must be within the 16MB document size limit. While an update operation only stores the deltas of the update (i.e., what has changed), an insert will store the entire document. As a result, the combination of oplog descriptions for all statements in the transaction must be less than 16MB. If this limit is exceeded, the transaction will be aborted and fully rolled back. The transaction should therefore be decomposed into a smaller set of operations that can be represented in 16MB or less.
  4. When a transaction aborts, an exception is returned to the driver and the transaction is fully rolled back. Developers should add application logic that can catch and retry a transaction that aborts due to temporary exceptions, such as a transient network failure or a primary replica election. With retryable writes, the MongoDB drivers will automatically retry the commit statement of the transaction.
  5. DDL operations, like creating an index or dropping a database, block behind active transactions on the namespace. Any new transaction that attempts to access the namespace while a DDL operation is pending will be unable to obtain locks and will abort.
  6. You can review all best practices in the MongoDB documentation for multi-document transactions.

Transactions and Their Impact to Data Modeling in MongoDB

Adding transactions does not make MongoDB a relational database – many developers have already experienced that the document model is superior to the relational one today.

All best practices relating to MongoDB data modeling continue to apply when using features such as multi-document transactions, or fully expressive JOINs (via the $lookup aggregation pipeline stage). Where practical, all data relating to an entity should be stored in a single, rich document structure. Just moving data structured for relational tables into MongoDB will not allow users to take advantage of MongoDB’s natural, fast, and flexible document model, or its distributed systems architecture.

The RDBMS to MongoDB Migration Guide describes the best practices for moving an application from a relational database to MongoDB.

The Path to Transactions

Our path to transactions represents a multi-year engineering effort, beginning over 3 years ago with the integration of the WiredTiger storage engine. We’ve laid the groundwork in practically every part of the platform – from the storage layer itself to the replication consensus protocol, to the sharding architecture. We’ve built out fine-grained consistency and durability guarantees, introduced a global logical clock, refactored cluster metadata management, and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers. We are feature complete in bringing multi-document transactions to a replica set, and 90% done on implementing the remaining features needed to deliver transactions across a sharded cluster.

Figure 6 presents a timeline of the key engineering projects that have enabled multi-document transactions in MongoDB, with status shown as of June 2018. The key design goal underlying all of these projects is that their implementation does not sacrifice the key benefits of MongoDB – the power of the document model and the advantages of distributed systems, while imposing no performance impact to workloads that don’t require multi-document transactions.

Figure 6: The path to transactions – multi-year engineering investment, delivered across multiple releases

Conclusion

MongoDB has already established itself as the leading database for modern applications. The document data model is rich, natural, and flexible, with documents accessed by idiomatic drivers, enabling developers to build apps 3-5x faster. Its distributed systems architecture enables you to handle more data, place it where users need it, and maintain always-on availability. MongoDB’s existing atomic single-document operations provide transaction semantics that meet the data integrity needs of the majority of applications. The addition of multi-document ACID transactions in MongoDB 4.0 makes it easier than ever for developers to address a complete range of use cases, while for many, simply knowing that they are available will provide critical peace of mind.

Take a look at our multi-document transactions web page where you can hear directly from the MongoDB engineers who have built transactions, review code snippets, and access key resources to get started. You can get started with MongoDB 4.0 now by spinning up your own fully managed, on-demand MongoDB Atlas cluster, or downloading it to run on your own infrastructure.

If you want to learn more about everything that’s new in MongoDB 4.0, download our Guide.

Transactional guarantees have been a critical feature for relational databases for decades, but have typically been absent from non-relational alternatives, which has meant that users have been forced to choose between transactions and the flexibility and versatility that non-relational databases offer. With its support for multi-document ACID transactions, MongoDB is built for customers that want to have their cake and eat it too.

Stephen O’Grady, Principal Analyst, Redmonk

*Safe Harbor Statement

This blog contains “forward-looking statements” within the meaning of Section 27A of the Securities Act of 1933, as amended, and Section 21E of the Securities Exchange Act of 1934, as amended. Such forward-looking statements are subject to a number of risks, uncertainties, assumptions and other factors that could cause actual results and the timing of certain events to differ materially from future results expressed or implied by the forward-looking statements. Factors that could cause or contribute to such differences include, but are not limited to, those identified in our filings with the Securities and Exchange Commission. You should not rely upon forward-looking statements as predictions of future events. Furthermore, such forward-looking statements speak only as of the date of this presentation.

In particular, the development, release, and timing of any features or functionality described for MongoDB products remains at MongoDB’s sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality. Except as required by law, we undertake no obligation to update any forward-looking statements to reflect events or circumstances after the date of such statements.

MongoDB Charts Beta Now Available

Today at MongoDB World 2018 in New York, we are excited to announce the beta release of MongoDB Charts, available now.

MongoDB Charts is the fastest and easiest way to build visualizations of MongoDB data. Now you can easily build visualizations with an intuitive UI and analyze complex, nested data – like arrays and subdocuments – something other visualization technologies designed for tabular databases struggle with. Then assemble those visualizations into a dashboard for at-a-glance, up-to-the-minute information. Dashboards can be shared with other users, either for collaboration or viewing only, so that entire groups and organizations can become data-driven and benefit from data visualizations and the insights they provide. When you connect to a live data source, MongoDB Charts will keep your charts and dashboards up to date with the most recent data.

Charts allows you to connect to any MongoDB instance on which you have access permissions, or use existing data sources that other Charts users share with you. With MongoDB’s workload isolation capabilities—enabling you to separate your operational and analytical workloads in the same cluster—you can use Charts for a real-time view without any impact on production workloads.

Unlike other visualization products, MongoDB Charts is designed to natively handle MongoDB’s rich data structures. MongoDB Charts makes it easy to visualize complex arrays and subdocuments without flattening the data or spending time and effort on ETL. Charts will automatically generate an aggregation pipeline from your chart design which is executed on your MongoDB server, giving you full access to the power of MongoDB when creating visualizations.
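As a purely illustrative sketch (this is not the exact pipeline Charts emits, and the movies collection with its genres and imdb.rating fields is hypothetical), charting the average rating per genre for a collection whose genres field is an array could translate into an aggregation like the following:

db.movies.aggregate([
  { $unwind: "$genres" },                                               // one document per genre value
  { $group: { _id: "$genres", avgRating: { $avg: "$imdb.rating" } } },  // average rating per genre
  { $sort: { avgRating: -1 } }                                          // highest-rated genres first
])

The $unwind stage is what lets array values be grouped and charted directly, without any up-front flattening or ETL.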

Charts already supports all common visualization types, including bar, column, line, area, scatter and donut charts, with table views and other advanced charts coming soon. Multiple visualizations can be assembled into dashboards, providing an at-a-glance understanding of all of your most important data. Dashboards can be used to track KPIs, understand trends, spot outliers and anomalies, and more.

Graphs and charts are a great way to communicate insights with others, but we know data can often be sensitive. Charts lets you stay in control over which users in your organization have access to your data sources and dashboards. You can choose to share with your entire organization, select individuals or keep things private to yourself. Dashboards can be shared as read-only or as read-write, depending on whether you want to communicate or collaborate.

Since this is a beta release, it’s free for anyone to try, but keep in mind that we may add or change functionality before the final release. MongoDB Charts is available as a Docker image that you can install on a server or VM within your environment. Once the Docker container is running, people in your organization can access Charts from any desktop web browser. To download the image and get started with Charts, head to the MongoDB Download Center.

We think that MongoDB Charts is the best way to get quick, self-service visualizations of the data you’re storing in MongoDB. But we’re only just getting started. Once you try out the beta, please use the built-in feedback tool to report issues and to let us know what you do and don’t like, and we’ll use this input to make future releases even better.

Happy charting!


The development, release, and timing of any features or functionality described for our products remains at our sole discretion. This information is merely intended to outline our general product direction and it should not be relied on in making a purchasing decision nor is this a commitment, promise or legal obligation to deliver any material, code, or functionality.

Introducing the MongoDB Enterprise Operator for Kubernetes and OpenShift

Today more DevOps teams are leveraging the power of containerization, and technologies like Kubernetes and Red Hat OpenShift, to manage containerized database clusters. To support teams building cloud-native apps with Kubernetes and OpenShift, we are introducing a Kubernetes Operator (beta) that integrates with Ops Manager, the enterprise management platform for MongoDB. The operator enables a user to deploy and manage MongoDB clusters from the Kubernetes API, without having to manually configure them in Ops Manager.

With this Kubernetes integration, you can consistently and effortlessly run and deploy workloads wherever they need to be, standing up the same database configuration in different environments, all controlled with a simple, declarative configuration. Operations teams can also offer developers new services like MongoDB-as-a-Service, which could provide them with a fully managed database alongside other products and services, all managed by Kubernetes and OpenShift.

In this blog, we’ll cover the following:

  • Brief discussion on the container revolution
  • Overview of MongoDB Ops Manager
  • How to Install and configure the MongoDB Enterprise Operator for Kubernetes
  • Troubleshooting
  • Where to go for more information

The containerization movement

If you have ever visited an international shipping port or driven down an interstate highway, you may have seen large rectangular metal containers generally referred to as intermodal containers. These containers are designed and built using the same specifications even though their contents can vary greatly. The consistent design not only enables these containers to move freely from ship, to rail, and to truck, it also allows that movement without unloading and reloading the cargo.

This same concept of a container can be applied to software applications where the application is the contents of the container along with its supporting frameworks and libraries. The container can be freely moved from one platform to another all without disturbing the application. This capability makes it easy to move an application from an on-premise datacenter server to a public cloud provider, or to quickly stand up replica environments for development, test, and production usage.

MongoDB 4.0 introduces the MongoDB Enterprise Operator for Kubernetes, which enables a user to deploy and manage MongoDB clusters from the Kubernetes API, without the user having to connect directly to Ops Manager or Cloud Manager (the hosted version of Ops Manager, delivered as a service).

While MongoDB is fully supported in a containerized environment, you need to make sure that the benefits you get from containerizing the database exceed the cost of managing the configuration. As with any production database workload, these containers should use persistent storage and will require additional configuration depending on the underlying container technology used. To help facilitate the management of the containers themselves, DevOps teams are leveraging the power of orchestration technologies like Kubernetes and Red Hat OpenShift. While these technologies are great at container management, they are not aware of application specific configurations and deployment topologies such as MongoDB replica sets and sharded clusters. For this reason, Kubernetes has Custom Resources and Operators which allow third-parties to extend the Kubernetes API and enable application aware deployments.

Later in this blog you will learn how to install and get started with the MongoDB Enterprise Operator for Kubernetes. First let’s cover MongoDB Ops Manager, which is a key piece in efficient MongoDB cluster management.

Managing MongoDB

Ops Manager is an enterprise-class management platform for MongoDB clusters that you run on your own infrastructure. The capabilities of Ops Manager include monitoring, alerting, disaster recovery, scaling, deploying and upgrading of replica sets and sharded clusters, and other MongoDB products, such as the BI Connector. While a thorough discussion of Ops Manager is out of scope for this blog, it is important to understand the basic components that make up Ops Manager, as they will be used by the Kubernetes Operator to create your deployments.

Figure 1: MongoDB Ops Manager deployment screen

A simplified Ops Manager architecture is shown in Figure 2 below. Note that there are other agents that Ops Manager uses to support features like backup but these are outside the scope of this blog and not shown. For complete information on MongoDB Ops Manager architecture see the online documentation found at the following URL: https://docs.opsmanager.mongodb.com/current/

Figure 2: Simplified Ops Manager deployment

The MongoDB HTTP Service provides a web application for administration. These pages are simply a front end to a robust set of Ops Manager REST APIs that are hosted in the Ops Manager HTTP Service. It is through these REST APIs that the Kubernetes Operator will interact with Ops Manager.
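It can be useful to exercise these APIs directly when troubleshooting. As a hypothetical example (the username, API key, and hostname below are placeholders), you could list the Projects (groups) visible to a user with curl and HTTP digest authentication:

curl --user "jane.doe@example.com:my-public-api-key" --digest "http://servername:8080/api/public/v1.0/groups?pretty=true"

The Kubernetes Operator makes calls of this kind on your behalf, using the user and Public API Key you will configure later in this post.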

MongoDB Automation Agent

With a typical Ops Manager deployment there are many management options, including upgrading the cluster to a different version, adding secondaries to an existing replica set, and converting an existing replica set into a sharded cluster. So how does Ops Manager go about upgrading each node of a cluster or spinning up new mongod instances? It does this by relying on a locally installed service called the Ops Manager Automation Agent, which runs on every MongoDB node in the cluster. This lightweight service is available for multiple operating systems, so regardless of whether your MongoDB nodes are running in a Linux container, a Windows Server virtual machine, or on an on-prem PowerPC server, there is an Automation Agent available for that platform. The Automation Agents receive instructions from the Ops Manager REST APIs to perform work on the cluster node.

MongoDB Monitoring Agent

When Ops Manager shows statistics such as database size and inserts per second, it is receiving this telemetry from the individual nodes running MongoDB. Ops Manager relies on the Monitoring Agent to connect to your MongoDB processes, collect data about the state of your deployment, and then send that data to Ops Manager. There can be one or more Monitoring Agents deployed in your infrastructure for reliability, but only one primary agent per Ops Manager Project collects data. Ops Manager is all about automation: as soon as you have the Automation Agent deployed, other supporting agents like the Monitoring Agent are deployed for you. In the scenario where the Kubernetes Operator has issued a command to deploy a new MongoDB cluster in a new Project, Ops Manager will take care of deploying the Monitoring Agent into the containers running your new MongoDB cluster.

Getting started with MongoDB Enterprise Operator for Kubernetes

Ops Manager is an integral part of automating a MongoDB cluster with Kubernetes. To get started you will need access to an Ops Manager 4.0+ environment or MongoDB Cloud Manager.

The MongoDB Enterprise Operator for Kubernetes is compatible with Kubernetes v1.9 and above. It also has been tested with Openshift version 3.9. You will need access to a Kubernetes environment. If you do not have access to a Kubernetes environment, or just want to stand up a test environment, you can use minikube which deploys a local single node Kubernetes cluster on your machine. For additional information and setup instructions check out the following URL: https://kubernetes.io/docs/setup/minikube.
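For example, on a workstation with minikube installed, a minimal local test environment can be started with something like the following (the resource flags are illustrative and depend on your machine):

minikube start --memory 8192 --cpus 4
kubectl get nodes

The kubectl get nodes command simply confirms that the single-node cluster is up before you proceed.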

The following sections will cover the three step installation and configuration of the MongoDB Enterprise Operator for Kubernetes. The order of installation will be as follows:

  • Step 1: Installing the MongoDB Enterprise Operator via Helm or a yaml file
  • Step 2: Creating and applying a Kubernetes ConfigMap file
  • Step 3: Creating the Kubernetes Secret object that will store the Ops Manager API Key

Step 1: Installing MongoDB Enterprise Operator for Kubernetes

To install the MongoDB Enterprise Operator for Kubernetes you can use Helm, the Kubernetes package manager, or pass a yaml file to kubectl. The instructions for both of these methods are as follows; pick one and continue to Step 2.

To install the operator via Helm:

To install with Helm you will first need to clone the public repo https://github.com/mongodb/mongodb-enterprise-kubernetes.git
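For example, assuming git is available on your machine:

git clone https://github.com/mongodb/mongodb-enterprise-kubernetes.git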

Change directories into the local copy and run the following command on the command line:

helm install helm_chart/ --name mongodb-enterprise

To install the operator via a yaml file:

Run the following command from the command line:

kubectl apply -f https://raw.githubusercontent.com/mongodb/mongodb-enterprise-kubernetes/master/mongodb-enterprise.yaml

At this point the MongoDB Enterprise Operator for Kubernetes is installed and now needs to be configured. First, we must create and apply a Kubernetes ConfigMap file. A Kubernetes ConfigMap file holds key-value pairs of configuration data that can be consumed in pods or used to store configuration data. In this use case the ConfigMap file will store configuration information about the Ops Manager deployment we want to use.

Step 2: Creating the Kubernetes ConfigMap file

For the Kubernetes Operator to know what Ops Manager you want to use you will need to obtain some properties from the Ops Manager console and create a ConfigMap file. These properties are as follows:

Base Url - The URL of your Ops Manager or Cloud Manager.

Project Id - The id of an Ops Manager Project which the Kubernetes Operator will deploy into.

User - An existing Ops Manager username

Public API Key - Used by the Kubernetes Operator to connect to the Ops Manager REST API endpoint

If you already know how to obtain these values, copy them down and proceed to Step 3.

Base Url

The Base Url is the URL of your Ops Manager or Cloud Manager instance.

Note: If you are using Cloud Manager, the Base Url is “https://cloud.mongodb.com”.

To obtain the Base Url in Ops Manager copy the Url used to connect to your Ops Manager server from your browser's navigation bar. It should be something similar to http://servername:8080. You can also perform the following:

Login to Ops Manager and click on the Admin button. Next select the “Ops Manager Config” menu item. You will be presented with a screen similar to the figure below:

Figure 3: Ops Manager Config page

Copy down the value displayed in the URL To Access Ops Manager box. Note: If you don’t have access to the Admin drop down you will have to copy the Url used to connect to your Ops Manager server from your browser's navigation bar.

Project Id

The Project Id is the id of an Ops Manager Project which the Kubernetes Operator will deploy into.

An Ops Manager Project is a logical organization of MongoDB clusters and also provides a security boundary. One or more Projects are part of an Ops Manager Organization. If you need to create an Organization, click on your user name at the upper right side of the screen and select “Organizations”. Next click on the “+ New Organization” button and provide a name for your Organization. Once you have an Organization you can create a Project.

Figure 4: Ops Manager Organizations page

To create a new Project, click on your Organization name. This will bring you to the Projects page and from here click on the “+ New Project” button and provide a unique name for your Project. If you are not an Ops Manager administrator you may not have this option and will have to ask your administrator to create a Project.

Once the Project is created or if you already have a Project created on your behalf by an administrator you can obtain the Project Id by clicking on the Settings menu option as shown in the Figure below.

Figure 5: Project Settings page

Copy the Project ID.

User

The User is an existing Ops Manager username

To see the list of Ops Manager users return to the Project and click on the “Users & Teams” menu. You can use any Ops Manager user who has at least Project Owner access. If you’d like to create another username click on the “Add Users & Team” button as shown in Figure 6.

Figure 6: Users & Teams page

Copy down the email of the user you would like the Kubernetes Operator to use when connecting to Ops Manager.

Public API Key

The Ops Manager API Key is used by the Kubernetes Operator to connect to the Ops Manager REST API endpoint. You can create an API Key by clicking on your username in the upper right-hand corner of the Ops Manager console and selecting “Account” from the drop-down menu. This will open the Account Settings page as shown in Figure 7.

Figure 7: Public API Access page

Click on the “Public API Access” tab. To create a new API key click on the “Generate” button and provide a description. Upon completion you will receive an API key as shown in Figure 8.

Figure 8: Confirm API Key dialog

Be sure to copy the API Key, as it will be used later as a value in a configuration file. It is important to copy this value while the dialog is up, since you cannot read it back once you close the dialog. If you miss writing the value down, you will need to delete the API Key and create a new one.

Note: If you are using MongoDB Cloud Manager or have Ops Manager deployed in a secured network you may need to whitelist the IP range of your Kubernetes cluster so that the Operator can make requests to Ops Manager using this API Key.

Now that we have acquired the necessary Ops Manager configuration information, we need to create a Kubernetes ConfigMap file for the Kubernetes Project. To do this, use a text editor of your choice and create the following yaml file, substituting the placeholders with the values you obtained in the Ops Manager console. For sample purposes we can call this file “my-project.yaml”.

apiVersion: v1
kind: ConfigMap
metadata:
  name: <<Name i.e. name of OpsManager Project>>
  namespace: mongodb
data:
  projectId: <<Project ID>>
  baseUrl: <<OpsManager URL>>

Figure 9: Sample ConfigMap file

Note: The format of the ConfigMap file may change over time as features and capabilities get added to the Operator. Be sure to check with the MongoDB documentation if you are having problems submitting the ConfigMap file.

Once you create this file you can apply the ConfigMap to Kubernetes using the following command:

kubectl apply -f my-project.yaml
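If you would like to confirm that the ConfigMap was created in the mongodb namespace, one optional check is:

kubectl get configmaps -n mongodb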

Step 3: Creating the Kubernetes Secret

For a user to be able to create or update objects in an Ops Manager Project they need a Public API Key. Earlier in this section we created a new API Key and you hopefully wrote it down. This API Key will be held by Kubernetes as a Secret object.

You can create this Secret with the following command:

kubectl -n mongodb create secret generic <<Name of credentials>> --from-literal="user=<<User>>" --from-literal="publicApiKey=<<public-api-key>>"

Make sure you replace the User and Public API key values with those you obtained from your Ops Manager console. You can pick any name for the credentials – just make a note of it as you will need it later when you start creating MongoDB clusters.
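As a purely hypothetical example (the credentials name, username, and API key below are placeholders, not real values):

kubectl -n mongodb create secret generic my-credentials --from-literal="user=jane.doe@example.com" --from-literal="publicApiKey=1a2b3c4d-5e6f-7a8b-9c0d-1e2f3a4b5c6d"

Whatever name you choose here (my-credentials in this sketch) is the value you will later reference in the credentials field of your MongoDB resource definitions.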

Now we're ready to start deploying MongoDB Clusters!

Deploying a MongoDB Replica Set

The Kubernetes Operator can deploy a MongoDB standalone, replica set, or sharded cluster. To deploy a three-node replica set, create the following yaml file:

apiVersion: mongodb.com/v1
kind: MongoDbReplicaSet
metadata:
  name: <<Name of your new MongoDB replica set>>
  namespace: mongodb
spec:
  members: 3
  version: 3.6.5

  persistent: false

  project: <<Name value specified in metadata.name of ConfigMap file>>
  credentials: <<Name of credentials secret>>
Figure 10: simple-rs.yaml file describing a three node replica set

The name of your new cluster can be any name you choose. The name of the Ops Manager Project ConfigMap and the name of the credentials Secret were defined previously.
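For example, if the ConfigMap's metadata.name was my-project and the Secret was named my-credentials (as in the hypothetical examples earlier in this post), a filled-in simple-rs.yaml might look like this:

apiVersion: mongodb.com/v1
kind: MongoDbReplicaSet
metadata:
  name: my-replica-set
  namespace: mongodb
spec:
  members: 3
  version: 3.6.5
  persistent: false
  project: my-project
  credentials: my-credentials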

To submit the request for Kubernetes to create this cluster simply pass the name of the yaml file you created to the following kubectl command:

kubectl apply -f simple-rs.yaml

After a few minutes your new cluster will show up in Ops Manager as shown in Figure 11.

Figure 11: Servers tab of the Deployment page in Ops Manager

Notice that Ops Manager installed not only the Automation Agents on these three containers running MongoDB, but also the Monitoring and Backup Agents.

A word on persistent storage

What good would a database be if your data went to the grave any time the container died? Probably not a good situation, and maybe one where polishing up the résumé would be a good idea as well. Until recently, the lack of persistent storage and consistent DNS mappings were major obstacles to running databases within containers. Fortunately, recent work in the Kubernetes ecosystem has addressed this concern, and new features like PersistentVolumes and StatefulSets have emerged, allowing you to deploy databases like MongoDB without worrying about losing data because of a hardware failure or because a container was moved elsewhere in your datacenter. Additional configuration of the storage is required on the Kubernetes cluster before you can deploy a MongoDB cluster that uses persistent storage. In Kubernetes there are two types of persistent volumes: static and dynamic. The Kubernetes Operator can provision MongoDB objects (i.e., standalone, replica set, and sharded clusters) using either type.
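As a minimal sketch (exact storage-related fields can vary by Operator version, and your Kubernetes cluster must already have appropriate storage classes or persistent volumes available), requesting persistent storage for the replica set defined earlier is a matter of flipping the persistent flag in the spec:

spec:
  members: 3
  version: 3.6.5
  persistent: true     # request persistent volumes rather than ephemeral storage
  project: my-project
  credentials: my-credentials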

Connecting your application

Connecting to MongoDB deployments in Kubernetes is no different than with other deployment topologies. However, it is likely that you'll need to address the network specifics of your Kubernetes configuration. To abstract deployment-specific information such as the hostnames and ports of your MongoDB deployment, the MongoDB Enterprise Operator for Kubernetes uses Kubernetes Services.

Services

Each MongoDB deployment type will have two Kubernetes services generated automatically during provisioning. For example, suppose we have a three-node replica set called "my-replica-set"; you can then enumerate its services using the following statement:

kubectl get all -n mongodb --selector=app=my-replica-set-svc

This statement yields the following results:

NAME                   READY     STATUS    RESTARTS   AGE
pod/my-replica-set-0   1/1       Running   0          29m
pod/my-replica-set-1   1/1       Running   0          29m
pod/my-replica-set-2   1/1       Running   0          29m

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
service/my-replica-set-svc            ClusterIP   None             <none>        27017/TCP         29m
service/my-replica-set-svc-external   NodePort    10.103.220.236   <none>        27017:30057/TCP   29m

NAME                              DESIRED   CURRENT   AGE
statefulset.apps/my-replica-set   3         3         29m

Note the string "-svc" appended to the name of the replica set.

The service with "-external" appended is a NodePort, which means it is exposed outside the cluster on a static port of each Kubernetes node (30057 in this example).

Note: If you are using Minikube you can obtain the IP address of the running replica set by issuing the following:

minikube service list

In our example, which used minikube, the result set contained the following entry: mongodb my-replica-set-svc-external http://192.168.39.95:30057

Now that we know the IP address and port of our MongoDB cluster, we can connect using the mongo shell or whatever application or tool you would like to use.
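For example, using the NodePort address returned above, you could connect from your workstation with the mongo shell (assuming it is installed locally):

mongo --host 192.168.39.95 --port 30057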

Basic Troubleshooting

If you are having problems submitting a deployment, you should read the logs. Issues like authentication failures and other common problems can be easily detected in the log files. You can view the MongoDB Enterprise Operator for Kubernetes log files via the following command:

kubectl logs -f deployment/mongodb-enterprise-operator -n mongodb

You can also use kubectl to see the logs of the database pods. The main container process continually tails the Automation Agent logs, which can be viewed with the following statement:

kubectl logs <<name of pod>> -n mongodb

Note: You can enumerate the list of pods using

kubectl get pods -n mongodb

Another common troubleshooting technique is to shell into one of the containers running MongoDB. Here you can use common Linux tools to view the processes, troubleshoot, or even check mongo shell connections (sometimes helpful in diagnosing network issues).

kubectl exec -it <<name of pod>> -n mongodb -- /bin/bash

Once inside the container you can, for example, list the running processes (ps -ef), which produces output similar to the following:

UID        PID  PPID  C STIME TTY          TIME CMD
mongodb      1     0  0 16:23 ?        00:00:00 /bin/sh -c supervisord -c /mongo
mongodb      6     1  0 16:23 ?        00:00:01 /usr/bin/python /usr/bin/supervi
mongodb      9     6  0 16:23 ?        00:00:00 bash /mongodb-automation/files/a
mongodb     25     9  0 16:23 ?        00:00:00 tail -n 1000 -F /var/log/mongodb
mongodb     26     1  4 16:23 ?        00:04:17 /mongodb-automation/files/mongod
mongodb     45     1  0 16:23 ?        00:00:01 /var/lib/mongodb-mms-automation/
mongodb     56     1  0 16:23 ?        00:00:44 /var/lib/mongodb-mms-automation/
mongodb     76     1  1 16:23 ?        00:01:23 /var/lib/mongodb-mms-automation/
mongodb   8435     0  0 18:07 pts/0    00:00:00 /bin/bash

From inside the container we can make a connection to the local MongoDB node easily by running the mongo shell via the following command:

/var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.6.5/bin/mongo --port 27017

Note: The MongoDB version in this path may be different from 3.6.5; be sure to check the directory path on your system.

Where to go for more information

More information will be available on the MongoDB documentation website in the near future. Until then, check out these resources:

Slack: #enterprise-kubernetes

Sign up @ https://launchpass.com/mongo-db

GitHub: https://github.com/mongodb/mongodb-enterprise-kubernetes

To see all MongoDB operations best practices, download our whitepaper: https://www.mongodb.com/collateral/mongodb-operations-best-practices

Scaling Your Replica Set: Non-Blocking Secondary Reads in MongoDB 4.0

MongoDB 4.0 adds the ability to read from secondaries while replication is simultaneously processing writes. To see why this is new and important let's look at secondary read behavior in versions prior to 4.0.

Background

From the outset MongoDB has been designed so that when you have sequences of writes on the primary, each of the secondary nodes must show the writes in the same order. If you change field "A" in a document and then change field "B", it is not possible to see that document with changed field "B" and not changed field "A". Eventually consistent systems allow you to see it, but MongoDB does not, and never has.

On secondary nodes, we apply writes in batches, because applying them sequentially would likely cause secondaries to fall behind the primary. When writes are applied in batches, we must block reads so that applications cannot see data applied in the "wrong" order. This is why, when reading from secondaries, readers periodically have to wait for replication batches to be applied. The heavier the write load, the more likely your secondary reads will hit these occasional "pauses", impacting your latency metrics. Given that applications frequently use secondary reads to reduce query latency (for example, when they use the "nearest" readPreference), having to wait for replication batches to be applied defeats the goal of getting the lowest possible latency on your reads.
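For example, a driver connection string that directs reads to the nearest replica set member (the hostnames and replica set name here are hypothetical) looks like this:

mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0&readPreference=nearest

The same preference can be set interactively in the mongo shell with db.getMongo().setReadPref("nearest") before issuing queries.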

In addition to readers having to wait for replication batch writes to finish, the writing of batches needs a lock that requires all reads to complete before it can be taken. That means that in the presence of a high number of reads, the replication writes can start lagging – an issue that is compounded when chain replication is enabled.

What was our goal in MongoDB 4.0?

Our goal was to allow reads during oplog application to decrease read latency and secondary lag, and increase maximum throughput of the replica set. For replica sets with a high write load, not having to wait for readers between applying oplog batches allows for lower lag and quicker confirmation of majority writes, resulting in less cache pressure on the primary and better performance overall.

How did we do it?

Starting with MongoDB 4.0 we took advantage of the fact that we implemented support for timestamps in the storage engine, which allows transactions to get a consistent view of data at a specific "cluster time". For more details about this see the video: WiredTiger timestamps.

Secondary reads can now also take advantage of the snapshots, by reading from the latest consistent snapshot prior to the current replication batch that's being applied. Reading from that snapshot guarantees a consistent view of the data, and since applying the current replication batch doesn't change these earlier records, we can now relax the replication lock and allow all these secondary reads at the same time the writes are happening.

How much difference does this make?

A lot! The throughput improvement can range from none (if you were not impacted by the replication lock, that is, your write load is relatively low) to 2x.

Most importantly, this improves latency for secondary reads. For those who use the "nearest" readPreference because they want to reduce latency from the application to the database, this feature means their latency in the database will also be as low as possible. We saw significant improvements in 95th and 99th percentile latency in these tests.

Thread levels   8    16    32    64
Feature off     1     2     3     5
Feature on      0     1     1     0

95th percentile read latency (ms)

The best part of this new feature? You don't need to do anything to enable it or opt into it. All secondary reads in 4.0 will read from a snapshot without waiting for replication writes.

This is just one of a number of great new features coming in MongoDB 4.0. Take a look at our blog on the 4.0 release candidate to learn more. And don’t forget, you’ve still got time to register for MongoDB World where you can meet with the engineers who are building all of these great new features.

Introducing the Aggregation Pipeline Builder in MongoDB Compass

MongoDB Compass 1.14, including the aggregation pipeline builder, was released for general availability on June 26, 2018. Get Compass 1.14 from the Download Center.

Building MongoDB aggregations has never been so easy.

The most efficient way to analyze your data is where it already lives. That’s why we have MongoDB’s built-in aggregation framework. Have you tried it yet? If so, you know that it’s one of the most powerful MongoDB tools at your disposal. If not, you’re missing out on the ability to query your data in incredibly powerful ways. In fact, we like to say that “aggregate is the new find”. Built on the concept of data processing pipelines (as in Unix or PowerShell), the aggregation framework lets you “funnel” your documents through a multi-stage pipeline that filters, transforms, sorts, computes, aggregates your data, and more. The aggregation framework enables you to perform extensive analytics and statistical analysis in real time and generate pre-aggregated reports for dashboarding.

There are no limits to the number of stages an aggregation pipeline can have – pipelines can be as simple or as complex as you wish. In fact, the only limit is one’s imagination when it comes to deciding how to aggregate data. We’ve seen some very comprehensive pipelines!

With a rich library of over 25 stages and 100 operators (and growing with every release), the aggregation framework is an amazingly versatile tool. To help you be even more successful with it, we decided to build an aggregation construction user interface. The new Aggregation Pipeline Builder is now available with the latest release of Compass for beta testing. It’s available under the Aggregations tab.

The screenshot below depicts a sample pipeline on a movies collection that produces a listing of the title, year, and rating of all movies, excluding crime and horror, in English and Japanese, rated either PG or G, starting with the most recent year and sorted alphabetically by title within each year. Each stage was added gradually, with the ability to preview the result of the aggregation.
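As a rough sketch of what such a pipeline looks like when expressed by hand in the mongo shell (the field names below are illustrative, based on the description above, and may not match the sample data exactly):

db.movies.aggregate([
  { $match: {
      genres: { $nin: [ "Crime", "Horror" ] },          // exclude crime and horror
      languages: { $all: [ "English", "Japanese" ] },   // available in both English and Japanese
      rated: { $in: [ "PG", "G" ] }                     // rated either PG or G
  } },
  { $project: { _id: 0, title: 1, year: 1, rated: 1 } },  // keep only title, year, and rating
  { $sort: { year: -1, title: 1 } }                       // most recent year first, alphabetical within each year
])

The pipeline builder assembles each of these stages for you, with a preview of the documents flowing out of every stage.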

This easy-to-use UI lets you build your aggregation queries faster than ever before. There’s no need to worry about bracket matching, reordering stages, or remembering operator syntax with its intuitive drag-and-drop experience and code skeletons. You also get auto-completion for aggregation operators as well as query operators and even document field names.

If you need help understanding a particular operator, click on the info icon next to it and you’ll be taken directly to the appropriate guidance.

As you are building your pipeline, you can easily preview your results. This, in combination with an ability to rearrange and toggle stages on and off, makes it easy to troubleshoot your pipelines. When you are satisfied with the results, the constructed pipeline can be copied to the clipboard for easy pasting in your code, or simply saved in your favorites list for re-use later!

The aggregation authoring experience just got even more incredible with the new Compass aggregation pipeline builder. Why not check it out today?

  • Download the latest beta version of Compass
  • See the documentation for the aggregation pipeline builder in Compass
  • See the aggregation framework quick reference
  • To learn or brush up on your aggregation framework skills, take M121 from MongoDB University – it’s well worth it!

Also, please remember to send us your feedback by filing JIRA tickets or emailing it to: compass@mongodb.com.

MongoDB Enterprise Server for Pivotal Cloud Foundry goes GA

Jason Mimick
April 24, 2018
Technical

Last fall we launched MongoDB Enterprise Server for Pivotal Cloud Foundry (PCF) in beta. Earlier this year, we released v1.0.1, our first GA version of the integration. This week, we're happy to announce the next version, v1.0.3. In this post, we'll outline all of the new post-beta features we've added to our PCF tile. For information on how the tile works and more background, please check out my colleague's excellent post.

New Features:

  • MongoDB 3.6 and MongoDB Ops Manager 3.6.2+ are now supported
  • Support for Organizations/Projects in MongoDB Ops Manager
  • Stemcell now follows all published MongoDB production best practices
  • Backups can now be enabled by default
  • T-Shirt sizing for PCF cluster VM specifications
  • TLS/SSL for MongoDB deployments using Bosh DNS

Speed: Developers and users care about getting features out quickly. They are reorganizing their teams to work on strategic initiatives like microservices, and they need a database that supports traditional use cases as well as newer ones. These new features, together with PCF, help address these concerns. The MongoDB Enterprise Server for PCF Tile allows customers to leverage two powerful solutions for database management simultaneously to enhance their DevOps processes and accelerate development.

Standardization to mitigate risk: The PCF PaaS delivers features such as machine provisioning and initialization, and MongoDB Ops Manager provides runtime database management. Together these solutions allow modern enterprises the ability to ensure consistent configuration, security policies, backup, and monitoring are applied across the board to development, test, and production deployments.

Ease of use: This powerful combination additionally affords application developers the capability to deliver out-of-the-box cloud-ready solutions without needing to worry about complex infrastructure details. If you haven't already, download MongoDB Ops Manager and the latest MongoDB Enterprise Server for PCF Tile today!

MongoDB 3.6 and MongoDB Ops Manager 3.6.2+ are now supported

You must use MongoDB Ops Manager 3.6.2 or later for this version of the tile. The reason for this is the shift from a strict single level hierarchy of "Groups" in MongoDB Ops Manager to the more versatile "Organizations/Projects" structure. Refer to the Ops Manager documentation for more information.

Support for Organizations and Projects in MongoDB Ops Manager

The beta versions of the tile generated a cluster (group) name which wasn't very human-friendly. This version resolves this issue by allowing users to specify a cluster (project) name for their deployment at service provisioning time.

Stemcell now follows all published MongoDB production best practices

We have integrated all of the production best practices defined in the production notes. These best practices ensure that PCF tunes your MongoDB clusters with the optimal operating system settings for maximum performance and least risk.

Backups can now be enabled by default (v1.0.3)

Certain organizations enforce strict policies around backups due to regulatory restrictions. MongoDB Ops Manager offers the ability to back up any MongoDB deployment with point-in-time restores and queryable backups. This version of the tile supports the ability to enable backups by default for all MongoDB clusters. Alternatively, backups can be disabled by default and enabled for a cluster during service provisioning. Organizations no longer need to manually configure backups to ensure all deployments are always backed up for disaster recovery and governance scenarios.

Figure 1: MongoDB Enterprise Server Tile for PCF configuration in PCF Ops Manager

T-Shirt sizing for PCF cluster VM specifications (v1.0.3)

Beta versions of the tile only allowed for a single "VM type" (CPU/RAM/Disk) to be defined for all deployed MongoDB clusters. This version provides the ability to define three different "VM types". These types (Small, Medium, and Large) allow PCF operators to define categories of cluster types and then limit which types specific PCF users are allowed to deploy. The ability to support a wide variety of cluster types and sizes will enable an enterprise to support self-service scenarios and provide MongoDB as a service right out of the box.

TLS/SSL for MongoDB deployments using Bosh DNS

PCF Operators can now deploy MongoDB clusters with TLS/SSL enabled. The tile configuration in PCF Ops Manager now includes a new security tab in which one can enter the appropriate certificates and private PEM key files for database servers and Certificate Authorities. These will then be automatically distributed to each MongoDB server, deployed into known locations which can then easily be entered into the security settings for the corresponding MongoDB Ops Manager project for the deployment.

Conclusion

In our beta releases we focused on core functionality, and now with our GA and subsequent release, we have included some key usability features and other helpful additions. Please download and install the tile, wire it up to your MongoDB Ops Manager instance, and take it for a drive. To get started watch this demo video and refer to our documentation.

Also – check out a recent webinar we delivered with Pivotal for a hands-on look at using the MongoDB Enterprise Service Tile for PCF to refactor legacy monolith applications into microservices: How to Overcome Data Challenges When Refactoring Monoliths to Microservices.