What’s New in MongoDB 3.6. Part 2 – Speed to Scale

Mat Keep
December 14, 2017
MongoDB 3.6

Welcome to part 2 of our MongoDB 3.6 blog series.

  • In part 1 we took a look at the new capabilities designed specifically to help developers build apps faster, including change streams, retryable writes, developer tools, and fully expressive array manipulation
  • In part 2, we’ll dive into the world of DevOps and distributed systems management, exploring Ops Manager, schema governance, and compression
  • Part 3 will cover what’s new for developers, data scientists, and business analysts with the new SQL-based Connector for BI, richer in-database analytics and aggregations, and the new recommended driver for R
  • In our final part 4, we’ll look at all of the new goodness in our MongoDB Atlas fully managed database service available on AWS, Azure, and GCP, including cross-region replication for globally distributed clusters, auto-scaling, and more.

If you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Speed to Scale

Unlike the traditional scale-up systems of the past, distributed systems enable applications to scale further and faster while maintaining continuous availability in the face of outages and maintenance. However, they can impose more complexity on the ops team, potentially slowing down the pace of delivering, scaling, and securing apps in production.

MongoDB 3.6 takes another important step in making it easier for operations teams to deploy and run massively scalable, always-on global applications that benefit from the power of a distributed systems architecture.

Ops Manager

MongoDB Ops Manager is the best way to run MongoDB on your own infrastructure, making operations staff 10x-20x more productive. Advanced management and administration delivered with Ops Manager 3.6 allow operations teams to manage, optimize, and back up distributed MongoDB clusters faster and at higher scale than ever before. Deeper operational visibility allows proactive database management, while streamlined backups reduce the costs and time of data protection.

Simplified Monitoring and Management

It is now easier than ever for administrators to synthesize schema design against real-time database telemetry and receive prescriptive recommendations to optimize database performance and utilization – all from a single pane of glass.

Figure 1: Ops Manager performance telemetry and prescriptive recommendations speed time to scale

  • The Data Explorer allows operations teams to examine the database’s schema by running queries to review document structure, viewing collection metadata, and inspecting index usage statistics, directly within the Ops Manager UI.
  • The Real Time Performance Panel provides insight from live server telemetry, enabling issues to be immediately identified and diagnosed. The panel displays all operations in flight, network I/O, memory consumption, the hottest collections, and slowest queries. Administrators also have the power to kill long-running operations from the UI.
  • The new Performance Advisor, available for both Ops Manager and MongoDB Atlas, continuously highlights slow-running queries and provides intelligent index recommendations to improve performance. Using Ops Manager automation, the administrator can then roll out the recommended indexes automatically, without incurring any application downtime.

Ops Manager Organizations

To simplify management of global MongoDB estates, Ops Manager now provides a new Organizations and Projects hierarchy. Previously Projects, formerly called “groups”, were managed as individual entities. Now multiple Projects can be placed under a single organization, allowing operations teams to centrally view and administer all Projects under the organization hierarchy. Projects can be assigned tags, such as a “production” tag, against which global alerting policies can be configured.

Faster, Cheaper and Queryable Backups

Ops Manager continuously maintains backups of your data, so if an application issue, infrastructure failure, or user error compromises your data, the most recent backup is only moments behind, minimizing exposure to data loss. Ops Manager offers point-in-time backups of replica sets, and cluster-wide snapshots of sharded clusters, guaranteeing consistency and no data loss. You can restore to precisely the moment you need, quickly and safely. Ops Manager backups are enhanced with a range of new features:

  • Queryable Backups, first introduced in MongoDB Atlas, allow partial restores of selected data, and the ability to query a backup file in place, without having to restore it. Now users can query the historical state of the database to track data and schema modifications – a common demand of regulatory reporting. Directly querying backups also enables administrators to identify the best point in time to restore a system by comparing data from multiple snapshots, thereby improving both RTO and RPO. No other non-relational database offers the ability to query backups in place.
  • The Ops Manager 3.6 backup agent has been updated to use a faster and more robust initial sync process. Now, transient network errors will not cause the initial sync to restart from the beginning of the backup process, but rather resume from the point at which the error occurred. In addition, refactoring of the agent will speed data transfer from MongoDB to the backup repository, with the performance gain dependent on document size and complexity.
  • Reducing backup storage overhead by 1x of your logical production data and further improving speed to recovery, Point-in-Time snapshots will now be created at the destination node for the restore operation, rather than at the backup server, therefore reducing network hops. The restore process now transfers backup snapshots directly to the destination node, and then applies the oplog locally, rather than applying it at the daemon server first and then pushing the complete restore image across the network. Note that this enhancement does not apply to restores via SCP.
  • Extending support for the AWS S3 object store, backups can now be routed to on-premises object stores such as EMC ECS or IBM Cleversafe. MongoDB’s backup integration provides administrators with greater choice in selecting the backup storage architecture that best meets specific organizational requirements for data protection. It enables them to take advantage of cheap, durable, and quickly growing object storage used within the enterprise. By limiting backups to filesystems or S3 only, most other databases fail to match the storage flexibility offered by MongoDB.
  • With cross-project restores, users can now perform restores into a different Ops Manager Project than the backup snapshot source. This allows DevOps teams to easily execute tasks such as creating multiple staging or test environments that match recent production data, while configured with different user access privileges or running in different regions.

Review the Ops Manager documentation to learn more.

Schema Validation

MongoDB 3.6 introduces Schema Validation via syntax derived from the proposed IETF JSON Schema standard. This new schema governance feature extends the capabilities of document validation, originally introduced in MongoDB 3.2.

While MongoDB’s flexible schema is a powerful feature for many users, there are situations where strict guarantees on data structure and content are required. MongoDB’s existing document validation controls can be used to require that any documents inserted or updated follow a set of validation rules, expressed using MongoDB query syntax. While this allowed for the definition of required content for each document, it had no mechanism to restrict users from adding documents containing fields beyond those specified in the validation rules. In addition, there was no way for administrators to specify and enforce control over the complete structure of documents, including data nested inside arrays.

Using schema validation, DevOps and DBA teams can now define a prescribed document structure for each collection, which can reject any documents that do not conform to it. With schema validation, MongoDB enforces controls over JSON data that are unmatched by any other database:

  • Complete schema governance. Administrators can define when additional fields are allowed to be added to a document, and specify a schema on array elements including nested arrays.
  • Tunable controls. Administrators have the flexibility to tune schema validation according to use case – for example, if a document fails to comply with the defined structure, it can either be rejected, or still written to the collection while logging a warning message. Structure can be imposed on just a subset of fields – for example requiring a valid customer name and address, while other fields can be freeform, such as social media handle and cellphone number. And of course, validation can be turned off entirely, allowing complete schema flexibility, which is especially useful during the development phase of the application.
  • Queryable. The schema definition can be used by any query to inspect document structure and content. For example, DBAs can identify all documents that do not conform to a prescribed schema.
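As a rough illustration of the controls above, the following Python sketch builds the documents a client might send to the server. The `customers` collection, its field names, and the helper functions are hypothetical and chosen only for this example; `$jsonSchema`, `validationLevel`, and `validationAction` are the MongoDB options the sketch assumes.

```python
# Hypothetical schema for a "customers" collection. The field names are
# illustrative assumptions, not taken from the article.
CUSTOMER_SCHEMA = {
    "bsonType": "object",
    "required": ["name", "address"],
    "properties": {
        "name": {"bsonType": "string"},
        "address": {
            "bsonType": "object",
            "required": ["street", "city"],
            # Reject unapproved attributes inside the address sub-document.
            "additionalProperties": False,
            "properties": {
                "street": {"bsonType": "string"},
                "city": {"bsonType": "string"},
            },
        },
        # Structure imposed on every element of an array.
        "orders": {
            "bsonType": "array",
            "items": {
                "bsonType": "object",
                "required": ["sku", "quantity"],
                "properties": {
                    "sku": {"bsonType": "string"},
                    "quantity": {"bsonType": "int", "minimum": 1},
                },
            },
        },
    },
    # additionalProperties is not set at the top level, so other fields
    # (for example a social media handle) remain freeform.
}

VALIDATION_LEVELS = {"off", "strict", "moderate"}
VALIDATION_ACTIONS = {"error", "warn"}


def build_create_command(collection, schema, level="strict", action="error"):
    """Return a 'create' command document that attaches the schema."""
    if level not in VALIDATION_LEVELS:
        raise ValueError("unknown validationLevel: %r" % level)
    if action not in VALIDATION_ACTIONS:
        raise ValueError("unknown validationAction: %r" % action)
    return {
        "create": collection,
        "validator": {"$jsonSchema": schema},
        "validationLevel": level,    # "off" disables validation entirely
        "validationAction": action,  # "warn" logs instead of rejecting
    }


def non_conforming_filter(schema):
    """Return a query filter matching documents that violate the schema."""
    return {"$nor": [{"$jsonSchema": schema}]}


command = build_create_command("customers", CUSTOMER_SCHEMA, action="warn")
audit_filter = non_conforming_filter(CUSTOMER_SCHEMA)
```

With a driver such as PyMongo, `command` could be passed to `db.command(...)` and `audit_filter` to `collection.find(...)`; both calls need a running MongoDB 3.6 or later server, so they are left out of the sketch.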

With schema validation, developers and operations teams have complete control over balancing the agility and flexibility that comes from a dynamic schema, with strict data governance controls enforced across entire collections. As a result, they spend less time defining data quality controls in their applications, and instead delegate these tasks to the database. Specific benefits of schema validation include:

  1. Simplified application logic. Guarantees on the presence, content, and data types of fields eliminate the need to implement extensive error handling in the application. In addition, the need to enforce a schema through application code, or via a middleware layer such as an Object Document Mapper, is removed.
  2. Enforces control. Database clients can no longer compromise the integrity of a collection by inserting or updating data with incorrect field names or data types, or adding new attributes that have not been previously approved.
  3. Supports compliance. In some regulated industries and applications, it is required that Data Protection Officers demonstrate that data is stored in a specific format, and that no additional attributes have been added. For example, the EU’s General Data Protection Regulation (GDPR) requires an impact assessment against all Personally Identifiable Information (PII), prior to any processing taking place.

Extending Security Controls

MongoDB offers among the most extensive and mature security capabilities of any modern database, providing robust access controls, end-to-end data encryption, and complete database auditing. MongoDB 3.6 continues to build out security protection with two new enhancements that specifically reduce the risk of unsecured MongoDB instances being unintentionally deployed into production.

From the MongoDB 2.6 release onwards, the binaries from the official MongoDB RPM and DEB packages bind to localhost by default. With MongoDB 3.6, this default behavior is extended to all MongoDB packages across all platforms. As a result, all networked connections to the database will be denied unless explicitly configured by an administrator. Review the documentation to learn more about the changes introduced by localhost binding. Combined with new IP whitelisting, administrators can configure MongoDB to only accept external connections from approved IP addresses or CIDR ranges that have been explicitly added to the whitelist.
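The sketch below shows one way the whitelist idea could look from a client's point of view. It assumes the whitelist is expressed through the `authenticationRestrictions` field (with a `clientSource` list) that can be supplied when creating a user. The user name, password, role, and addresses are placeholders, and the `is_client_allowed` helper is only a local approximation of the matching rule, not the server's implementation.

```python
import ipaddress


def build_create_user(username, password, roles, client_sources):
    """Return a createUser command restricted to the given IPs/CIDR ranges."""
    # Fail early if an entry is not a valid IP address or CIDR range.
    for source in client_sources:
        ipaddress.ip_network(source, strict=False)
    return {
        "createUser": username,
        "pwd": password,
        "roles": roles,
        "authenticationRestrictions": [
            {"clientSource": list(client_sources)},
        ],
    }


def is_client_allowed(client_ip, client_sources):
    """Local approximation: is client_ip inside any whitelisted range?"""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(source, strict=False)
        for source in client_sources
    )


# Placeholder values from the documentation-reserved address ranges.
WHITELIST = ["192.0.2.0/24", "198.51.100.7"]
command = build_create_user(
    "reporting",
    "example-password",
    [{"role": "read", "db": "sales"}],
    WHITELIST,
)
```

Because 3.6 binds to localhost by default, a deployment like this would also need the administrator to configure the bind address explicitly (for example through the `net.bindIp` setting) before any remote client can connect at all.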

End-to-End Compression

Adding to intra-cluster network compression released in MongoDB 3.4, the new 3.6 release adds wire protocol compression to network traffic between the client and the database.

Figure 2: Creating highly efficient distributed systems with end-to-end compression

Wire protocol compression can be configured with the snappy or zlib algorithms, allowing up to 80% savings in network bandwidth. This reduction brings major performance gains to busy network environments and reduces connectivity costs, especially in public cloud environments, or when connecting remote assets such as IoT devices and gateways.
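A minimal sketch of how a client might request compression, assuming the driver in use honors the `compressors` connection string option. The host names are placeholders. The second helper uses Python's standard `zlib` module on a JSON-encoded sample to give a rough feel for the savings; real wire traffic is BSON, so the figure it returns is only indicative.

```python
import json
import zlib


def build_uri(hosts, compressors=("snappy", "zlib")):
    """Build a connection string that asks for wire protocol compression."""
    return "mongodb://{}/?compressors={}".format(
        ",".join(hosts), ",".join(compressors)
    )


def zlib_savings(documents):
    """Return the fraction of bytes saved by zlib on a JSON stand-in payload."""
    raw = json.dumps(documents).encode("utf-8")
    compressed = zlib.compress(raw)
    return 1 - len(compressed) / len(raw)


# Hypothetical IoT-style readings with highly repetitive structure.
readings = [
    {"sensorId": "s-%03d" % (i % 10), "temperature": 20 + i % 5, "unit": "C"}
    for i in range(500)
]

uri = build_uri(["db1.example.net:27017", "db2.example.net:27017"])
savings = zlib_savings(readings)
```

The compressors are listed in order of preference, and a compressor is used only when both the client and the server have it enabled.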

With compression configurable across the stack – for client traffic, intra-cluster communications, indexes, and disk storage – MongoDB offers greater network, memory, and storage efficiency than almost any other database.

Enhanced Operational Management in Multi-Tenant Environments

Many MongoDB customers have built out their database clusters to serve multiple applications and tenants. MongoDB 3.6 introduces two new features that simplify management and enhance scalability:

Operational session management enables operations teams to more easily inspect, monitor, and control each user session running in the database. They can view, group, and search user sessions across every node in the cluster, and respond to performance issues in real time. For example, if a user or developer error is causing runaway queries, administrators now have the fine-grained operational oversight to view and terminate that session by removing all associated session state across a sharded cluster in a single operation. This is especially useful for multi-tenant MongoDB clusters running diverse workloads, providing a much simpler interface for identifying active operations in the database cluster, recovering from cluster overloads, and monitoring active users on a system. Review the sessions commands documentation to learn more.
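As a sketch of what this looks like in practice, the helpers below build a `$listSessions` aggregation pipeline and a `killAllSessions` command document. The helper names and the example user are hypothetical; the sketch assumes the pipeline is run against the `system.sessions` collection in the `config` database by an administrator with sufficient privileges.

```python
def list_sessions_pipeline(users=None, all_users=False):
    """Return an aggregation pipeline that lists server sessions.

    users is an optional list of (user, db) tuples; all_users lists every
    session and requires elevated privileges on the server.
    """
    if all_users:
        stage = {"allUsers": True}
    elif users:
        stage = {"users": [{"user": u, "db": d} for u, d in users]}
    else:
        stage = {}  # sessions of the currently authenticated user
    return [{"$listSessions": stage}]


def kill_all_sessions_command(users):
    """Return a command that terminates every session of the given users."""
    return {"killAllSessions": [{"user": u, "db": d} for u, d in users]}


# Hypothetical user whose runaway queries need to be stopped.
offender = [("batchLoader", "admin")]
pipeline = list_sessions_pipeline(users=offender)
command = kill_all_sessions_command(offender)
```

An administrator could run the pipeline first to confirm which sessions belong to the offending user, then issue the command to end them in one step.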

Improved scalability with the WiredTiger storage engine to better support common MongoDB use cases that create hundreds of thousands of collections per database, for example:

  • Multi-tenant SaaS-based services that create a collection for each user.
  • IoT applications that write all sensor data ingested over an hour or a day into a unique collection.

As the collection count increased, MongoDB performance could, in extreme cases, degrade as the WiredTiger session cache managing a cursor’s access to collections and indexes became oversubscribed. MongoDB 3.6 introduces a refactoring of the session cache from a list to a hash table, with improved cache eviction policies and checkpointing algorithms, along with higher concurrency by replacing mutexes with read/write locks. As a result of this refactoring, a single MongoDB instance running with the WiredTiger storage engine can support over 1 million collections. Michael Cahill, director of Storage Engineering, presented a session on the development work at the MongoDB World ‘17 customer conference. Review the session slides to learn more.

Next Steps

That wraps up the second part of our “What’s New” blog series. Remember, if you want to get the detail now on everything the new release offers, download the Guide to What’s New in MongoDB 3.6.

Alternatively, if you’ve had enough of reading about it and want to get started now, then: