Welcome to the final part of our 3-part MongoDB 3.0 blog series. In part 1 we covered the new MongoDB Ops Manager platform, demonstrating how it can reduce operational overhead by up to 95%. We also covered new security enhancements for authentication and auditing. In part 2 we covered what is new for developers and DBAs, as well as new flexibility in deploying large-scale, globally distributed clusters.
In this final part, we’ll cover the new pluggable storage architecture and what WiredTiger brings to MongoDB performance and storage efficiency.
Remember, if you want to get the detail now on everything MongoDB 3.0 offers, download the What’s New white paper.
Pluggable Storage Engines: Extending MongoDB to New Applications
With users building increasingly complex data-driven apps, there is no longer a "one size fits all" database storage technology capable of powering every type of application built by the business. Modern applications need to support a variety of workloads with different access patterns and price/performance profiles – from low latency, in-memory read and write applications, to real time analytics to highly compressed "active" archives.
Through the use of pluggable storage engines exposed by the new storage engine API, MongoDB can be extended with new capabilities, and configured for optimal use of specific hardware architectures. This approach significantly reduces developer and operational complexity compared to running multiple databases. Now users can leverage the same MongoDB query language, data model, scaling, security and operational tooling across different applications, each powered by different pluggable MongoDB storage engines.
Multiple storage engines can co-exist within a single MongoDB replica set, making it easy to evaluate and migrate engines. Running multiple storage engines within a replica set can also simplify the process of managing the data lifecycle. For example, as different storage engines for MongoDB are developed, it would be possible to create a mixed replica set configured in such a way that:
- Operational data requiring low latency and high throughput performance is managed by replica set members using the WiredTiger or in-memory storage engine (currently experimental).
- Replica set members configured with an HDFS storage engine expose the operational data to analytical processes running in a Hadoop cluster, which is executing interactive or batch operations rather than real time queries.
MongoDB replication automatically migrates data between primary and secondary replica set members, independent of their underlying storage format. This eliminates complex ETL tools that have traditionally been used to manage data movement.
Figure 1: Mix and match storage engines within a single MongoDB replica set
MongoDB 3.0 ships with two supported storage engines:
- The default MMAPv1 engine, an improved version of the engine used in prior MongoDB releases, now enhanced with collection-level concurrency control.
- The new WiredTiger storage engine. For many applications, WiredTiger's more granular concurrency control and native compression will provide significant benefits in the areas of lower storage costs, greater hardware utilization, higher throughput, and more predictable performance.
Both storage engines can co-exist in a single replica set, managed by MongoDB Ops Manager or the MongoDB Management Service (MMS), discussed in part 1 of this blog series. MongoDB 3.0 also ships with an experimental in-memory storage engine. Other engines under development by MongoDB and the community include the RocksDB Key-Value engine, an HDFS storage engine and a FusionIO engine that bypasses the filesystem. These and other engines may be supported in the future, based on customer demand.
“MongoDB 3.0 enables a new dimension for community innovation with its pluggable storage engine feature. Every great product has a great customization story, and there is no greater customization than being able to choose the engine in your database. I look forward to the enhanced write performance and compression available with WiredTiger, and a flurry of new storage engine options for MongoDB, developed by the MongoDB community.”
~Yuri Finkelstein, Enterprise Architect at eBay.
MongoDB WiredTiger: A New Storage Engine for High Scale Apps
WiredTiger is a new storage engine for MongoDB, developed by the architects of Berkeley DB, the most widely deployed embedded data management software in the world. WiredTiger scales on modern, multi-CPU architectures. Using a variety of programming techniques such as hazard pointers, lock-free algorithms, fast latching and message passing, WiredTiger performs more work per CPU core than alternative engines. To minimize on-disk overhead and I/O, WiredTiger uses compact file formats, and optionally, compression.
For many applications, WiredTiger will provide significant benefits in the areas of lower storage costs, greater hardware utilization, and more predictable performance, especially by reducing query latency at the 95th and 99th percentiles.
Upgrades to the WiredTiger storage engine are non-disruptive for existing replica set deployments; applications will be 100% compatible, and upgrades can be performed with zero downtime through a rolling upgrade of the MongoDB replica set. This approach makes it very simple to migrate and test existing applications. Review the documentation for a checklist and full instructions on the upgrade process.
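As a rough illustration of what "rolling" means here, the sketch below generates an ordered plan in which secondaries are converted one at a time and the primary is stepped down and converted last. The host names and the `rolling_upgrade_plan` helper are hypothetical, and the step wording reflects our reading of the procedure (each member resyncs into a fresh data directory); the documentation's checklist remains the authoritative reference.

```python
from typing import List, NamedTuple


class Member(NamedTuple):
    host: str         # hypothetical host:port label, for illustration only
    is_primary: bool


def rolling_upgrade_plan(members: List[Member]) -> List[str]:
    """Return human-readable steps: secondaries first, the primary last."""
    primaries = [m for m in members if m.is_primary]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    secondaries = [m for m in members if not m.is_primary]

    steps = []
    for m in secondaries:
        # Assumption: the member resyncs its data into an empty data directory.
        steps.append(
            f"restart {m.host} with --storageEngine wiredTiger on a fresh "
            f"--dbpath and wait for it to resync"
        )
    primary = primaries[0]
    steps.append(f"step down {primary.host} so a converted member becomes primary")
    steps.append(
        f"restart {primary.host} with --storageEngine wiredTiger on a fresh "
        f"--dbpath and wait for it to resync"
    )
    return steps


example = [
    Member("node-a:27017", True),
    Member("node-b:27017", False),
    Member("node-c:27017", False),
]
for number, step in enumerate(rolling_upgrade_plan(example), start=1):
    print(number, step)
```

Because only one member is offline at any point, the replica set keeps a majority throughout, which is what allows the change to be made without downtime.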
Table 1: Comparing the MongoDB WiredTiger and MMAPv1 storage engines
The WiredTiger storage engine ships as part of MongoDB alongside the default MMAPv1 storage engine, and can be configured when starting the server using the following option:
mongod --storageEngine wiredTiger
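If you launch `mongod` from a script, the same option can be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not start a server; the `mongod_args` helper and the `/data/wt` path are hypothetical, and the optional `--wiredTigerCollectionBlockCompressor` flag is included on the assumption that you want to override the default compressor.

```python
from typing import List, Optional

# Engine names accepted by --storageEngine in MongoDB 3.0.
SUPPORTED_ENGINES = ("mmapv1", "wiredTiger")


def mongod_args(dbpath: str,
                engine: str = "wiredTiger",
                block_compressor: Optional[str] = None) -> List[str]:
    """Build (but do not run) a mongod command line for the chosen engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported storage engine: {engine}")
    args = ["mongod", "--dbpath", dbpath, "--storageEngine", engine]
    if block_compressor is not None:
        if engine != "wiredTiger":
            raise ValueError("block compression applies to WiredTiger only")
        args += ["--wiredTigerCollectionBlockCompressor", block_compressor]
    return args


# Hypothetical data directory, for illustration only.
print(" ".join(mongod_args("/data/wt", block_compressor="zlib")))
```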
Whether configured with the WiredTiger or MMAPv1 storage engines, developers and administrators interact with MongoDB in exactly the same way. The Performance Best Practices whitepaper and the Operations Guide detail specific optimizations that can be applied for each storage engine. The MongoDB storage documentation highlights differences in journaling and record allocation strategies.
### Higher Performance & Efficiency
Between 7x and 10x Greater Write Performance
MongoDB 3.0 provides more granular document-level concurrency control, delivering between 7x and 10x greater throughput for most write-intensive applications, while maintaining predictable low latency.
The updated MongoDB MMAPv1 storage engine implements collection-level concurrency control, while the new MongoDB WiredTiger storage engine further improves performance for many workloads by implementing concurrency control at the document level.
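To see why lock granularity matters, consider a deliberately simplified model: writes that share a lock must run one after another, while writes holding different locks can proceed in parallel. The sketch below counts the longest serialized queue under each granularity. It is a toy model under those assumptions only; it ignores CPU, I/O and the engines' real locking protocols, and the `serialized_steps` helper and sample workload are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# A write is identified by (database, collection, document id).
Write = Tuple[str, str, int]


def serialized_steps(writes: Iterable[Write], granularity: str) -> int:
    """Longest queue of writes that contend for the same lock."""
    if granularity == "database":
        lock_key = lambda w: w[0]
    elif granularity == "collection":
        lock_key = lambda w: (w[0], w[1])
    elif granularity == "document":
        lock_key = lambda w: w
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    counts = Counter(lock_key(w) for w in writes)
    return max(counts.values(), default=0)


# Hypothetical workload: 8 writes to distinct documents in one collection,
# 4 writes to distinct documents in another, all in the same database.
workload = [("app", "events", i) for i in range(8)]
workload += [("app", "users", i) for i in range(4)]

for level in ("database", "collection", "document"):
    print(level, serialized_steps(workload, level))
```

In this model the database-level lock serializes all 12 writes, the collection-level lock serializes the 8 writes to the busier collection, and the document-level lock lets every write proceed independently because no two writes touch the same document.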
Implementing concurrency control at the document level improves performance significantly when compared to the previous MongoDB 2.6 release. In each test, predictable low latency is maintained as the workload is scaled. Come back to our blog where we will post benchmark results.
Migrating to the WiredTiger storage engine will deliver the most noticeable performance gains on highly write-intensive applications. Examples include:
- IoT applications: sensor data ingestion and analysis;
- Customer data management and social apps: updating all user interactions and engagement from multiple activity streams;
- Mobile applications: SMS, CDRs (Call Detail Records) and gaming.
“We at Parse and Facebook are incredibly excited for the 3.0 release of MongoDB. The storage API opens the door for MongoDB to leverage write-optimized storage engines, improved compression and memory usage, and other aspects of cutting edge modern database engines. We're excited for the release and can't wait to see further innovation in the MongoDB storage engine space.”
~Charity Majors, Production Engineering Manager at Parse.
Higher concurrency also drives infrastructure simplification. Applications can fully utilize available server resources, simplifying the architecture needed to meet performance SLAs. With the more coarse-grained database-level locking of previous MongoDB generations, users often had to implement sharding in order to scale workloads stalled by a single write lock to the database, even when sufficient memory, I/O bandwidth and disk capacity were still available in the host system. Greater system utilization enabled by fine-grained concurrency reduces this overhead, eliminating unnecessary cost and management load.
Compression: Up to 80% Reduction in Storage Costs
Despite data storage costs declining 30% to 40% per annum, overall storage expenses continue to escalate as data volumes double every 12 to 18 months. To make matters worse, improvements to storage bandwidth and latency are not keeping pace with data growth, making disk I/O a common bottleneck to scaling overall database performance.
MongoDB now supports native compression with the WiredTiger engine, reducing physical storage footprint by as much as 80% compared to MongoDB configured with MMAPv1. In addition to reduced storage space, compression enables much higher storage I/O scalability as fewer bits are read from disk.
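To get a feel for how much repetitive document data can shrink, the sketch below compresses a batch of synthetic JSON documents with Python's standard `zlib` module and reports the compressed size as a fraction of the original. This does not reproduce WiredTiger's block compression, and the sample documents and `compression_ratio` helper are hypothetical; real savings depend heavily on your data.

```python
import json
import zlib
from typing import Dict, List


def compression_ratio(documents: List[Dict], level: int = 6) -> float:
    """Return compressed size divided by raw size for the serialized documents."""
    raw = "\n".join(json.dumps(d, sort_keys=True) for d in documents).encode("utf-8")
    if not raw:
        raise ValueError("no data to compress")
    compressed = zlib.compress(raw, level)
    return len(compressed) / len(raw)


# Synthetic, highly repetitive sensor readings (illustrative only).
docs = [
    {"sensor_id": i % 10, "status": "ok", "reading": 20 + (i % 5), "unit": "celsius"}
    for i in range(1000)
]
ratio = compression_ratio(docs)
print(f"compressed size is {ratio:.0%} of the original")
```

Documents that repeat field names and values, as most MongoDB collections do, tend to compress well, which is why fewer bytes need to be read from disk for the same logical data.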
Administrators have the flexibility to configure specific compression algorithms for collections, indexes and the journal, choosing between:
- Snappy (the default library for documents and the journal), providing a good balance between high compression ratio – typically around 70%, depending on document data types – and low CPU overhead.
- zlib, providing higher document and journal compression ratios for storage-intensive applications, at the expense of extra CPU overhead.
- Prefix compression for indexes reducing the in-memory footprint of index storage by around 50% (workload dependent), freeing up more of the working set for frequently accessed documents.
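The idea behind prefix compression can be shown with a few sorted keys: each key is stored as the number of leading characters it shares with the previous key, plus the remaining suffix. The sketch below is a generic illustration of the technique, not WiredTiger's actual index format, and the function names and sample keys are hypothetical.

```python
import os
from typing import List, Tuple


def prefix_compress(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Encode each key as (shared prefix length with previous key, suffix)."""
    encoded = []
    previous = ""
    for key in sorted_keys:
        # commonprefix compares the strings character by character.
        shared = len(os.path.commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decompress(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original keys from the (shared length, suffix) pairs."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = ["user:1001", "user:1002", "user:1010", "user:2000"]
packed = prefix_compress(keys)
print(packed)
assert prefix_decompress(packed) == keys
```

Because index keys are kept in sorted order, neighbouring keys often share long prefixes, so only short suffixes need to be stored.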
Administrators can modify the default compression settings for all collections and indexes. Compression is also configurable on a per-collection and per-index basis during collection and index creation.
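Per-collection settings are passed to the storage engine at creation time through the `storageEngine` option. The sketch below only builds the `create` command document and does not talk to a server; the `create_collection_command` helper and the collection name are hypothetical, and with a driver such as PyMongo you would hand the resulting document to the database's command method.

```python
from typing import Dict

# Block compressors named in this post, plus "none" to disable compression.
BLOCK_COMPRESSORS = ("none", "snappy", "zlib")


def create_collection_command(name: str, block_compressor: str) -> Dict:
    """Build a `create` command that sets WiredTiger block compression."""
    if block_compressor not in BLOCK_COMPRESSORS:
        raise ValueError(f"unknown block compressor: {block_compressor}")
    # The command name must be the first key; dicts preserve insertion order.
    return {
        "create": name,
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={block_compressor}"}
        },
    }


# Hypothetical archive collection compressed with zlib.
print(create_collection_command("archive_2014", "zlib"))
```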
By introducing compression, operations teams get higher performance per node and reduced storage costs.
Creating Multi-Temperature Storage
Combining compression with MongoDB’s location-aware sharding, administrators can build highly efficient tiered storage models to support the data lifecycle. Administrators can balance query latency with storage density and cost by assigning data sets to specific storage devices. For example, consider an application where recent data needs to be accessed quickly, while for older data, latency is less of a priority than storage costs:
- Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression enabled.
- Older, less frequently accessed data is tagged to higher capacity, lower-throughput hard disk drives where it is compressed with zlib to attain maximum storage density and lower cost-per-bit.
MongoDB will automatically migrate data between storage tiers based on user-defined policies without administrators having to build tools or ETL processes to manage data movement.
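The policy itself boils down to a set of tagged ranges over the shard key. The sketch below models only that lookup: given a document's creation date, it returns the tier whose range contains it, with the lower bound inclusive and the upper bound exclusive. The tag names, cutoff date and `tier_for` helper are hypothetical; in a real cluster the ranges are defined on the shard key and the balancer moves the data.

```python
from datetime import datetime
from typing import List, NamedTuple


class TagRange(NamedTuple):
    tag: str
    lower: datetime  # inclusive
    upper: datetime  # exclusive


def tier_for(created_at: datetime, ranges: List[TagRange]) -> str:
    """Return the tag of the first range that contains created_at."""
    for r in ranges:
        if r.lower <= created_at < r.upper:
            return r.tag
    raise LookupError(f"no tag range covers {created_at.isoformat()}")


# Hypothetical policy: data before the cutoff goes to the dense HDD tier.
cutoff = datetime(2015, 1, 1)
policy = [
    TagRange("archive_hdd_zlib", datetime.min, cutoff),
    TagRange("recent_ssd_snappy", cutoff, datetime.max),
]

print(tier_for(datetime(2014, 6, 1), policy))
print(tier_for(datetime(2015, 2, 1), policy))
```

Moving data to the archive tier is then a matter of advancing the cutoff in the policy rather than writing a separate data movement job.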
You can learn more about using location-aware sharding for this deployment model by reading the Tiered Storage Models in MongoDB post.
That wraps up our 3-part tour of MongoDB 3.0.
For a technical introduction to WiredTiger, watch our presentation, A Technical Introduction to WiredTiger.