MongoDB Pluggable Storage Engines: State of the Union, Storage Engine Summit

Mat Keep


Back at the inaugural MongoDB World user conference in June 2014, our co-founder and CTO, Eliot Horowitz, announced that MongoDB would incorporate a storage engine API into the next major version of the database – MongoDB 3.0. The goal of the storage engine API was to make it fast and easy for MongoDB and the community to build new pluggable storage engines that allow the database to be extended with new capabilities, and to be configured for specific hardware architectures.

Twelve months on, following both the MongoDB 3.0 release and the MongoDB World 2015 conference (keynotes and sessions from the event are now online), we hosted our first Storage Engine Summit, bringing together storage engine developers from across both MongoDB and the community. The summit’s goals were to:

  1. Review the status of current MongoDB storage engines
  2. Provide developers with visibility into the roadmaps for both the storage engine API and the database itself
  3. Collect storage engine developers’ feature requirements
  4. Develop best practices for community collaboration and development of the API

In this blog, I’ll provide an overview of the progress happening around MongoDB storage engines, and key outcomes from the summit.

Why Pluggable Storage Engines?

First, a bit of background. With users building increasingly complex data-driven apps, there is no longer a "one size fits all" database storage technology capable of powering every type of application built by the business. Modern applications need to support a variety of workloads with different access patterns and price/performance profiles – from low latency, in-memory read and write applications, to real-time analytics, to highly compressed "active" archives.

The storage engine API allows MongoDB to be configured with a choice of storage engines, each configured for specific workloads. This “pluggable” approach significantly reduces developer and operational complexity compared to running multiple databases. Now users can leverage the same MongoDB query language, data model, scaling, security and operational tooling across different applications, each powered by pluggable MongoDB storage engines that are optimized for specific workloads.

So what progress has been made so far?

Quite a bit, and in a very short time frame.

MongoDB 3.0 shipped with two supported storage engines:

  • The default MMAPv1 engine, an improved version of the engine used in prior MongoDB releases, enhanced with collection-level concurrency control.
  • The new WiredTiger storage engine. For many applications, WiredTiger's more granular concurrency control and native compression provide significant benefits in terms of lower storage costs, greater hardware utilization, higher throughput, and more predictable performance. Benchmarks show MongoDB 3.0 configured with WiredTiger delivers 7-10x higher performance than MongoDB 2.6 using the original MMAP storage engine. In addition, storage compression rates of up to 80% are fairly common.
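To make the engine choice concrete, here is a minimal sketch of how a deployment script might select one of the two supported engines at startup. It assumes the mongod options --storageEngine, --dbpath and --wiredTigerCollectionBlockCompressor; the helper name mongod_args and the data paths are hypothetical and used only for illustration.

```python
# Hypothetical helper: builds a mongod command line for a chosen engine.
# Only the mongod option names are real; everything else is illustrative.
SUPPORTED_ENGINES = {"mmapv1", "wiredTiger"}  # the two supported engines in 3.0


def mongod_args(engine, dbpath, compressor=None):
    """Return the argument list for starting mongod with the given engine."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported storage engine: {engine}")
    args = ["mongod", "--storageEngine", engine, "--dbpath", dbpath]
    if compressor is not None:
        # Native block compression is a WiredTiger feature, not an MMAPv1 one.
        if engine != "wiredTiger":
            raise ValueError("block compression applies to WiredTiger only")
        args += ["--wiredTigerCollectionBlockCompressor", compressor]
    return args


if __name__ == "__main__":
    # Example paths are assumptions, not recommended locations.
    print(" ".join(mongod_args("wiredTiger", "/data/wt", compressor="zlib")))
    print(" ".join(mongod_args("mmapv1", "/data/mmap")))
```

The point of the sketch is that the engine is a startup-time choice: the application code, queries and tooling that talk to the resulting mongod process stay the same whichever engine is selected.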

To demonstrate how developers could start to innovate, several experimental engines also shipped with MongoDB 3.0, including the in-memory and /dev/null storage engines.

Beyond the walls of MongoDB engineering teams, the community also started releasing its own engines. Within one quarter of 3.0’s availability:

  • Facebook released mongo-rocks, a MongoDB storage engine based on its RocksDB embedded database project. Mongo-rocks is already in use for some production workloads at Facebook’s Parse mobile backend-as-a-service division, which today supports over 500,000 applications on MongoDB.
  • Percona published a release candidate of its TokuMXse storage engine.

What’s Next?

Held in our New York offices on June 4th, the summit drew diverse representation from the community. Some attendees already had storage engines for MongoDB; some were in varying stages of development; while others were just starting to evaluate the opportunities. Attendees included:

  • Facebook, who already has RocksDB
  • Percona, readying TokuMXse
  • SanDisk, developing an engine optimized for its Fusion-io SSD storage devices
  • Deep, who have developed a MySQL storage engine featuring an adaptive database kernel
  • 8k Data, currently building a JSON database on top of Postgres

There was also representation from MongoDB’s storage engine teams, who discussed additional engines planned for the MongoDB 3.2 timeframe, including:

  • An encrypted storage engine protecting data at rest, with keys secured by the industry-standard Key Management Interoperability Protocol (KMIP). At-rest encryption is especially critical for regulated industries such as healthcare, financial services, retail and certain government agencies. And with so many high-profile breaches of sensitive data at well-known companies over the past five years, organizations are increasingly encrypting all of their data.
  • An in-memory engine designed to serve ultra high-throughput, low-latency apps typical in finance, ad-tech, gaming, real-time analytics, session management and general cache use cases.
  • A new option for insert-only workloads (e.g., streaming IoT sensor data, log file analysis, social media feed ingestion), based on the LSM option in the WiredTiger storage engine.

Much more than just sharing feeds and speeds, the intent of the storage engine summit was to bring together developers to share best practices in how to collaborate:

  • Mark Callaghan, one of our esteemed MongoDB masters and member of Facebook’s technical staff, gave great insight into his experiences working as part of the MySQL storage engine community. He shared the good, the bad, and the downright ugly.
  • Dr Michael Cahill, co-founder of WiredTiger, and now Director of Storage Engineering at MongoDB, shared his experiences implementing the WiredTiger storage engine, both while outside MongoDB, and within the company.
  • Mathias Stearn & Geert Bosch, senior database kernel engineers, provided a deep-dive walkthrough of the storage engine API implementation, and plans for the future. This gave all those present an opportunity to provide feedback on issues they had faced, solutions to move forward, and future requirements to shape the evolution of the API.

All of the talks gave a great frame of reference for how the MongoDB storage engine community should move forward. So what did the summit achieve in one short day?

  • The formation of a focused community committed to building MongoDB storage engines.
  • The establishment of a framework for communications, contributions, collaboration & future summits.
  • A direct connection between engine developers and the MongoDB engineers designing & implementing API functionality. Not only has feedback already helped evolve the storage engine API in the current 3.0 release, but additional enhancements will also be reflected in MongoDB 3.2’s API.

Wrapping Up

Huge progress has been made, and the MongoDB storage engine ecosystem is continuing to grow. This diversity helps MongoDB users solve new classes of use cases with a single database framework. The storage engine API helps developers innovate faster. And it helps storage engine partners get their technology in front of a whole new audience.

To learn more about the promise of multiple storage engines, download the What’s New in MongoDB 3.0 guide.