Welcome to part 2 of our 3-part MongoDB 3.0 blog series. In part 1 we covered the new MongoDB Ops Manager platform, demonstrating how it can reduce operational overhead by up to 95%. We also covered new security enhancements for authentication and auditing. In part 3, we’ll cover the new pluggable storage architecture and what WiredTiger brings to MongoDB performance and storage efficiency.
In this blog post, I’ll cover what is new for developers and DBAs, as well as new flexibility in deploying large-scale, globally distributed clusters.
Remember, if you want to get the detail now on everything MongoDB 3.0 offers, download the What’s New white paper.
Enhanced Query Language and Tools
Key MongoDB tools, including `mongooplog`, have been re-written as multi-threaded processes in Go, allowing faster operations and smaller binaries.
`mongodump` and `mongorestore` now execute parallelized backup and recovery for small MongoDB instances. Dumps created by earlier releases can be restored to instances running MongoDB 3.0.
Note: As the only backup solutions that offer cluster-wide snapshots of sharded clusters, Ops Manager and MMS are recommended for larger MongoDB deployments. Review part 1 of this blog series to learn more.
`mongoimport` can parallelize loads across multiple collections with multi-threaded bulk inserts, allowing for significantly faster imports of CSV, TSV and JSON data exported from other databases or applications. To help ensure data quality, `mongoimport` now also supports input validation of field names during the import process.
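As a rough illustration of a parallel import, the sketch below assembles a `mongoimport` command line in Python. The database, collection and file names are hypothetical, and the worker count is an arbitrary example value; `--numInsertionWorkers` is the option that controls how many insert workers run concurrently.

```python
import shlex


def build_mongoimport_args(db, collection, csv_path, workers=4):
    """Return an argv list for a multi-threaded CSV import with mongoimport."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [
        "mongoimport",
        "--db", db,
        "--collection", collection,
        "--type", "csv",
        "--headerline",  # take field names from the first CSV row
        "--file", csv_path,
        "--numInsertionWorkers", str(workers),  # concurrent insert workers
    ]


# Hypothetical names, for illustration only.
args = build_mongoimport_args("sales", "orders", "orders.csv", workers=8)
print(" ".join(shlex.quote(a) for a in args))
```

The list form can be passed directly to `subprocess.run` on a machine where the MongoDB tools are installed; the right worker count depends on the hardware and is worth measuring rather than guessing.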
You can read more about each of these tools in the MongoDB Package Components documentation.
Improved DBA Productivity: Enhanced Query Engine Introspection
The `explain()` method is an invaluable tool for DBAs in optimizing performance. Using `explain()` output, DBAs can review query plans, ensuring common queries are serviced by well-defined indexes, as well as eliminating any unnecessary indexes that can increase write latency and add overhead during query planning and optimization.
In the latest MongoDB 3.0 release, `explain()` has been significantly enhanced:
- The query plan can now be calculated and returned without first having to run the query. This enables DBAs to review which plan will be used to execute the query, without having to wait for the query to run to completion.
- DBAs can run `explain()` to generate detailed statistics on all query plans considered by the optimizer. Execution statistics are available for every evaluated plan, down to the granularity of execution stage. Now, for example, it is possible for the DBA to distinguish the amount of time a query plan spent sorting the result set from the amount of time spent reading index keys.
- The `explain()` method exposes query introspection to a wider range of operations, including find, count, update, remove, group, and aggregate, enabling DBAs to optimize for a wider range of query types.
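The three enhancements above map onto the verbosity modes of the `explain` database command. The sketch below builds such a command document in Python and walks a stage tree to list the execution stages. The collection name and filter are hypothetical, and the sample stage tree is hand-written for illustration, not real server output.

```python
VERBOSITY_MODES = ("queryPlanner", "executionStats", "allPlansExecution")


def explain_command(command, verbosity="queryPlanner"):
    """Wrap a database command document in an explain command."""
    if verbosity not in VERBOSITY_MODES:
        raise ValueError("unknown verbosity: %r" % (verbosity,))
    return {"explain": command, "verbosity": verbosity}


def flatten_stages(stage):
    """Yield each execution stage, starting at the root of the plan tree."""
    yield stage
    children = list(stage.get("inputStages", []))
    if "inputStage" in stage:
        children.insert(0, stage["inputStage"])
    for child in children:
        yield from flatten_stages(child)


# Hypothetical collection and filter.
cmd = explain_command(
    {"count": "orders", "query": {"status": "shipped"}},
    verbosity="executionStats",
)

# Hand-written, illustrative stage tree (not real server output).
sample_stages = {
    "stage": "SORT", "nReturned": 20,
    "inputStage": {
        "stage": "FETCH", "nReturned": 20, "docsExamined": 20,
        "inputStage": {"stage": "IXSCAN", "nReturned": 20, "keysExamined": 20},
    },
}
stage_names = [s["stage"] for s in flatten_stages(sample_stages)]
print(cmd)
print(stage_names)
```

With a driver, a document like `cmd` would be sent as a database command. `queryPlanner` returns the chosen plan without running the query, while `allPlansExecution` adds statistics for the other candidate plans.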
Review the documentation to learn more about the new `explain()` functionality.
Many organizations rely on MongoDB for real-time analytics. Buzzfeed is one such customer, and is looking forward to what MongoDB 3.0 has to offer:
“MongoDB powers our ability to make sense of big data and improve our decisions. The introduction of storage engines and improved concurrency in MongoDB 3.0 are significant milestones and we are excited for what these new features will allow us to build.”
Richer Geospatial Apps: Big Polygon Support for Multi-Hemisphere Queries
MongoDB’s geospatial indexes and queries are widely used by developers building modern location-aware applications across industries as diverse as high technology, retail, telecommunications and government. MongoDB 3.0 adds big polygon geospatial support to the `$geoIntersects` and `$geoWithin` operators, allowing execution of queries over geographic areas extending across multiple hemispheres and areas that exceed 50% of the earth’s surface. As an example, an airline can now run queries to identify all its aircraft that have traveled across multiple hemispheres in the past 24 hours.
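To make the query shape concrete, here is a minimal sketch that builds a big polygon `$geoWithin` filter as a Python dictionary. Big polygons are expressed as single-ring GeoJSON polygons tagged with MongoDB's strict-winding coordinate reference system; the field name `location` and the ring coordinates are assumptions chosen for illustration.

```python
# Custom CRS name MongoDB uses to mark a polygon as a "big" polygon.
STRICT_WINDING_CRS = "urn:x-mongodb:crs:strictwinding:EPSG:4326"


def big_polygon_query(field, ring):
    """Build a $geoWithin filter over a single-ring big polygon.

    `ring` is a list of [longitude, latitude] positions whose first and
    last entries are identical, listed counter-clockwise around the area.
    """
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed and have at least four positions")
    return {
        field: {
            "$geoWithin": {
                "$geometry": {
                    "type": "Polygon",
                    "coordinates": [ring],
                    "crs": {
                        "type": "name",
                        "properties": {"name": STRICT_WINDING_CRS},
                    },
                }
            }
        }
    }


# Illustrative ring spanning both the northern and southern hemispheres.
ring = [[-100, 60], [-100, 0], [-100, -60], [100, -60], [100, 60], [-100, 60]]
query = big_polygon_query("location", ring)  # "location" is a hypothetical field
print(query)
```

The resulting dictionary can be passed as the filter of a find operation on a collection with a `2dsphere` index on the queried field.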
Enhanced Data Type Support: Easier Time-Series Analytics & Reporting
The MongoDB 3.0 aggregation pipeline offers a new `$dateToString` operator that simplifies report generation and grouping data by time interval. The operator formats the ISO Date type as a string with a user-supplied format, allowing developers to construct rich queries with less code.
Documents can be grouped and analyzed by arbitrary dates and times. This grouping is especially useful in the analysis of time series data – for example reporting aggregated sales by hour for each SKU in a specific store.
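The sales-by-hour example could be sketched as the following aggregation pipeline, written as Python dictionaries. The field names (`storeId`, `sku`, `soldAt`, `amount`) are hypothetical. Because the format specifiers used here mean the same thing in Python's `strftime`, the last lines preview the bucket key for one timestamp locally.

```python
from datetime import datetime

# Truncate each timestamp to the hour it falls in.
HOUR_FORMAT = "%Y-%m-%dT%H:00"


def sales_by_hour_pipeline(store_id):
    """Aggregate sales per SKU per hour for one store (hypothetical schema)."""
    return [
        {"$match": {"storeId": store_id}},
        {"$group": {
            "_id": {
                "sku": "$sku",
                "hour": {"$dateToString": {"format": HOUR_FORMAT,
                                           "date": "$soldAt"}},
            },
            "totalSales": {"$sum": "$amount"},
        }},
        {"$sort": {"_id.hour": 1, "_id.sku": 1}},
    ]


pipeline = sales_by_hour_pipeline("store-42")

# Local preview of the bucket key for a single sale timestamp.
bucket = datetime(2015, 3, 3, 14, 37).strftime(HOUR_FORMAT)
print(bucket)  # 2015-03-03T14:00
```

Changing `HOUR_FORMAT` to `"%Y-%m-%d"` would turn the same pipeline into a daily report.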
You can learn more from the `$dateToString` documentation.
Faster Issue Resolution: Enhanced Logging
Log analysis is a critical part of identifying issues and determining root cause. In MongoDB 3.0, developers, QA and operations staff have much greater control over the granularity of log messages and over which functional areas of the server are logged, so they can investigate issues more precisely.
Users can configure which specific components of the database should be exposed for higher definition logging, coupled with the addition of severity levels for each log message. For example:
- If developers are diagnosing specific query issues, verbose logging can be enabled just for query and indexing operations.
- If the operations team is debugging network performance issues, it can activate deeper logging levels for the replication, networking and sharding components of the `mongod` process.
With the selective configuration of logging verbosity by server component, QA staff can expose more granular details of specific MongoDB internals without overwhelming either the systems or IT staff with extraneous log data. This, coupled with more parsable logging output, helps staff identify and resolve issues more quickly, whether in development, QA or production.
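As a sketch of the first scenario above, the helper below builds a `setParameter` command document that raises verbosity for selected components through the `logComponentVerbosity` parameter. The component list reflects the top-level names as I understand them from the 3.0 documentation and should be treated as an assumption; the chosen levels are arbitrary examples.

```python
# Top-level log components (assumed from the MongoDB 3.0 documentation).
LOG_COMPONENTS = {
    "accessControl", "command", "control", "geo", "index", "network",
    "query", "replication", "sharding", "storage", "write",
}


def log_verbosity_command(levels):
    """Build a setParameter command that sets per-component log verbosity.

    `levels` maps a component name to a verbosity from 0 to 5, or -1 to
    inherit the verbosity of the parent component.
    """
    settings = {}
    for component, level in levels.items():
        if component not in LOG_COMPONENTS:
            raise ValueError("unknown log component: %r" % (component,))
        if not -1 <= level <= 5:
            raise ValueError("verbosity must be between -1 and 5")
        settings[component] = {"verbosity": level}
    return {"setParameter": 1, "logComponentVerbosity": settings}


# Verbose logging for query and indexing operations only.
cmd = log_verbosity_command({"query": 2, "index": 2})
print(cmd)
```

A document like this would be run against the `admin` database; components that are not listed keep their current verbosity.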
You can read more from the MongoDB log messages documentation.
Deploying Geo-Distributed, Datacenter-Aware Applications
Delivering a low latency experience to customers wherever they are located is a key design consideration for distributed systems. Using MongoDB’s native replica sets, copies (replicas) of the database can be deployed to sites physically closer to users, thereby reducing the effects of network latency. Reads can be issued with the `nearest` read preference, ensuring the query is served from the replica closest to the user, based on ping distance.
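One common way to request this behavior is through the connection string. The following sketch assembles a URI with `readPreference=nearest`; the host names and the replica set name are hypothetical.

```python
from urllib.parse import urlencode


def nearest_read_uri(hosts, replica_set):
    """Build a MongoDB connection string that reads from the nearest member."""
    if not hosts:
        raise ValueError("at least one host is required")
    options = urlencode({"replicaSet": replica_set, "readPreference": "nearest"})
    return "mongodb://{}/?{}".format(",".join(hosts), options)


# Hypothetical hosts in two regions.
uri = nearest_read_uri(["nyc.example.net:27017", "lon.example.net:27017"], "rs0")
print(uri)
```

Drivers that accept a connection string can then route reads to the member with the lowest measured latency, at the cost of possibly reading slightly stale data from a secondary.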
Previously, MongoDB supported a maximum of 12 members per replica set, which limited deployments to three or four remote offices (depending on the number of replica set members deployed per location).
MongoDB 3.0 now supports up to 50 members per replica set, enabling wider data distribution across a greater number of sites, and greater resilience through additional node redundancy at each location. As with earlier releases, up to seven members are eligible to vote in replica set elections.
In addition to broader data distribution, replica sets can also be configured with a greater number of members performing specialized tasks:
- More hidden replica set members can be deployed to run applications such as analytics and reporting that require isolation from regular operational workloads.
- More delayed replica set members can be deployed to provide "historical" snapshots of data at different intervals in time for use in recovery from certain errors, such as "fat-finger" mistakes dropping databases or collections.
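The limits and member roles described above can be sketched as a replica set configuration document built in Python. The host names are hypothetical; `slaveDelay` is the delay field name used by releases of this era, and giving non-voting members a priority of 0 is a conservative assumption rather than something the text requires.

```python
MAX_MEMBERS = 50  # replica set size limit in MongoDB 3.0
MAX_VOTING = 7    # voting member limit


def build_replica_set_config(name, hosts, voting=MAX_VOTING):
    """Build a replica set config in which only the first `voting` hosts vote."""
    if not 1 <= len(hosts) <= MAX_MEMBERS:
        raise ValueError("a replica set needs between 1 and 50 members")
    if not 1 <= voting <= MAX_VOTING:
        raise ValueError("between 1 and 7 members may vote")
    members = []
    for i, host in enumerate(hosts):
        member = {"_id": i, "host": host}
        if i >= voting:
            # Non-voting members still replicate data but never stand for election.
            member.update({"votes": 0, "priority": 0})
        members.append(member)
    return {"_id": name, "members": members}


def make_hidden(member):
    """Hide a member from clients, e.g. for analytics and reporting."""
    member.update({"priority": 0, "hidden": True})


def make_delayed(member, delay_seconds):
    """Keep a member a fixed interval behind the primary for recovery."""
    member.update({"priority": 0, "hidden": True, "slaveDelay": delay_seconds})


# Hypothetical hosts: node0.example.net ... node49.example.net
hosts = ["node%d.example.net:27017" % i for i in range(50)]
config = build_replica_set_config("rs0", hosts)
make_hidden(config["members"][48])
make_delayed(config["members"][49], 3600)  # one hour behind the primary
```

Here the last two members are non-voting, so marking them hidden or delayed does not change the election quorum.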
And of course, you can configure and deploy multi-region MongoDB replica sets with Ops Manager and MMS, as discussed in part 1 of this blog series.
That wraps up the 2nd installment in our 3-part MongoDB 3.0 blog series. If you’re considering updating your version of MongoDB, take a look at our Major Version Upgrade consulting services.