What’s New in MongoDB 3.4, Part 1: Multimodel Done Right

Mat Keep

#Technical#Company

MongoDB 3.4 is now Generally Available (GA) and ready for production deployment!

MongoDB 3.4 is the latest release of the industry’s fastest growing database. It offers a major evolution in capabilities and enhancements that enable you to address emerging opportunities and use cases. This 3-part blog series aims to help you navigate everything that is new, and provides the most important resources to get you started:

  • Part 1 covers the extended multimodel capabilities of MongoDB 3.4, including native graph processing, faceted navigation, rich real-time analytics, and powerful connectors for BI and Apache Spark.
  • Part 2 discusses enhanced capabilities to support mission-critical applications, including geo-distributed MongoDB zones, elastic clustering, tunable consistency, and enhanced security controls.
  • Part 3 concludes with modernized DBA and Ops tooling available in the new release.

If you want to get the detail now on everything the new release offers, download the What’s New in MongoDB 3.4 white paper.

Why Multimodel?

Rather than the monolithic codebases of the past, today’s applications are increasingly being decomposed into loosely coupled suites of microservices, each implementing specific functionality within an application. Different services can place very different demands on the database used – from simple key-value lookups to complex analytics, aggregations, and graph traversals, through to rich search queries. Some data may need to be stored only in-memory for predictable low latency, while other data sets may need to be encrypted on disk for regulatory compliance. Data sets may vary from billions of small records, each just several KBs in size, to the management of large, multi-MB objects.

To try and tame the complexity that would come from using a multitude of storage technologies, the industry is moving towards the concept of “multimodel” databases. Such designs are based on the premise of presenting multiple data models within the same platform, thereby serving diverse application requirements. However, many self-described multimodel database are little more than a collection of discrete technologies for data storage, search, and analytics, each with its own domain specific language, API and deployment requirements, and working on its own copy of the data. This approach to multimodel fails to offer much of an improvement over just running multiple independent databases, imposing high complexity, overhead, friction, and cost for developers and operations teams.

MongoDB Takes a Different Approach

How does MongoDB differ?

  • MongoDB’s flexible document data model presents a superset of other database models. It allows data be represented as simple key-value pairs and flat, table-like structures, through to rich documents and objects with deeply nested arrays and sub-documents.
  • With an expressive query language, documents can be queried in many ways – from simple lookups to creating sophisticated processing pipelines for data analytics and transformations, through to faceted search, JOINs and graph traversals.
  • With a flexible storage architecture, application owners can deploy storage engines optimized for different workload and operational requirements.

MongoDB’s approach to delivering multimodel significantly reduces developer and operational complexity, compared to running multiple, separate technologies to satisfy diverse application requirements. Users can leverage the same MongoDB query language, data model, scaling, security, and operational tooling across different applications, all within a single, integrated database platform.

MongoDB 3.4 introduces new native graph processing, faceted navigation, multi-language collations, additional aggregation pipeline operators, a new decimal data type, along with enhanced connectors for BI and Apache Spark integration.

Native Graph Processing

Applications storing data in MongoDB frequently contain data that represents graph or tree type hierarchies. These connections can be as simple as a management reporting chain in a HR application, or as complex as multi-directional, deeply nested relationships maintained by social networks, master data management, recommendation engines, disease taxonomy, fraud detection, and more. While special purpose graph databases are effective at storing and querying graph data, it’s often desirable to store and traverse graph data directly in MongoDB. Here it can be processed, queried, and analyzed alongside all other operational data in real time, without the complexity of duplicating data across two separate databases.

Graph and hierarchical data is commonly queried to uncover indirect or transitive relationships. For example, if company “A” is owned by company “B”, and “B” is owned by company “C”, then “C” indirectly owns company “A”. MongoDB 3.4 offers this functionality via a new aggregation stage called $graphLookup to recursively lookup a set of documents with a specific defined relationship to a starting document. Developers can specify the maximum depth for the recursion, and apply additional filters to only search nodes that meet specific query predicates. $graphLookup can recursively query within a single collection, or across multiple collections.

Review the documentation to learn more about the MongoDB $graphLookup operator for graph processing.

Faceted Navigation

Faceting is a popular analytics and search capability that allows an application to group information into related categories by applying multiple filters to query results. Facets allow users to narrow their search results by selecting a facet value as a filter criteria. Facets also provide an intuitive interface to exploring a data set, and allow convenient navigation to data that is of most interest.

Most databases need to execute multiple GROUP_BY statements to render facets, resulting in long running queries and poor user experience. MongoDB 3.4 introduces new aggregation pipeline stages for the bucketing, grouping and counting of one or more facets in a single round trip to the database. As a result, developers can generate richer and intuitive experiences to help users navigate complex data sets. Review the documentation to learn more about MongoDB faceted navigation.

Multi-Language Collations

Applications addressing global audiences require handling content that spans many languages. Each language has different rules governing the comparison and sorting of data. In order to create intuitive, localized user experiences, applications must handle non-English text with the appropriate rules for that language. For example, French has detailed rules for sorting names with accents on them. German phonebooks order words differently than the German dictionary.

MongoDB 3.4 significantly expands language support capabilities to allow users to build applications that adhere to language-specific comparison rules. Support for collations – the rules governing text comparisons and sorting – has been added throughout the MongoDB Query Language and indexes for over 100 different languages and locales. Each collation can also be further customized to provide precise control over case sensitivity, numeric ordering, whitespace handling, and more.

Developers can specify collation for a collection, an index, a view, or for specific operations that support collation (i.e. find, aggregate, update). You can learn more about collation in MongoDB from the documentation.

Aggregation Pipeline Enhancements

MongoDB developers and data engineers rely on the aggregation pipeline due to its power and flexibility in enabling sophisticated processing and manipulation demanded by real-time analytics and data transformations. MongoDB 3.4 continues to extend the aggregation pipeline by adding new capabilities within the database that simplify application-side code, as well as optimizer enhancements that improve performance.

In addition to the graph and facet features described earlier, many other expressions are added in MongoDB 3.4. These expressions address string manipulation, array handling, type handling, and schema detection and transformation:

  • String handling includes expressions for splitting and manipulating strings either in bytes or code points (a code point can represent a single component of the string, e.g, a character, emoji, or formatting instruction).
  • Array expressions allow more sophisticated manipulation and computations on arrays, including parallel array processing.
  • New expressions allow determining types of fields
  • Case/switch expressions for branching
  • Support for ISO week expressions

MongoDB 3.4 also brings additional performance optimizations to the aggregation pipeline. Where possible, the query optimizer automatically moves the $match stage earlier in the pipeline, and combines it with other stages, to increase cases where indexes can be used to filter results sets. In most cases, no modifications of existing queries need to be made.

You can learn more about the many MongoDB 3.4 aggregation pipeline enhancements from the documentation.

Decimal Data Type

Decimal128 is a 16 byte decimal floating-point number format. It is intended for calculations on decimal numbers where high levels of precision are required, such as financial (i.e. tax calculations, currency conversions) and scientific computations. Decimal128 supports 34 decimal digits of significance and an exponent range of −6143 to +6144.

MongoDB 3.4 adds support for the decimal data type which represents decimal128 values. Unlike the double data type, which only stores approximations of decimal values, the decimal data type stores the exact value. For example, a decimal type ("9.99") has a precise value of 9.99, while 9.99 represented as a double would have an actual value of 9.99000000000000021316282072803, thus creating the potential for rounding errors when it is used in calculations.

Decimal type values are treated like any other numeric type, and compare and sort correctly with other types based on actual numeric value. Operations on decimals are implemented in accordance with the decimal128 standard, so a value of 0.10 will retain its trailing zeros while comparing equal to 0.1, 0.10000 and so on.

Review the documentation to learn more about the new MongoDB decimal data type.

Visualizing MongoDB Data

The MongoDB Connector for BI was introduced in November 2015. For the first time analysts, data scientists, and business users were able to seamlessly visualize semi-structured and unstructured data managed in MongoDB, alongside traditional data from their SQL databases, using the same BI tools deployed within millions of enterprises.

Building on its initial release, the Connector for BI has been reengineered to improve performance, simplify installation and configuration, and support Windows.

Figure 1: Uncover new insights with powerful visualizations generated from MongoDB

Performance and scalability has been improved by moving more query execution down to the MongoDB processes themselves. Queries and complex aggregations are executed natively within the database, thus reducing latency and bandwidth consumption. In addition, installation and authentication has been simplified. Users now authenticate as an existing user already declared within MongoDB, no longer needing to create separate username and password credentials within the connector.

The Connector for BI is part of the Advanced Analytics suite available with MongoDB Enterprise Advanced. Review the MongoDB Connector for BI documentation to learn more.

MongoDB Connector for Apache Spark

Following general availability in June 2016, the MongoDB Connector for Apache Spark has been updated to support the latest Spark 2.0 release. Spark 2.0 support in the connector provides access to the new SparkSession entry point, the unified DataFrame and Dataset API, enhanced SparkSQL and SparkR functionality, and the experimental Structured Streaming feature. The connector exposes all of Spark’s libraries, including Scala, Java, Python, and R. MongoDB data is materialized as DataFrames and Datasets for analysis through machine learning, graph, streaming, and SQL APIs.

Already powering sophisticated analytics at organizations including China Eastern Airlines, Black Swan, and x.ai, the MongoDB Connector for Apache Spark takes advantage of MongoDB’s aggregation framework, rich queries, and secondary indexes to extract, filter, and process only the range of data it needs – for example, all customers located in a specific geography. To maximize performance across large, distributed data sets, the connector can co-locate Resilient Distributed Datasets (RDDs) with the source MongoDB node, thereby minimizing data movement across the cluster and reducing latency.

You can download the MongoDB Connector for Apache Spark from GitHub, and sign up for a free Spark course from MongoDB University.

Next Steps

That wraps up the first part of our 3-part blog series. Remember, you can get the detail now on everything packed into the new release by downloading the What’s New in MongoDB 3.4 white paper.

Alternatively, if you’d had enough of reading about it and want to get started now, then: