MongoDB Enables Advanced Real-Time Analytics on Fast-Moving Data with New Connector for Apache Spark

Company’s Spark Connector Recognized as a Certified Application by Databricks

NEW YORK, NY, MongoDB World – June 28, 2016 – MongoDB, the database for giant ideas, today announced the MongoDB Connector for Apache Spark, a powerful integration that enables developers and data scientists to generate new insights and drive real-time action on live, operational and streaming data. The MongoDB Connector for Apache Spark is now generally available and ready for production use.

Through MongoDB’s close collaboration with Databricks, the company founded by the team that created the Apache Spark project, the MongoDB Connector has received Databricks Certified Application status for Spark. The certification means that developers can focus on building modern, data-driven applications, knowing that the connector provides seamless integration and complete API compatibility between Spark processes and MongoDB.

“Combining Apache Spark, the leading open-source big data analytics processing engine in the Apache Software Foundation, with MongoDB, the industry’s fastest-growing database, enables organizations to fully realize the potential of real-time analytics,” said Eliot Horowitz, co-founder and CTO of MongoDB. “Spark jobs can be executed directly against operational data managed by MongoDB, without the time and expense of Extract Transform Load (ETL) processes. MongoDB can efficiently index and serve analytics results back into live, operational processes, making them smarter, more contextual and responsive to events as they happen.”

Delivering Faster, Lower Cost Performance for Advanced Analytics
The connector enables developers to build more functional applications faster and with less complexity, using a single integrated analytics and database technology stack. Industry estimates suggest that data integration consumes 80 percent of analytics development effort. The connector lets data engineers eliminate the need to shuttle data between separate operational and analytics infrastructure, each of which has its own configuration, maintenance and management requirements.

“Users are already combining Apache Spark and MongoDB to build sophisticated analytics applications. The new native MongoDB Connector for Apache Spark provides higher performance, greater ease of use, and access to more advanced Apache Spark functionality than any MongoDB connector available today,” said Reynold Xin, co-founder and chief architect of Databricks.

Written in Scala, Apache Spark’s native language, the connector provides a more natural development experience for Spark users. The connector exposes all of Spark’s libraries, enabling MongoDB data to be materialized as DataFrames and Datasets for analysis with Spark’s machine learning, graph, streaming and SQL APIs, with the added benefit of automatic schema inference.

The connector also takes advantage of MongoDB’s aggregation pipeline and rich secondary indexes to extract, filter, and process only the range of data it needs – for example, analyzing all customers located in a specific geography.

To maximize performance across large, distributed data sets, the MongoDB Connector for Apache Spark can co-locate Resilient Distributed Datasets (RDDs) with the source MongoDB node, thereby minimizing data movement across the cluster and reducing latency.

Users Eager to Realize Potential of Real-Time Analytics with MongoDB
“Building an artificial intelligence (AI) application requires huge amounts of data to be processed at once, both reliably and efficiently,” said Jeff Smith, Data Engineering Team Lead,, and author of Reactive Machine Learning Systems. “To store all that data, we use MongoDB for its flexible data model and its scaling capabilities. And to process all of that data to build machine learning models, we build robust pipelines in Scala using the distributed data processing capabilities of Apache Spark. Now, with the new native MongoDB Connector for Apache Spark, we have an even better way of connecting up these two key pieces of our infrastructure. We're rapidly building out a personal assistant who schedules meetings nearly flawlessly, and our datasets are increasing at an exponential rate. We believe the new connector will help us move faster and build reliable machine learning systems that can operate at massive scale.”

Users can learn how to use the new connector with a free MongoDB University course, Getting Started with Spark and MongoDB.


About MongoDB
By offering the best of traditional databases as well as the flexibility, scale and performance today’s applications require, we let innovators deploy apps as big as they can possibly dream. From startups to enterprises, for the modern and the mission-critical, MongoDB is the database for giant ideas. For more information, visit