We are happy to announce that the MongoDB Connector for Apache Spark is now officially certified for Microsoft Azure Databricks. MongoDB Atlas users can integrate Spark and MongoDB in the cloud for advanced analytics and machine learning workloads by using the MongoDB Connector for Apache Spark, which is fully supported and maintained by MongoDB.
The MongoDB Connector for Apache Spark exposes all of Spark’s libraries and can be used from Scala, Java, Python, and R. MongoDB data is materialized as DataFrames and Datasets for analysis with machine learning, graph, streaming, and SQL APIs. The connector can take advantage of MongoDB’s aggregation pipeline and rich secondary indexes to extract, filter, and process only the range of data it needs, for example when analyzing all customers located in a specific geography. This is very different from simple NoSQL data stores that offer neither secondary indexes nor in-database aggregations and that require extracting all data based on a simple primary key, even if the Spark process needs only a subset of that data. Extracting everything means more processing overhead, more hardware, and longer time-to-insight for data scientists and engineers.
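To make the filtering described above concrete, here is a minimal Python sketch of how a geography filter might be handed to the connector so that MongoDB, rather than Spark, discards the unwanted documents. It rests on several assumptions: the option names (`connection.uri`, `database`, `collection`, `aggregation.pipeline`) and the `mongodb` format name are those of the 10.x connector series, and older connector releases use different names. The `sales` database, `customers` collection, `address.country` field, and both helper functions are hypothetical and chosen only for illustration.

```python
import json


def build_read_options(uri, database, collection, country):
    """Build connector read options whose pipeline filters by country.

    The pipeline is serialized to JSON because connector options are
    passed as strings. Field names here are hypothetical.
    """
    pipeline = [
        # $match runs inside MongoDB and can use a secondary index
        # on address.country if one exists.
        {"$match": {"address.country": country}},
        # $project trims each document to the fields Spark needs.
        {"$project": {"name": 1, "address": 1}},
    ]
    return {
        "connection.uri": uri,  # assumed 10.x option name
        "database": database,
        "collection": collection,
        "aggregation.pipeline": json.dumps(pipeline),  # assumed 10.x option name
    }


def load_customers(spark, options):
    """Load a DataFrame through the connector using the given options.

    `spark` is an existing SparkSession with the connector installed
    on the cluster. This function is not called in this sketch.
    """
    reader = spark.read.format("mongodb")  # assumed 10.x format name
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

In a Databricks notebook, where a `spark` session is already defined, this could be used as `load_customers(spark, build_read_options(uri, "sales", "customers", "DE"))`, with `uri` set to your own Atlas connection string. Keeping the option construction separate from the Spark call makes the pipeline easy to inspect and test without a running cluster.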
Additionally, MongoDB’s workload isolation makes it easy for users to efficiently process data consolidated from multiple sources in a single database, with zero impact on other business-critical database operations. Running Spark on MongoDB also reduces operational overhead by greatly simplifying your architecture, and it increases the speed at which analytics can be executed.
MongoDB Atlas, our on-demand, fully managed cloud database service for MongoDB, makes it even easier to run sophisticated analytics processing by eliminating the operational overhead of managing database clusters directly. By combining Azure Databricks and MongoDB Atlas, users get a fully managed analytics platform, freeing engineering resources to focus on their core business domain and deliver actionable insights quickly.