Apache Spark MongoDB Integration

The world is awash in data, but data is only as valuable as the insights it yields. Finding the right mix of technology to produce those insights can be a challenge. One answer that companies are increasingly turning to is the integration of Apache Spark and MongoDB for real-time analytics.

Apache Spark is a popular framework for real-time analytics that offers an in-memory architecture, flexible processing systems, and APIs that power machine learning applications. MongoDB is the world’s most popular NoSQL database, with an architecture built for modern applications that manage high volumes of rapidly changing, unstructured data.

Together, these technologies help enterprises develop sophisticated analytics platforms that serve many different use cases, such as Internet of Things, mobile apps, social engagement, customer data, and content management systems. Real-time analytics is possible because Apache Spark jobs can run directly against data in MongoDB. Removing an extra layer from the process results in faster time to insight, a simpler deployment architecture, and more efficient data analysis that takes advantage of MongoDB’s indexes.

Here’s an example. An application uses Spark and MongoDB to let financial analysts query real-time, intraday market data. Individual tick data is stored and indexed in MongoDB, which allows fine-grained access to individual ticks or ranges of ticks per ticker symbol. Apache Spark can easily read this data from MongoDB, either as individual records or as dynamically aggregated data, to perform more sophisticated machine learning and analysis on the ticks.
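
The tick-data scenario above might look roughly like the following in Python. This is a minimal sketch, not a definitive implementation: it assumes the MongoDB Spark Connector 10.x is available to the Spark session, and the field names (symbol, ts, price, volume), the helper names build_tick_pipeline and load_ticks, and the sample symbol are all hypothetical. The pipeline is passed to MongoDB so that the filter on symbol and time range can be served by MongoDB's indexes before any data reaches Spark.

```python
import json


def build_tick_pipeline(symbol, start_ms, end_ms):
    """Build a MongoDB aggregation pipeline for one symbol's ticks.

    Field names (symbol, ts, price, volume) are assumptions made for
    illustration; ts is assumed to hold epoch milliseconds.
    """
    return [
        # Filter in MongoDB so an index on (symbol, ts) can be used.
        {"$match": {"symbol": symbol, "ts": {"$gte": start_ms, "$lt": end_ms}}},
        # Return only the fields the Spark job needs.
        {"$project": {"_id": 0, "symbol": 1, "ts": 1, "price": 1, "volume": 1}},
    ]


def load_ticks(spark, uri, database, collection, pipeline):
    """Load the matching ticks as a Spark DataFrame.

    Assumes `spark` is a SparkSession started with the MongoDB Spark
    Connector 10.x on its classpath.
    """
    return (
        spark.read.format("mongodb")
        .option("connection.uri", uri)
        .option("database", database)
        .option("collection", collection)
        .option("aggregation.pipeline", json.dumps(pipeline))
        .load()
    )


# One minute of ticks for a hypothetical symbol.
pipeline = build_tick_pipeline("ACME", 1_700_000_000_000, 1_700_000_060_000)
print(json.dumps(pipeline, indent=2))
```

With a running Spark session, a call such as load_ticks(spark, uri, "market", "ticks", pipeline) would return a DataFrame that can feed further aggregation or a machine learning step; the database and collection names here are placeholders.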

There are many more examples of Apache Spark extending the analytics capability of MongoDB. To learn more about these examples and how to build a real-time analytics engine for your enterprise, download the white paper today.