Apache Spark has become one of the fastest-growing Apache Software Foundation projects. With its in-memory architecture, flexible processing model, and easy-to-use APIs, Apache Spark has emerged as a leading framework for real-time analytics.
Apache Spark jobs can be executed directly against data managed by MongoDB, without the time and complexity of first moving the data to Hadoop's HDFS. This approach offers several benefits:
- It reduces time to insight, allowing the business to act on data sooner.
- It simplifies deployment architectures, reducing complexity and ongoing cost of ownership.
- It allows for more efficient analysis: by leveraging MongoDB's indexes, Spark can operate on just the relevant subset of data rather than scanning entire collections.
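As a concrete sketch of the first benefit, the MongoDB Connector for Spark lets a Spark job load a collection directly as a DataFrame. The snippet below assumes the v10.x connector (where the data source short name is `mongodb` and options such as `connection.uri` apply; older connector versions use different names), and the URI, database, and collection names are illustrative placeholders:

```python
def mongo_read_options(uri, database, collection):
    """Build the option map for reading a MongoDB collection with the
    MongoDB Connector for Spark. The option keys follow the v10.x
    connector's naming (an assumption; earlier versions differ)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }

if __name__ == "__main__":
    # Requires pyspark plus the mongo-spark-connector package on the
    # Spark classpath; neither is needed just to import this module.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mongo-direct-read").getOrCreate()

    ticks = (
        spark.read.format("mongodb")          # connector's v10.x short name
        .options(**mongo_read_options(
            "mongodb://localhost:27017",      # example URI
            "market",                         # hypothetical database
            "ticks"))                         # hypothetical collection
        .load()
    )
    ticks.printSchema()
```

The point of the helper is simply that no ETL step into HDFS appears anywhere: the job reads the live collection in place.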
For example, consider an application that allows analysts to query real-time, intraday market data. Individual ticks can be stored and indexed in MongoDB as they arrive, giving fine-grained access to single ticks or ranges of ticks per ticker symbol. Apache Spark can then access this data efficiently, either as individual records or as dynamically aggregated results, for more sophisticated processing with machine learning algorithms and other types of analysis.
To learn more, download the white paper. It reviews the analytics capabilities of MongoDB and Apache Spark, explains how to combine them into a real-time analytics engine, and concludes with example use cases.