Build aggregation pipelines to continuously query, analyze, and react to streaming data without the delays inherent to batch processing.
Perform continuous schema validation to check that events are properly formed before processing, and to detect message corruption and late-arriving data that has missed a processing window.
Continuously materialize views into Atlas database collections or streaming systems like Apache Kafka, keeping analytical views of the data fresh to support decision-making and action.
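Taken together, these capabilities map onto a single continuous pipeline. The sketch below is illustrative only: the connection names, topic, and schema are hypothetical, and the stage names ($source, $validate, $merge) follow the Atlas Stream Processing documentation.

```typescript
// Minimal sketch of a continuous pipeline: read events from a Kafka topic,
// validate their shape, and materialize the clean events into an Atlas collection.
// "kafkaProd", "atlasCluster", and the schema below are placeholder assumptions.
const orderPipeline = [
  // Continuously pull events from a configured Kafka connection.
  { $source: { connectionName: "kafkaProd", topic: "orders" } },

  // Route events that are not properly formed to a dead letter queue.
  {
    $validate: {
      validator: {
        $jsonSchema: {
          required: ["orderId", "amount", "ts"],
          properties: {
            amount: { bsonType: "double", minimum: 0 },
          },
        },
      },
      validationAction: "dlq",
    },
  },

  // Continuously materialize the validated events into an Atlas collection.
  {
    $merge: {
      into: { connectionName: "atlasCluster", db: "sales", coll: "orders_view" },
    },
  },
];

// In mongosh, a pipeline like this would typically be registered with
// something along the lines of: sp.createStreamProcessor("ordersView", orderPipeline)
```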
Streaming data lives inside event streaming platforms (like Apache Kafka), which are essentially immutable distributed logs. Event data is published to and consumed from these platforms using APIs.
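As a concrete illustration of that publish/consume API, here is a minimal sketch using the KafkaJS client for Node.js; the broker address, topic, and group ID are placeholder values.

```typescript
// Publish an event to a Kafka topic, then consume from the same log.
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "demo-app", brokers: ["localhost:9092"] });

async function run() {
  // Publish a single event to the log.
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: "orders",
    messages: [{ key: "order-1001", value: JSON.stringify({ amount: 42.5 }) }],
  });
  await producer.disconnect();

  // Consume events from the beginning of the log and print each one.
  const consumer = kafka.consumer({ groupId: "demo-group" });
  await consumer.connect();
  await consumer.subscribe({ topics: ["orders"], fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ message }) => {
      console.log(message.key?.toString(), message.value?.toString());
    },
  });
}

run().catch(console.error);
```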
Developers need a stream processor to perform more advanced processing, such as stateful aggregations, window operations, mutations, and the creation of materialized views. These operations are similar to the queries one runs against a database, except that stream processing continuously queries an endless stream of data. This area of streaming is still maturing; however, technologies such as Apache Flink and Spark Streaming are quickly gaining traction.
Stream processing is where Atlas Stream Processing focuses: MongoDB gives developers a better way to process streams for use in their applications by leveraging the aggregation framework.
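For example, a stateful window operation expressed with the aggregation framework might look like the following sketch. The stage names ($source, $tumblingWindow, $emit) are taken from the Atlas Stream Processing documentation, while the connection names, topics, and fields are hypothetical.

```typescript
// Sketch of a stateful window operation: compute per-device averages over
// one-minute tumbling windows and publish the results to another Kafka topic.
const sensorPipeline = [
  { $source: { connectionName: "kafkaProd", topic: "sensor_readings" } },

  // Group readings into fixed, non-overlapping one-minute windows and
  // aggregate within each window, much like $group in a database query,
  // except it runs continuously over the stream.
  {
    $tumblingWindow: {
      interval: { size: 1, unit: "minute" },
      pipeline: [
        {
          $group: {
            _id: "$deviceId",
            avgTemp: { $avg: "$temperature" },
            readings: { $sum: 1 },
          },
        },
      ],
    },
  },

  // Publish the per-window results back to a streaming system.
  { $emit: { connectionName: "kafkaProd", topic: "sensor_averages" } },
];
```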
Stream processing is the continuous processing of data as it arrives. In the context of building event-driven applications, stream processing enables reactive and compelling experiences like real-time notifications, personalization, route planning, and predictive maintenance.
Batch processing, by contrast, does not operate on data as it is produced. Instead, it gathers data over a specified period of time and then processes that static data as needed. A typical example is a retail business collecting sales at the close of business each day for reporting and/or updating inventory levels.
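A minimal sketch of that batch pattern, assuming a hypothetical sales collection queried once at the close of business with the MongoDB Node.js driver:

```typescript
// A job run once per day that aggregates the day's accumulated sales in one pass.
// The connection string, database, and collection names are placeholders.
import { MongoClient } from "mongodb";

async function dailySalesReport() {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  try {
    const sales = client.db("retail").collection("sales");
    const startOfDay = new Date();
    startOfDay.setHours(0, 0, 0, 0);

    // One aggregation over the static data gathered since the start of the day.
    const totals = await sales
      .aggregate([
        { $match: { soldAt: { $gte: startOfDay } } },
        { $group: { _id: "$storeId", revenue: { $sum: "$amount" }, items: { $sum: "$quantity" } } },
      ])
      .toArray();

    console.log(totals); // e.g., feed a report or update inventory levels
  } finally {
    await client.close();
  }
}

dailySalesReport().catch(console.error);
```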