Atlas Stream Processing enables you to read, write, and transform streams of data using the same aggregation operations used in Atlas databases. With Atlas Stream Processing, you can:
Build aggregation pipelines to continuously operate on streaming data.
Perform continuous validation to check that messages are properly formed, detect message corruption, and detect late-arriving data.
Transform fields as documents flow through your pipelines and route those documents to distinct databases, Kafka topics, or other external sinks using fields or expressions in each document as keys.
Continuously publish results to Atlas collections or Apache Kafka clusters, ensuring up-to-date views and analysis of data.
Atlas Stream Processing components belong directly to Atlas projects and operate independently of Atlas clusters.
Streaming Data
A stream is a continuous flow of immutable data originating from one or more sources. Examples of data streams include temperature or pressure readings from sensors, records of financial transactions, or change data capture events.
Data streams originate from sources such as Apache Kafka topics or MongoDB change streams. You can then write processed data to sinks including Apache Kafka topics, Atlas collections, external functions, or cloud data stores.
Atlas Stream Processing provides native stream processing capabilities to operate on continuous data without the time and computational constraints of an at-rest database.
Structure of a Stream Processor
Stream processors take the form of a pipeline that can be conceptually divided into three phases.
Sources
Stream processors begin by ingesting documents from sources of streaming data to which Atlas Stream Processing is connected. These can be broker systems like Apache Kafka, or database change streams such as those generated by Atlas read/write operations. These inputs must be valid JSON or EJSON documents. Once the $source stage ingests a document, you can apply MongoDB aggregation stages to that document to transform it as needed.
In addition to ingesting data from a streaming source, Atlas Stream Processing also supports enriching your documents with data from HTTPS requests and $lookup operations to join data from connected Atlas clusters.
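As a sketch of the source phase, the fragment below pairs a $source stage with an ordinary aggregation stage. The connection name, topic, and field names are hypothetical placeholders, not values from this page:

```javascript
// A sketch of a stream processor's source phase: ingest documents from a
// Kafka topic, then filter them with a standard aggregation stage.
// "sensorKafka" and "sensor-readings" are hypothetical placeholder names.
const sourceStage = {
  $source: {
    connectionName: "sensorKafka", // a connection defined for this project
    topic: "sensor-readings"       // the Kafka topic to read from
  }
};

// Ordinary aggregation stages can follow the source to transform each document.
const filterStage = {
  $match: { temperature: { $gte: 0 } }
};

const pipeline = [sourceStage, filterStage];
console.log(JSON.stringify(pipeline));
```

The $source stage must come first; everything after it operates on the ingested documents one at a time.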
Pipelines
A stream processor uses stream-specific aggregation pipeline stages and operators, in addition to the standard MongoDB suite of aggregation stages and operators, to transform ingested data and extract valuable insights. Atlas Stream Processing can write documents it can't process to a dead letter queue.
You can enrich documents by restructuring them, adding or removing fields, looking up information from your collections, and more. Atlas Stream Processing also enables you to collect events using windows and execute arbitrary functions.
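As a sketch of the enrichment described above (the field names, connection name, and collection are hypothetical, not from this page), a pipeline fragment might restructure documents and join reference data from a connected Atlas cluster:

```javascript
// Hypothetical enrichment stages: add a computed field, drop a raw field,
// and join static device metadata from a collection on a connected cluster.
const enrichmentStages = [
  // Convert a Celsius reading to Fahrenheit in a new field.
  { $addFields: { tempF: { $add: [{ $multiply: ["$tempC", 1.8] }, 32] } } },
  // Remove a field that downstream consumers don't need.
  { $project: { rawPayload: 0 } },
  // Look up metadata for the device that produced each reading.
  {
    $lookup: {
      from: {
        connectionName: "metadataCluster", // hypothetical Atlas connection
        db: "reference",
        coll: "devices"
      },
      localField: "deviceId",
      foreignField: "_id",
      as: "deviceInfo"
    }
  }
];
console.log(enrichmentStages.length);
```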
Windows
Windows are pipeline stages that aggregate streaming data within a set time period. This enables you to group the data, take averages, find minima and maxima, and perform various other operations that are otherwise inapplicable to streaming data. Each stream processor can only have one window stage.
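For instance, a tumbling window can group readings into fixed, non-overlapping intervals and aggregate within each one. This is a sketch assuming a $tumblingWindow stage with a 60-second interval; the grouping and field names are hypothetical:

```javascript
// Hypothetical tumbling-window stage: every 60 seconds, emit one document per
// device with the average, minimum, and maximum temperature in that window.
const windowStage = {
  $tumblingWindow: {
    interval: { size: 60, unit: "second" }, // fixed window length
    pipeline: [
      {
        $group: {
          _id: "$deviceId",
          avgTemp: { $avg: "$temperature" },
          minTemp: { $min: "$temperature" },
          maxTemp: { $max: "$temperature" }
        }
      }
    ]
  }
};
console.log(windowStage.$tumblingWindow.interval.size);
```

Bounding the stream this way is what makes accumulator operations like $avg, $min, and $max meaningful on otherwise unbounded data.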
Functions
Atlas Stream Processing supports calls to either custom JavaScript functions or AWS Lambda functions that run against each document that the stream processor passes to them.
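As a hedged sketch of the AWS Lambda path (the $externalFunction stage shape, connection name, ARN, and output field here are assumptions for illustration, not values from this page):

```javascript
// Hypothetical external-function stage: send each document to an AWS Lambda
// function and store the function's response in the "enriched" field.
const functionStage = {
  $externalFunction: {
    connectionName: "awsLambdaConnection", // hypothetical AWS connection
    functionName:
      "arn:aws:lambda:us-east-1:123456789012:function:scoreEvent", // placeholder ARN
    as: "enriched" // field that receives the Lambda response
  }
};
console.log(functionStage.$externalFunction.as);
```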
Sinks
After processing the ingested data, the stream processor persists it by writing it to a sink. Atlas Stream Processing offers the $emit and $merge stages for writing to different sink types. These stages are mutually exclusive, and each stream processor can have only one sink stage. Your pipeline can include logic to write processed documents to different Kafka topics or Atlas collections within the same sink connection.
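A sketch of the two sink options follows; the connection, database, collection, and topic names are placeholders, and a single stream processor would use one of these stages, never both:

```javascript
// Hypothetical $merge sink: persist processed documents to an Atlas collection.
const mergeSink = {
  $merge: {
    into: {
      connectionName: "analyticsCluster", // hypothetical Atlas connection
      db: "iot",
      coll: "windowedAverages"
    }
  }
};

// Hypothetical $emit sink: publish processed documents to a Kafka topic instead.
const emitSink = {
  $emit: {
    connectionName: "sensorKafka", // hypothetical Kafka connection
    topic: "processed-readings"
  }
};
console.log(mergeSink.$merge.into.coll, emitSink.$emit.topic);
```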
Atlas Stream Processing Regions
Atlas Stream Processing supports creating stream processing instances on AWS, Azure, and Google Cloud. For a list of available regions, see the Stream Processing Instances section of each of the following:
Amazon Web Services feature reference.
Microsoft Azure feature reference.
Google Cloud Platform feature reference.
Stream processors can read from and write to clusters hosted on different cloud providers, or in different regions.
Billing
For information on billing, see the Atlas Stream Processing billing page.
Next Steps
To begin working hands-on with Atlas Stream Processing, see Get Started with Atlas Stream Processing.
For more detailed information on core Atlas Stream Processing concepts, see the following: