Exploring Window Operators in Atlas Stream Processing
Rate this tutorial
In our on windowing, we introduced window operators available in Atlas Stream Processing. Window operators are one of the most commonly used operations to effectively process streaming data. provides two window operators: and . In this tutorial, we will explore both of these operators using the sample solar data generator provided within Atlas Stream Processing.
Before we begin creating stream processors, make sure you have a who has “atlasAdmin” access to the Atlas Project. Also, if you do not already have a Stream Processing Instance created with a connection to the sample_stream_solar data generator, please follow the instructions in and then continue on.
First, confirm sample_stream_solar is added as a connection by issuing
Next, let’s define a $source stage to describe where Atlas Stream Processing will read the stream data from.
Then, issue a .process command to view the contents of the stream on the console.
You will see the stream of solar data printed on the console. A sample of this data is as follows:
Refer back to the schema from the sample stream solar data. To create a tumbling window, let’s create a variable and define our tumbling window stage.
We are calculating the maximum value and average over the span of one-minute, non-overlapping intervals. Let’s use the
.processcommand to run the streaming query in the foreground and view our results in the console.
Here is an example output of the statement:
The pipeline that is used within a window function can include blocking stages and non-blocking stages.
Non-blocking stages do not require multiple data points to be meaningful, and they include operators such as
$unwind, to name a few. You can use non-blocking before, after, or within the blocking stages. To illustrate this, let’s create a query that shows the average, maximum, and delta (the difference between the maximum and average). We will use a non-blocking $match to show only the results from device_1, calculate the tumblingWindow showing maximum and average, and then include another non-blocking
Now we can use the .process command to run the stream processor in the foreground and view our results in the console.
The results of this query will be similar to the following:
Notice the time segments and how they align on the minute.
Additionally, notice that the output includes the difference between the calculated values of maximum and average for each window.
A hopping window, sometimes referred to as a sliding window, is a fixed-size window that moves forward in time at overlapping intervals. In Atlas Stream Processing, you use the
$hoppingWindowoperator. In this example, let’s use the operator to see the average.
To help illustrate the start and end time segments, let's create a filter to only return device_1.
Now let’s issue the
.processcommand to view the results in the console.
An example result is as follows:
Notice the time segments.
The time segments are overlapping by 30 seconds as was defined by the hopSize option. Hopping windows are useful to capture short-term patterns in data.
By continuously processing data within time windows, you can generate real-time insights and metrics, which can be crucial for applications like monitoring, fraud detection, and operational analytics. Atlas Stream Processing provides both tumbling and hopping window operators. Together these operators enable you to perform various aggregation operations such as sum, average, min, and max over a specific window of data. In this tutorial, you learned how to use both of these operators with solar sample data.
Streaming Data from MongoDB to BigQuery Using Confluent Connectors
Jul 11, 2023 | 4 min read