Time series data, which reflects measurements taken at regular time intervals, plays a critical role in a wide variety of use cases for a diverse range of industries. For example, park management agencies can use time series data to examine attendance at public parks to better understand peak times and schedule services accordingly. Retail companies, such as Walmart, depend on it to analyze consumer spending patterns down to the minute, to better predict demand and improve shift scheduling, hiring, warehousing, and other logistics.
As more sensors and devices are added to networks, time series data and its associated tools have become more important. In this article, we’ll look at three reasons (and two ways) to use MongoDB time series collections in your stack.
This in-depth introduction to time series data features MongoDB Product Manager Michael Gargiulo.
Reason 1: Purpose-built for the challenges of time series data
At first glance, time series collections resemble other collections within MongoDB, with similar functionalities and usage. Beneath the surface, however, they are specifically designed for storing, sorting, and working with time series data.
For developers, query speed and data accessibility continue to be challenges associated with time series data. Because of how quickly time series data can accumulate, it must be organized and sorted in a logical way to ensure that queries and their associated operations can run smoothly and quickly.
To address this issue, time series collections implement a key tenet of the MongoDB developer data platform: data that is accessed together should be stored together. Documents (the basic building block of MongoDB data) are grouped into buckets, which are organized by time. Each bucket holds measurements gathered during the same time period that share the same metadata (for example, the same sensor or location), so the data most likely to show up in the same queries sits side by side.
For example, if you are using time series collections to analyze the rise in summer temperatures in Valencia, Spain, from 1980 to 2020, then one bucket will contain temperatures for August 1991. Related but distinct buckets (such as temperatures for June and July 1991) would also be stored on the same page for faster, easier access.
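To make the bucketing idea concrete, here is a minimal Python sketch that groups measurements by source and calendar month. It is a simplified illustration rather than MongoDB's actual storage logic: the field names (`city`, `ts`, `temp_c`), the sample readings, and the one-month bucket span are all assumptions chosen to match the Valencia example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical measurements; field names and values are illustrative only.
measurements = [
    {"city": "Valencia", "ts": datetime(1991, 7, 30, 14), "temp_c": 31.2},
    {"city": "Valencia", "ts": datetime(1991, 8, 1, 14), "temp_c": 32.5},
    {"city": "Valencia", "ts": datetime(1991, 8, 15, 14), "temp_c": 33.1},
    {"city": "Madrid", "ts": datetime(1991, 8, 15, 14), "temp_c": 35.0},
]

def bucket_key(doc):
    """Assumed bucketing rule: same source and same calendar month."""
    return (doc["city"], doc["ts"].year, doc["ts"].month)

def build_buckets(docs):
    """Group documents so that related measurements are stored together."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[bucket_key(doc)].append(doc)
    return dict(buckets)

buckets = build_buckets(measurements)
august_valencia = buckets[("Valencia", 1991, 8)]
print(len(buckets), [d["temp_c"] for d in august_valencia])
```

A query for Valencia in August 1991 then only needs to touch one bucket instead of every document in the collection.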
MongoDB also lets you create secondary indexes, including compound indexes, on the fields of a time series collection (such as the timeField and metaField) for faster, more flexible queries. Because of the wide variety of indexing options, operations on time series data can be executed much more quickly than with competing products. For example, scan times are reduced by indexing buckets of documents (each of which has a unique identifier) rather than individual documents.
In terms of the previous example, you could create an index on the minimum and maximum average summer temperatures in Valencia, Spain from 1980 to 2020 to more quickly surface necessary data. That way, MongoDB does not have to scan the entire dataset to find min and max values over a period of nearly four decades.
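As a rough sketch, the collection options and a compound index key could be written as plain Python data like this. The `timeseries` options shown (`timeField`, `metaField`, `granularity`) are standard MongoDB options; the collection name `weather`, the field names `ts` and `station`, and the commented PyMongo calls are assumptions for illustration and need a running deployment to execute.

```python
# Options for a time series collection (field names are hypothetical).
timeseries_options = {
    "timeField": "ts",        # required: field holding each measurement's timestamp
    "metaField": "station",   # optional: field identifying the data source
    "granularity": "hours",   # one of "seconds", "minutes", or "hours"
}

# Compound index key: metadata first, then time, both ascending.
index_keys = [("station.city", 1), ("ts", 1)]

def describe_index(keys):
    """Render the key list in the style of MongoDB's default index names."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)

print(describe_index(index_keys))

# With PyMongo and a live database handle `db`, usage would look roughly like:
#   db.create_collection("weather", timeseries=timeseries_options)
#   db["weather"].create_index(index_keys)
```

Putting the metadata field first lets queries that filter on a single city narrow the search before the time range is applied.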
Another concern for developers is finding the last measurement for each metadata value, which in other solutions requires scanning the entire dataset, a time-consuming process. Instead, time series collections support last point queries, in which MongoDB simply retrieves the most recent measurement for each metadata value. As with other fields, users can also create indexes for last points in their data. In our example, you could create an index to identify the end of summer temperatures in Valencia from 1980 to 2020. By indexing the last values, time series collections can drastically reduce query times.
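A last point query can be sketched as a sort followed by a group that keeps the first document per metadata value. The pipeline below uses the standard `$sort` and `$group` stages; the field names (`station.city`, `ts`, `temp_c`) and sample readings are assumptions, and the `last_points` function is a pure-Python stand-in that mimics the result without a database.

```python
from datetime import datetime

# Aggregation pipeline for a last point query (field names are hypothetical).
last_point_pipeline = [
    {"$sort": {"station.city": 1, "ts": -1}},   # newest first within each city
    {"$group": {
        "_id": "$station.city",
        "ts": {"$first": "$ts"},
        "temp_c": {"$first": "$temp_c"},
    }},
]

def last_points(docs):
    """Pure-Python equivalent: keep the newest measurement per city."""
    latest = {}
    for doc in docs:
        city = doc["station"]["city"]
        if city not in latest or doc["ts"] > latest[city]["ts"]:
            latest[city] = doc
    return latest

readings = [
    {"station": {"city": "Valencia"}, "ts": datetime(2020, 8, 30), "temp_c": 30.1},
    {"station": {"city": "Valencia"}, "ts": datetime(2020, 8, 31), "temp_c": 29.4},
    {"station": {"city": "Madrid"}, "ts": datetime(2020, 8, 31), "temp_c": 33.0},
]

result = last_points(readings)
print(result["Valencia"]["temp_c"])
```

An index whose key order matches the sort (metadata ascending, time descending) is what would let the database answer this without scanning every measurement.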
Another recurring challenge for time series applications is data loss from Internet of Things (IoT) applications for industries such as manufacturing, meteorology, and more. As sensors go offline and gaps in your data appear, it becomes much more difficult to run analytics, which require a continuous, uninterrupted flow of data.
As a solution, the MongoDB team created densification and gap filling. Densification, performed by the $densify aggregation stage, creates placeholder documents for any missing timestamps. Users can then sort the data by time and run the $fill stage for gap filling, which estimates and fills in null or missing field values based on existing data. By using these two capabilities in tandem, you get a steady flow of data to feed into aggregation pipelines for insights.
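The two stages can be sketched as follows. The pipeline uses the standard `$densify` and `$fill` syntax; the field names `ts` and `temp_c`, the one-hour step, and the choice of linear interpolation are assumptions. The helper functions are a simplified pure-Python imitation so the effect can be seen without a database; unlike the real stages, they mark gaps with an explicit `None` and interpolate by position, which matches interpolation by time only because the densified timestamps are evenly spaced.

```python
from datetime import datetime, timedelta

# Aggregation pipeline: add missing hourly timestamps, then interpolate values.
gap_fill_pipeline = [
    {"$densify": {"field": "ts",
                  "range": {"step": 1, "unit": "hour", "bounds": "full"}}},
    {"$fill": {"sortBy": {"ts": 1},
               "output": {"temp_c": {"method": "linear"}}}},
]

def densify(docs, step):
    """Insert placeholder documents (temp_c=None) for missing timestamps."""
    docs = sorted(docs, key=lambda d: d["ts"])
    by_ts = {d["ts"]: d for d in docs}
    out, t = [], docs[0]["ts"]
    while t <= docs[-1]["ts"]:
        out.append(by_ts.get(t, {"ts": t, "temp_c": None}))
        t += step
    return out

def fill_linear(docs):
    """Linearly interpolate None values between known neighbours."""
    known = [i for i, d in enumerate(docs) if d["temp_c"] is not None]
    filled = [dict(d) for d in docs]
    for lo, hi in zip(known, known[1:]):
        a, b = docs[lo]["temp_c"], docs[hi]["temp_c"]
        for i in range(lo + 1, hi):
            filled[i]["temp_c"] = a + (b - a) * (i - lo) / (hi - lo)
    return filled

# Hypothetical sensor that went offline between 10:00 and 13:00.
sensor = [
    {"ts": datetime(2022, 6, 1, 10), "temp_c": 20.0},
    {"ts": datetime(2022, 6, 1, 13), "temp_c": 26.0},
]
dense = densify(sensor, timedelta(hours=1))
filled = fill_linear(dense)
print([d["temp_c"] for d in filled])
```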
Reason 2: Keep everything in house, in one data platform
Juggling different data tools and platforms can be exhausting. Cramming a bunch of separate products and technologies into a single infrastructure can create complex architectures and require significant operational overhead. Additionally, a third-party time series solution may not be compatible with your existing workflows and may necessitate more workarounds just to keep things running smoothly.
The MongoDB developer data platform brings together several products and features into a single, intuitive ecosystem, so developers can use MongoDB to address many common needs — from time series data to change streams — while reducing time-consuming maintenance and overhead.
As a result, users can take advantage of the full range of MongoDB features to collect, analyze, and transform time series data. You can query time series collections through the MongoDB Compass GUI or the MongoDB Shell, utilize familiar MongoDB capabilities like nesting data within documents, secondary indexes, and operators like $lookup or $merge, and process time series data through aggregation pipelines to extract insights and inform decision making.
Reason 3: Logical ways to organize and access time series data
Time series collections are designed to be efficient, effective, and easy to use. For example, these collections utilize a columnar storage format that is optimized for time series data. This approach ensures efficiency in all database operations, including queries, input/output, WiredTiger cache usage, and storage footprints for both data and secondary indexes.
Let’s look, for example, at how querying time series collections works. When a query is executed, two things happen behind the scenes: bucket unpacking and query rewrites. To begin with, time series collections automatically unpack buckets, much as the $unwind aggregation stage expands an array into individual documents. MongoDB decompresses the stored data, sorts it, and returns it in the format in which it was inserted, so that it is easier for users to read and parse.
Query rewrites work alongside bucket unpacking to ensure efficiency. To avoid unpacking too many documents (which exacts a toll in time and resource usage), query rewrites use indexes on fields such as timestamps to automatically eliminate buckets that fall outside the desired range. For example, if you are searching for average winter temperatures in Valencia, Spain, from 1980 to 2020, MongoDB can skip every bucket that holds only spring, summer, or fall temperatures.
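The pruning step can be illustrated with a small Python sketch. It assumes each bucket carries a summary of the earliest and latest timestamp it contains, so a time range filter can be checked against the summary before anything is unpacked. The bucket layout and names here (`min_ts`, `max_ts`, `buckets_to_unpack`) are hypothetical and much simpler than MongoDB's internal format.

```python
from datetime import datetime

# Hypothetical bucket summaries: the time range each bucket covers.
buckets = [
    {"id": "1991-08", "min_ts": datetime(1991, 8, 1), "max_ts": datetime(1991, 8, 31)},
    {"id": "1991-12", "min_ts": datetime(1991, 12, 1), "max_ts": datetime(1991, 12, 31)},
    {"id": "1992-01", "min_ts": datetime(1992, 1, 1), "max_ts": datetime(1992, 1, 31)},
    {"id": "1992-04", "min_ts": datetime(1992, 4, 1), "max_ts": datetime(1992, 4, 30)},
]

def buckets_to_unpack(all_buckets, start, end):
    """Keep only buckets whose time range overlaps [start, end)."""
    return [b for b in all_buckets if b["max_ts"] >= start and b["min_ts"] < end]

# Winter 1991-92: December through February.
winter = buckets_to_unpack(buckets, datetime(1991, 12, 1), datetime(1992, 3, 1))
print([b["id"] for b in winter])
```

Only the two winter buckets are unpacked; the August and April buckets are eliminated using their summaries alone.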
Now that we’ve examined several reasons to consider MongoDB time series collections, we’ll look at two specific use cases.
Use case 1: Algorithmic trading
Algorithmic trading is a major use case for time series data, and this market is predicted to grow to $15 billion by 2028. The strength of algorithms lies in their speed and automation; they reduce the possibility of mistakes stemming from human emotions or reaction time and allow for trading frequency beyond what a human can manage.
Trading algorithms also generate vast volumes of time series data, which cannot necessarily be deleted, due to compliance and forecasting needs. MongoDB, however, lets you set archival parameters, to automatically move that data into cheaper cloud object storage after a preset interval of time. This approach preserves your valuable storage space for more recent data.
Using MongoDB products and features such as Atlas, materialized views, time series collections, and triggers, it is also possible to build a basic trading algorithm. Time series data is fed into the algorithm, and when conditions are favorable, the algorithm buys or sells as needed, executing a series of individual trades with cumulative profits and losses (P&L). Although you’ll need a Java app to actually execute the trades, MongoDB can provide a strong foundation on which to build.
The structure of such an algorithm is simple. Time series data is loaded from a live feed into MongoDB Atlas, where it feeds a materialized view that calculates the averages serving as the basis of your trades. You can also add a scheduled trigger that refreshes your materialized views as new data arrives, keeping your algorithm up to date so that you do not miss any buying or selling opportunities.
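The averaging step might look something like the sketch below. It assumes a simple moving average crossover rule (buy when a short-window average rises above a long-window one, sell when it falls below), which is a common textbook strategy and not necessarily the one used in the presentation. The window sizes, prices, and function names are hypothetical, and in the architecture described above this calculation would live in a materialized view rather than in application code.

```python
def moving_average(prices, window):
    """Simple moving average; None until enough data points exist."""
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window:i + 1]) / window)
    return out

def signal(prices, short=2, long=3):
    """Return 'buy', 'sell', or 'hold' from the latest pair of averages."""
    short_avg = moving_average(prices, short)[-1]
    long_avg = moving_average(prices, long)[-1]
    if short_avg is None or long_avg is None or short_avg == long_avg:
        return "hold"
    return "buy" if short_avg > long_avg else "sell"

rising = [10.0, 11.0, 12.0, 13.0]    # hypothetical price feed
falling = [13.0, 12.0, 11.0, 10.0]
print(signal(rising), signal(falling))
```

Each time the trigger refreshes the averages, the trading app would read the latest signal and decide whether to place an order.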
To learn more, watch Wojciech Witoszynski’s MongoDB World 2022 presentation on building a simple trading algorithm using MongoDB Atlas, “Algorithmic Trading Made Easy.”
Use case 2: IoT
Due to the nature of IoT data, such as frequent sensor readings at fixed times throughout a day, IoT applications are ideally suited for time series collections. For example, Confluent, a leading streaming data provider, uses its platform alongside MongoDB Atlas Device Sync, mobile development services, time series collections, and triggers to gather, organize, and analyze IoT data from edge devices.
IoT apps often feature high volumes of data taken over time from a wide range of physical sensors, which makes it easy to fill in meta fields and take advantage of densification and gap filling features as described above.
MongoDB’s developer data platform also addresses many of the challenges associated with IoT use cases. To begin with, MongoDB is highly scalable, which is an important advantage given the huge volumes of data generated by IoT devices. Furthermore, MongoDB includes key features that help you make the most of your IoT data in real time. These include change streams, which identify database events as they occur, and functions, which can be scheduled in advance or configured to execute immediately in response to database changes and other events.
For users dealing with time-based data, real-time or otherwise, MongoDB’s time series collections offer a seamless, highly optimized way to accelerate operations, remove friction, and use tools, such as triggers, to further analyze and extract value from their data. Additionally, users no longer have to manually bucket, query, or otherwise troubleshoot time series data; instead, MongoDB does all that work for them.
Try MongoDB time series collections for free in MongoDB Atlas.