Time series data is a collection of data points that are indexed and queried by time.
Relational and non-relational databases have timestamp data types to store time-related data. Time series databases are specifically designed for time series data management. In this article, we discuss the importance of a time series database and how it works.
If you think about it, almost all the data that we store has a timestamp attached to it. For example, log files, customer login times, sensor data from IoT devices, traffic data, weather data, and browser history all have timestamps attached.
CUSTOMER_TRANSACTIONS
2022-04-14 11:25:25 Login attempt
2022-04-14 11:25:26 Login success
2022-04-14 11:26:03 Browse category accessories
2022-04-14 11:27:04 Added 2 items in cart
2022-04-14 11:28:02 Browse category electronics
Time series data can be measured in seconds and minutes (like sensor-based devices), hourly (like phone usage), daily (petrol price), weekly (timesheets), monthly (electricity consumption), quarterly (performance reports), half-yearly (company growth), or annually (profits and revenue). Time series data can be at regular intervals or event-driven (irregular):
|Date||Diesel price (in US$)|
|Date||Diesel price (in US$)|
In event-driven time series data, a new row is inserted only if there is a change in the price of diesel (event). In regular time series data, the price is checked at regular intervals. When plotted, time series data will always have one time axis.
The above examples are linear time series data, in which each point can be modeled as a linear combination of past or future data points and analyzed using regression, autocorrelation, and other methods.
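To make the difference between event-driven and regular series concrete, here is a minimal Python sketch that expands an event-driven series (one row per price change) into a regular daily series by carrying the last known price forward. The function name `to_regular_daily` is illustrative, and every price except the $5.45 reading on 24-04-2022 is a made-up value.

```python
from datetime import date, timedelta

# Event-driven readings: a row exists only when the price changed.
# Only the 5.45 reading comes from the article; the rest are hypothetical.
events = [
    (date(2022, 4, 24), 5.45),
    (date(2022, 4, 27), 5.50),
    (date(2022, 4, 30), 5.40),
]


def to_regular_daily(events, start, end):
    """Expand (date, price) change events into one reading per day.

    Each day carries forward the most recent price at or before it.
    Days before the first event are skipped because no price is known yet.
    """
    events = sorted(events)
    series = []
    idx = -1  # index of the latest event that applies to `day`
    day = start
    while day <= end:
        while idx + 1 < len(events) and events[idx + 1][0] <= day:
            idx += 1
        if idx >= 0:
            series.append((day, events[idx][1]))
        day += timedelta(days=1)
    return series


daily = to_regular_daily(events, date(2022, 4, 24), date(2022, 5, 1))
for day, price in daily:
    print(day.isoformat(), price)
```

The event-driven form stores three rows where the regular form stores eight, which is why it can be attractive when values change rarely.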
Databases that provide special features to efficiently handle (store, manipulate, and retrieve) time series data are called time series databases. Some popular time series databases are Prometheus, InfluxDB, and TimescaleDB. Databases like MongoDB provide time series collections to handle time series data, so you can get the benefits of both a time series and a non-relational database in one.
As shown in the above example, data in a time series database has a timestamp and at least one metric related to it. For example, the diesel price was $5.45 (metric) on 24-04-2022. We can add more metrics as well—for example, petrol price, stock prices, or the number of cars visiting the state museum.
|Date||Diesel price||Petrol price||Stock price||Number of cars visiting the state museum|
This way, we can store any amount of data that changes with time. Time series data is almost always appended rather than updated or deleted. That means databases can face huge write workloads, and indexes alone may not be enough for optimization. Also, more often than not, you would want statistics or aggregates collected over a time period, such as the average diesel price from 24-04-2022 to 01-05-2022. Time series databases are optimized both for this kind of workload and for specialized time-based functions.
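As a rough illustration of an aggregate over a time period, the following Python sketch computes the average diesel price from 24-04-2022 to 01-05-2022. The helper `average_between` is a hypothetical name, and all prices other than the $5.45 reading on 24-04-2022 are invented for the example. A real time series database would run this kind of calculation inside its query engine.

```python
from datetime import date
from statistics import mean

# Daily diesel prices in US$. Only the 24-04-2022 value comes from the
# article; every other value is hypothetical.
prices = {
    date(2022, 4, 23): 5.42,
    date(2022, 4, 24): 5.45,
    date(2022, 4, 25): 5.45,
    date(2022, 4, 26): 5.45,
    date(2022, 4, 27): 5.50,
    date(2022, 4, 28): 5.50,
    date(2022, 4, 29): 5.50,
    date(2022, 4, 30): 5.40,
    date(2022, 5, 1): 5.40,
    date(2022, 5, 2): 5.38,
}


def average_between(readings, start, end):
    """Average of all readings whose date falls in [start, end], inclusive."""
    window = [value for day, value in readings.items() if start <= day <= end]
    if not window:
        raise ValueError("no readings in the requested period")
    return mean(window)


avg = average_between(prices, date(2022, 4, 24), date(2022, 5, 1))
print(f"Average diesel price: ${avg:.3f}")
```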
Time series databases store data as time-value pairs for easy analysis and querying. Time series databases can efficiently handle concurrent series (that is, multiple metrics in parallel), making them well-suited for banking and financial transactions.
There are three aspects of a time series database: database features, time series features, and data features.
Database features include the basic CRUD (Create, Read, Update, and Delete) operations, as well as high availability, scalability, and reliability. The database should be able to handle a large volume of writes, while reads and updates typically target particular time windows.
Time series features center on the timestamp: time is stored with a precision of seconds or milliseconds, and dates can be stored in various formats using the DateTime data type. Timestamps support calendar and time zone adaptation. Time series databases also provide support for getting aggregations and statistics about the data based on time.
Data features describe how the data behaves: it is appended in time order and stored as time, value, and events. Data can have many dimensions. It often does not require relationships between entries of different tables, and older data is purged or compressed and archived.
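The idea of purging or archiving older data can be sketched as a simple retention rule. This is only an illustration under assumed names (`apply_retention`, `keep_for`) and made-up readings; it does not reflect how any particular database implements retention.

```python
from datetime import datetime, timedelta


def apply_retention(points, now, keep_for):
    """Split (timestamp, value) points into (kept, expired) by age.

    Points older than `now - keep_for` are returned separately so the
    caller can archive, compress, or delete them.
    """
    cutoff = now - keep_for
    kept, expired = [], []
    for ts, value in points:
        (kept if ts >= cutoff else expired).append((ts, value))
    return kept, expired


# Hypothetical readings, expressed relative to a fixed "now".
now = datetime(2022, 5, 1, 12, 0)
points = [
    (now - timedelta(days=40), 5.30),
    (now - timedelta(days=10), 5.45),
    (now - timedelta(days=1), 5.40),
]

kept, expired = apply_retention(points, now, keep_for=timedelta(days=30))
print(len(kept), "kept;", len(expired), "ready to archive or delete")
```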
A time series database (TSD) contains special tools and features to handle huge loads of time-series data. Some major benefits are:
TSD consists of tools and features to store data at a very high speed. It also provides compression algorithms to store older data that can be retrieved when needed.
TSD is indexed on time, making it easy to get data based on a certain period of time. This is particularly useful to analyze IoT data, financial data, weather forecasting, and many other real-life use cases.
As data is sent at regular intervals and writes are fast and consistent, data can be sent to a streaming engine to perform real-time analytics and visualization. TSD also allows for data mining as it can scale, and huge amounts of data can be stored as the requirement grows.
TSD contains many functions like aggregation, grouping, comparison, machine learning, and other similar functions to perform complex analysis on the data. These functions are optimized for performance and help in faster decision-making.
It’s easier to pull out reports and summaries for a period of time, as TSD is already optimized for getting precise reports calculated over a period of time, particularly if some metrics like percentile, max, min, and trends are needed.
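Two of the benefits above, time-based indexing and period summaries, can be sketched together in a few lines of Python. The class `TimeIndexedSeries` and the function `summarize_by_hour` are hypothetical names, and the sensor readings are invented. Because data is appended in time order, the sorted timestamps can serve as the index, and a binary search finds the boundaries of any time window. Real time series databases use far more elaborate storage and indexing.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime
from statistics import mean


class TimeIndexedSeries:
    """Append-only series whose sorted timestamps act as the index."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, ts, value):
        if self._times and ts < self._times[-1]:
            raise ValueError("timestamps must not go backwards")
        self._times.append(ts)
        self._values.append(value)

    def between(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


def summarize_by_hour(points):
    """Group points into hourly buckets and report min, max, and mean."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {
        hour: {"min": min(vals), "max": max(vals), "mean": mean(vals)}
        for hour, vals in sorted(buckets.items())
    }


# Hypothetical temperature readings from one sensor.
series = TimeIndexedSeries()
series.append(datetime(2022, 4, 24, 9, 50), 19.0)
series.append(datetime(2022, 4, 24, 10, 5), 20.0)
series.append(datetime(2022, 4, 24, 10, 35), 22.0)
series.append(datetime(2022, 4, 24, 11, 10), 21.0)
series.append(datetime(2022, 4, 24, 12, 0), 23.0)

window = series.between(datetime(2022, 4, 24, 10, 0), datetime(2022, 4, 24, 11, 59))
report = summarize_by_hour(window)
for hour, stats in report.items():
    print(hour.isoformat(), stats)
```

The range lookup costs two binary searches regardless of how much data sits outside the window, which is the main reason an index on time pays off for period-based queries.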
Initially, TSDs were intended mainly for financial purposes. However, with digitization and the popularity of smart devices, the use cases of time series databases have gone up.
IoT (Internet of Things): Smart home and wearable devices, mobile phones, and inventory management systems keep track of every activity and keep sending data for generating alerts and patterns to track usage and set goals.
Sales forecasting: Based on data for a period of time, sales teams can generate reports and summaries, predict the performance and trends for the next quarter or year, and suggest improvements.
Financial trends: Financial predictions, such as stock market forecasts, are easier to build on a time series database because it stores a lot of contextual data that can be cross-referenced later for analysis.
Data summary and reporting: Using time series features of a TSD, you can get a summary of data for different times in a more optimized way. You can get accurate reports, based on the smallest measurements of time (like milliseconds).
A time series database should satisfy the following requirements:
MongoDB’s data platform provides all of these features and is quite suitable for handling time series data. You can access MongoDB data from anywhere using MongoDB Atlas, MongoDB’s cloud-based application data platform.
Time series data always has time as one of its axes. The other metrics can be anything, like stock price, diesel price, number of users that visited a museum, and so on. Time series data is queried based on time to retrieve data for a given period.
Time series data is time-stamped data arranged as a sequence of data points indexed in time order, for example daily petrol price, average monthly wages of employees, or hourly Facebook logins. Mathematically, a time series can be represented as a variable x = f(t), where f is a function of time t.
The four components of time series are based on different aspects of movement of time series:
Trends: Trends show the increase and decrease over a period of time—for example, population, items in inventory, and number of schools opened.
Seasonal variations: These are regular periodic variations observed during one year—for example, the sale of geysers and air coolers, and the number of weddings in a particular period.
Cyclical fluctuations: These are time series variations that span more than a year. Cyclical variations oscillate through a complete cycle and return to the starting state, as in business cycles and weather cycles.
Irregular variations: These are unforeseen variations in a regular time series—for example, the impact of floods on crop production and the sudden collapse of a warehouse.
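One common way to separate a trend from seasonal variation is a moving average whose window matches the seasonal period. The Python sketch below uses that technique on invented quarterly sales figures; the function name `moving_average` and the data are assumptions made for illustration, not part of any specific database.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical quarterly sales over three years: a steady rise (trend)
# combined with a pattern that repeats every four quarters (seasonal).
sales = [10, 14, 8, 12, 14, 18, 12, 16, 18, 22, 16, 20]

# A window of 4 covers exactly one seasonal period, so the seasonal
# ups and downs cancel out and the underlying trend remains.
trend = moving_average(sales, window=4)
print(trend)
```

Irregular variations would show up as whatever remains after the trend and the repeating seasonal pattern are subtracted from the original series.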