Time series data is a collection of data points that are registered at regular intervals. Each data point is a key-value pair. The key is a point in time, and the value is the observation at that time. In practice, the value can be a collection of observations instead of a single one. For example, a sensor can be reporting both the temperature and the wind pressure at the current time.
But why is time series data useful? Time series analysis is a technique for extracting meaningful characteristics from the data. Time series forecasting uses different statistical models to predict future values. This allows organizations to understand trends in data over time, thus helping them to make better decisions.
The most straightforward way to analyze time series data is by plotting it and observing the characteristics of the data. Run charts are line charts that show the data points as a function of time. The x-axis is the time, and the y-axis is the observed value. For our examples, we'll be storing the data in a time series collection on a MongoDB Atlas instance and plotting it with MongoDB Charts.
Time series data can be found in a variety of industries—from financial markets to sports to climate change. Let's see some examples of time series data.
Time series data is essential to financial markets. The stock market is a system with complex behavior and forecasting data from it can be a hard task. Various models and algorithms can be used to forecast such systems—linear, non-linear, neural networks, etc.
Stock prices change over time, and recording them produces time series data that can be analyzed to find trends and predict prices. The following chart visualizes the closing price of a stock as a time series.
The x-axis is the week of the year, and the y-axis is the average closing price of the stock for that week. The data source is a collection of documents, each document representing the stock price on a given day.
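The weekly aggregation behind a chart like this can be sketched in plain Python. This is only an illustration: the field names `date` and `close` and the sample prices are hypothetical, not taken from the dataset used for the chart.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_average_close(documents):
    """Group daily documents by ISO (year, week) and average the closing price."""
    buckets = defaultdict(list)
    for doc in documents:
        iso = doc["date"].isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(doc["close"])
    return {week: mean(prices) for week, prices in sorted(buckets.items())}


# Hypothetical daily documents (field names and values are made up).
daily_prices = [
    {"date": date(2022, 1, 3), "close": 100.0},
    {"date": date(2022, 1, 4), "close": 102.0},
    {"date": date(2022, 1, 10), "close": 110.0},
    {"date": date(2022, 1, 11), "close": 114.0},
]

weekly = weekly_average_close(daily_prices)
for (year, week), average in weekly.items():
    print(f"{year}-W{week:02d}: {average:.2f}")
```

Each key of the result is one point on the x-axis (a week), and each value is the corresponding point on the y-axis (the average closing price).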
Financial data is analyzed and forecast using statistics called indicators. Common technical analysis indicators include the relative strength index (RSI) and the moving average convergence-divergence indicator (MACD). You can easily calculate and visualize RSI, MACD, and other indicators using window functions. Window functions perform operations on a span of documents in the collection (a “window”).
For example, you can get the first three observations in a time period and calculate their average. Then, you calculate the average of the second, third, and fourth observations. After repeating the process for all observations in the period, you'll have a list of averages, also known as a moving average. The moving average is another financial indicator, used for “smoothing out” price data.
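The sliding-window procedure described above can be sketched as a short Python function. The sample prices are made up for illustration, and the window size of three simply mirrors the example in the text.

```python
def moving_average(values, window=3):
    """Average every run of `window` consecutive observations."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical closing prices for five consecutive days.
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
averages = moving_average(prices)
print(averages)
```

Five observations with a window of three produce three averages: one for observations 1 to 3, one for 2 to 4, and one for 3 to 5. A series shorter than the window produces an empty list.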
To learn more about calculating indicators with the help of window functions, check out the blog post series on Currency Analysis with Time Series Collections.
Sports are another great source for time series data. For example, the number of people attending a sports game can be recorded over time. The recorded time series data can be analyzed to find trends and correlate with other data.
But there's one sport where the scientific and objective analysis of the game is so important that it has its own name. Sabermetrics is the analytical study of the game of baseball. In the critically acclaimed Hollywood film Moneyball, the Oakland Athletics baseball team uses sophisticated sabermetrics models for scouting and analyzing baseball players. The film showcases how important collecting and analyzing time series data is for modern sports teams.
Sabermetrics and scouting sports talent are broad topics that are well beyond the scope of this article. Instead, let's take a look at a much simpler example. Coincidentally, we'll be looking at data from the Oakland Athletics' rival in the Bay Bridge Series, the San Francisco Giants. The following chart visualizes the number of attendees per game for the Giants as a time series. We can observe that the number of attendees is increasing over time.
Time series data is widely collected and analyzed in the sports world for marketing, sports betting, team performance, and more.
Climate change is a major global topic. The Earth's climate is changing, and the pace of change has been accelerating, which makes it important to analyze the impact. Risks can be assessed and predicted based on recorded weather data. The recorded data can also be analyzed to find trends and correlated with other data.
The following line chart visualizes the CO2 emission rate of the Earth's atmosphere over the past 60 years.
Univariate time series data follows a single variable over time. For example, the closing price of a stock is a univariate time series where the price is the variable.
Multivariate time series data is a collection of observed variables that are related to each other. For example, a time series collection of weather observations may hold multivariate time series data where the variables are temperature, humidity, wind speed, and so on. The variables are not only related to their previous values, but also to one another. This dependency should be considered when forecasting future values.
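One simple way to check whether two variables in a multivariate series move together is the Pearson correlation coefficient. The sketch below is an illustration only. The temperature and humidity readings are invented, and correlation captures only linear dependency between variables.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sqrt(sum((x - x_mean) ** 2 for x in xs))
    y_spread = sqrt(sum((y - y_mean) ** 2 for y in ys))
    if x_spread == 0 or y_spread == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (x_spread * y_spread)


# Hypothetical hourly readings from one weather station.
temperature = [10.0, 12.0, 14.0, 16.0, 18.0]
humidity = [80.0, 76.0, 72.0, 68.0, 64.0]

correlation = pearson_correlation(temperature, humidity)
print(f"temperature vs. humidity: {correlation:.2f}")
```

A value close to -1 or 1 suggests that the two variables carry information about each other, which a forecasting model for either one may be able to use.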
When visualized as a line chart, time series may reveal different characteristics. In the following sections, we'll explore three of the most common characteristics of time series data: trends, seasonality, and outliers.
Trends are the tendency of the data values to increase or decrease over time. Trends can be upward or downward.
Trends can also be local or global. Local trends are the tendency of the data values to increase or decrease in a specific period. Global trends are the tendency of the data values to increase or decrease over the entire time frame of the dataset.
For example, the stock market shows local downward trends during times of recession. However, historical data shows that the stock market rises over the long term, which indicates a global upward trend.
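The difference between a local and a global trend can be illustrated by fitting a straight line to the data with ordinary least squares and looking at the sign of its slope. This is one simple way to estimate a trend, not the only one, and the index values below are invented.

```python
def trend_slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical stock index with a dip in the middle of the period.
index_values = [100, 104, 108, 112, 106, 100, 94, 104, 114, 124]

global_slope = trend_slope(index_values)      # whole dataset
local_slope = trend_slope(index_values[3:7])  # only the dip: 112, 106, 100, 94

print(f"global trend: {global_slope:+.2f} per step")
print(f"local trend:  {local_slope:+.2f} per step")
```

The slope over the whole series is positive (a global upward trend), while the slope over the dip alone is negative (a local downward trend).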
Seasonality is the tendency of the data to repeat a pattern over time.
The seasonal cycle is connected to the interval of the data. The observed pattern may be repeated every week, every month, every quarter, etc. For example, consumer spending follows a yearly seasonal cycle: every year, personal spending increases leading up to the winter holidays.
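A common way to look for a seasonal cycle is to compare the series with a shifted copy of itself, which is known as autocorrelation. The sketch below assumes hypothetical quarterly spending figures with a peak in the fourth quarter. It illustrates the idea and is not a complete seasonality test.

```python
def autocorrelation(values, lag):
    """Correlation between the series and itself shifted by `lag` steps."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Hypothetical quarterly spending over three years, peaking every Q4.
spending = [10, 12, 11, 20] * 3

yearly = autocorrelation(spending, lag=4)     # same quarter, one year apart
half_year = autocorrelation(spending, lag=2)  # two quarters apart

print(f"lag 4: {yearly:.2f}, lag 2: {half_year:.2f}")
```

The autocorrelation is much higher at a lag of four quarters than at a lag of two, which is consistent with a pattern that repeats once a year.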
Outliers in time series data are data fluctuations that cannot be explained by trend or seasonality. These fluctuations are inconsistent with the other data points. They can be caused by errors or by events that are not predictable.
There are different types of outliers. For example, an outlier can manifest once—with a few data points in a short period of time being tremendously different from the surrounding data—or it can be a repeating pattern of fluctuations. The former type is known as an additive outlier and the latter, a seasonal additive outlier.
Outliers can influence the statistical analysis of time series data and thus, the statistical significance of the data. Outlier detection and removal is an important part of the statistical analysis of time series data.
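As an illustration of outlier detection, the sketch below flags points using the modified z-score, which is based on the median and the median absolute deviation (MAD) instead of the mean and standard deviation. This is one common technique among many. It assumes that trend and seasonality are absent or have already been removed, and the readings and the threshold of 3.5 are illustrative choices.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > threshold
    ]


# Hypothetical sensor readings with one suspicious spike.
readings = [20.1, 20.3, 19.9, 20.0, 35.0, 20.2, 19.8]
outlier_indices = find_outliers(readings)
print(outlier_indices)
```

Because the median is barely affected by a single extreme value, the spike does not distort the baseline it is compared against, which is the main reason to prefer this approach over a plain z-score on short series.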
We already know that time series data can be used for forecasting. But what about other use cases?
Time series data can be collected and used for real-time monitoring in various industries. In application monitoring, for example, log data is collected to track performance, availability, and use of resources.
Another example is the manufacturing industry. Machine stats can be monitored with multiple sensors that generate time series data. For example, the temperature of a machine can be recorded over time. In case of a temperature anomaly, the machine can be shut down to prevent damage. This growing sector is known as industrial IoT (IIoT) monitoring.
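A threshold check of this kind can be sketched in a few lines of Python. The temperature limit, the readings, and the function name are all hypothetical. A real monitoring system would read from a live stream and would likely use a more robust anomaly rule than a fixed limit.

```python
from datetime import datetime, timedelta


def first_anomaly(readings, max_temperature):
    """Return the timestamp of the first reading above the limit, or None."""
    for timestamp, temperature in readings:
        if temperature > max_temperature:
            return timestamp
    return None


# Hypothetical machine temperatures, one reading per minute.
start = datetime(2023, 1, 1, 12, 0)
temperatures = [70.5, 71.0, 70.8, 95.2, 71.1]
readings = [
    (start + timedelta(minutes=i), value)
    for i, value in enumerate(temperatures)
]

shutdown_at = first_anomaly(readings, max_temperature=90.0)
if shutdown_at is not None:
    print(f"Temperature anomaly at {shutdown_at}: shut the machine down")
```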
Forecasting is used for predicting future values. Classification, on the other hand, tries to find patterns in the data to determine the class of the time series at hand. To clarify the difference, let's see a few problems that can be solved with time series classification:
When modeling data for storage, it's important to answer a few questions:
Time series data has the following characteristics:
There are several optimized storage solutions for time series data. MongoDB supports time series data with native time series collections, real-time analytics features, and automatic query optimization. To learn more about these features, check out the dedicated Time Series page.
While there are several storage options for time series data, the MongoDB data platform is an integrated and optimized solution. The natively supported time series collections provide users with minimal storage costs and efficient queries. MongoDB uses best-in-class columnar compression algorithms to reduce the storage footprint of time series collections, allowing you to store more data for longer at a lower cost. The MongoDB Query API is a powerful tool for analyzing your data. You can run analytical queries with window functions or use temporal operators to retrieve the most recent or the oldest data in your collection.
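As a rough sketch, the options for a time series collection and a window-function query can be written as plain Python dictionaries, ready to be passed to a MongoDB driver. The field names (`timestamp`, `symbol`, `price`, `movingAverage`) and the granularity are assumptions made for illustration. Both time series collections and the `$setWindowFields` stage require MongoDB 5.0 or later.

```python
# Options for creating a time series collection (field names are hypothetical).
time_series_options = {
    "timeseries": {
        "timeField": "timestamp",  # required: the field holding the time
        "metaField": "symbol",     # optional: identifies the data source
        "granularity": "hours",    # optional: "seconds", "minutes", or "hours"
    }
}


def moving_average_pipeline(time_field="timestamp", value_field="price",
                            window_size=3):
    """Build an aggregation pipeline that adds a trailing moving average."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        {
            "$setWindowFields": {
                "sortBy": {time_field: 1},
                "output": {
                    "movingAverage": {
                        "$avg": f"${value_field}",
                        # The current document plus the preceding ones.
                        "window": {"documents": [-(window_size - 1), 0]},
                    }
                },
            }
        }
    ]


pipeline = moving_average_pipeline()
# With a driver such as PyMongo, the pipeline would typically be run with
# collection.aggregate(pipeline) against an existing collection.
```

The window here trails the current document, so the first documents in the sort order are averaged over fewer than three values.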
Another challenge that arises when working with time series data is archiving old data. In many cases, analyzing recent data to extract information is crucial. Then, that data can be archived to a lower-cost storage solution. Again, there are a number of specialized solutions for archiving time series data. For example, MongoDB Atlas Online Archive allows you to set up automated archival of aged data, while also providing the ability to query the archived data. This is usually hard to achieve with cold-storage solutions. MongoDB Atlas Online Archive keeps your data queryable while also minimizing the storage cost. The ability to combine current data with historical (recent or archived) data leads to a better, more complete analysis.
Time series data has a wide range of applications, from discovering trends and making forecasts to application monitoring and classification. As we saw in this article, it can be found everywhere. Collecting and analyzing data to make better-informed decisions is a key part of the business process. Time series analysis can be a differentiator for your business regardless of the industry.
But the nature of time series data calls for a specialized storage solution. The integrated MongoDB data platform is a great choice for any business that needs to store and analyze time series data. From archiving old data while keeping it queryable in MongoDB Atlas Online Archive to using state-of-the-art compression algorithms in time series collections, MongoDB has a wide range of features that can help you with your time series data needs. To learn more about them, check out the dedicated MongoDB Time Series Data article.