Time Series Data Introduction

Time series data is a collection of data points that are registered at regular intervals. Each data point is a key-value pair. The key is a point in time, and the value is the observation at that time. In practice, the value can be a collection of observations instead of a single one. For example, a sensor can report both the temperature and the wind pressure at the current time.

But why is time series data useful? Time series analysis is a technique for extracting meaningful characteristics from the data. Time series forecasting uses different statistical models to predict future values. This allows organizations to understand trends in data over time, thus helping them to make better decisions.

Visualizing Time Series Data

The most straightforward way to analyze time series data is by plotting it and observing the characteristics of the data. Run charts are line charts that show the data points as a function of time. The x-axis is the time, and the y-axis is the observed value. For our examples, we'll be storing the data in a time series collection on a MongoDB Atlas instance and plotting it with MongoDB Charts.

Time Series Data Examples

Time series data can be found in a variety of industries—from financial markets to sports to climate change. Let's see some examples of time series data.

Financial Time Series Data

Time series data is essential to financial markets. The stock market is a system with complex behavior, and forecasting from its data can be a hard task. Various models and algorithms can be used to forecast such systems, including linear models, non-linear models, and neural networks.

Stock prices change over time, and those prices can be recorded. The recorded time series data can be analyzed to find trends and predict prices. The following chart visualizes a time series of the closing price of a stock.

The x-axis is the week of the year, and the y-axis is the average closing price of the stock for that week. The data source is a collection of documents, each document representing the stock price on a given day.
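The weekly averaging described above can be sketched in plain Python. This is only an illustration, not how MongoDB Charts computes the chart: the function name `weekly_average_close`, the ISO-week grouping, and the sample prices are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def weekly_average_close(daily_prices):
    """Map each ISO (year, week) to the mean closing price of that week."""
    by_week = defaultdict(list)
    for day, close in daily_prices:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        by_week[(iso[0], iso[1])].append(close)
    return {
        week: sum(closes) / len(closes)
        for week, closes in sorted(by_week.items())
    }


# Made-up daily closing prices: one (day, close) pair per document.
daily_prices = [
    (date(2022, 1, 3), 100.0),
    (date(2022, 1, 4), 102.0),
    (date(2022, 1, 10), 110.0),
]

print(weekly_average_close(daily_prices))
# {(2022, 1): 101.0, (2022, 2): 110.0}
```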

Financial data is analyzed and forecasted with statistics called indicators. Common technical analysis indicators include the relative strength index (RSI) and the moving average convergence-divergence indicator (MACD). You can calculate and visualize RSI, MACD, and other indicators using window functions. Window functions perform operations on a span of documents in the collection (a “window”).

For example, you can take the first three observations in a time period and calculate their average. Then, you calculate the average of the second, third, and fourth observations. After repeating the process for all observations in the period, you have a list of averages, also known as a moving average. The moving average is another financial indicator, used for “smoothing out” price data.
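The sliding-window procedure above can be written as a short Python function. This is a minimal sketch of a simple moving average; the function name and the sample prices are made up for illustration.

```python
def moving_average(values, window=3):
    """Return the simple moving average of `values` for a fixed-size window."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    # Slide the window one observation at a time: [0..2], [1..3], [2..4], ...
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        averages.append(sum(chunk) / window)
    return averages


closing_prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up prices
print(moving_average(closing_prices))  # [11.0, 12.0, 13.0]
```

Five observations with a window of three yield three averages, because only complete windows are counted in this sketch.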

To learn more about calculating indicators with the help of window functions, check out the blog post series on Currency Analysis with Time Series Collections.

Sports Time Series Data

Sports are another great source for time series data. For example, the number of people attending a sports game can be recorded over time. The recorded time series data can be analyzed to find trends and correlate with other data.

But there's one sport where the scientific and objective analysis of the game is so important that it has its own name. Sabermetrics is the analytical study of the game of baseball. In the critically acclaimed Hollywood film Moneyball, the Oakland Athletics baseball team uses sophisticated sabermetrics models for scouting and analyzing baseball players. The film showcases the importance of collecting time series data and analyzing it for modern sports teams.

Sabermetrics and talent scouting are broad topics that are beyond the scope of this article. Instead, let's take a look at a much simpler example. Coincidentally, the data comes from the Oakland Athletics' rival in the Bay Bridge Series, the San Francisco Giants. The following chart visualizes a time series of attendees per game for the Giants. We can observe that the number of attendees is increasing over time.

Time series data is widely collected and analyzed in the sports world: for marketing, sports betting, team performance, and more.

Climate Change Time Series Data

Climate change is a major global topic. The Earth's climate is changing, and the pace of change has been accelerating, which makes it important to analyze the impact. The risks can be assessed and predicted based on recorded weather data. The recorded data can be analyzed to find trends and to correlate with other data.

The following line chart visualizes the CO2 emission rate of the Earth's atmosphere over the past 60 years.

Univariate vs. Multivariate Time Series Data

Univariate time series data follows a single variable over time. For example, the closing price of a stock is a univariate time series where the price is the variable.

Univariate and Multivariate Time Series

Multivariate time series data is a collection of observed variables that are related to each other. For example, a time series collection of the current weather may be a multivariate time series where the variables are temperature, humidity, wind speed, etc. The variables are not only related to their previous values, but also to one another. This dependency should be considered when forecasting future values.

Time Series Data Key Characteristics

When visualized as a line chart, time series may reveal different characteristics. In the following sections, we'll explore three of the most common characteristics of time series data:

  • Trends.
  • Seasonality.
  • Outliers.

Trends

Trends are the tendency of the data values to increase or decrease over time. Trends can be upward or downward.

Upward and downward trends in time series

Trends can also be local or global. Local trends are the tendency of the data values to increase or decrease in a specific period. Global trends are the tendency of the data values to increase or decrease over the entire time frame of the dataset.

For example, the stock market shows local downward trends during times of recession. However, the historical data shows that the stock market has risen over the long term, which indicates a global upward trend.
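One simple way to quantify a trend is the slope of a least-squares line fitted to the values. The sketch below applies it to a whole series (global trend) and to a sub-range (local trend). The function name and the index values are made up for illustration, and a fitted line is only one of several ways to estimate a trend.

```python
def trend_slope(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Made-up index values: a long-term rise interrupted by a dip.
index_values = [100, 104, 109, 115, 111, 106, 102, 108, 116, 125]

global_slope = trend_slope(index_values)      # whole dataset
local_slope = trend_slope(index_values[3:7])  # only the dip: 115, 111, 106, 102

print(global_slope > 0)  # True: upward global trend
print(local_slope < 0)   # True: downward local trend
```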

Seasonality

Seasonality is the tendency of the data to repeat a pattern over time.

Seasonality in time series

The seasonal cycle is connected to the interval of the data. The observed pattern may be repeated every week, every month, every quarter, etc. For example, consumer spending follows a yearly seasonal cycle: every year, personal spending increases leading up to the winter holidays.
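A repeating pattern can be made visible by averaging all observations that fall at the same position in the cycle. The following is a minimal sketch under the assumption that the cycle length is already known; the function name and the quarterly spending figures are invented for the example.

```python
from collections import defaultdict


def seasonal_profile(values, period):
    """Average the observations that share the same position in the cycle."""
    if period <= 0 or len(values) < period:
        raise ValueError("need a positive period and at least one full cycle")
    buckets = defaultdict(list)
    for i, value in enumerate(values):
        buckets[i % period].append(value)
    return [sum(buckets[pos]) / len(buckets[pos]) for pos in range(period)]


# Made-up quarterly spending for three years (Q1..Q4 repeated).
spending = [80, 85, 90, 130,
            82, 88, 92, 135,
            85, 90, 95, 140]

profile = seasonal_profile(spending, period=4)
peak_quarter = profile.index(max(profile)) + 1
print(peak_quarter)  # 4
```

In this invented dataset the fourth quarter has the highest average, which mirrors the holiday spending pattern described above.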

Outliers

Outliers in time series data are data fluctuations that cannot be explained by trend or seasonality. These fluctuations are inconsistent with the other data points. They can be caused by errors or by events that are not predictable.

There are different types of outliers. For example, an outlier can manifest once—with a few data points in a short period of time being tremendously different from the surrounding data—or it can be a repeating pattern of fluctuations. The former type is known as an additive outlier and the latter, a seasonal additive outlier.

Additive and seasonal additive outliers

Outliers can distort the statistical analysis of time series data and, with it, the conclusions drawn from that analysis. Outlier detection and handling is therefore an important part of the statistical analysis of time series data.
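As an illustration of outlier detection, the sketch below flags values whose modified z-score, based on the median and the median absolute deviation, exceeds a threshold. This is one common rule of thumb rather than a method prescribed by this article. It assumes the series has no strong trend or seasonality (or that these were removed first), and the function name, threshold, and readings are assumptions made for the example.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return the indexes whose modified z-score exceeds `threshold`."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


readings = [20.1, 20.3, 19.9, 20.2, 35.0, 20.0, 20.4, 19.8]  # made-up values
print(find_outliers(readings))  # [4]
```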

Time Series Data Use Cases

We already know that time series data can be used for forecasting. But what about other use cases?

Monitoring

Time series data can be collected and used for real-time monitoring in various industries—for example in application monitoring where log data is collected to track performance, availability and use of resources.

Another example is the manufacturing industry. Machine stats can be monitored with multiple sensors that generate time series data. For example, the temperature of a machine can be recorded over time. In case of a temperature anomaly, the machine can be shut down to prevent damage. This growing sector is known as industrial IoT (IIoT) monitoring.
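The temperature check described above can be sketched as a threshold rule over a stream of readings. Real IIoT monitoring systems are considerably more involved; the function name, the limit, and the sensor values here are hypothetical.

```python
def first_anomaly(readings, max_temperature):
    """Return the first (timestamp, temperature) above the limit, or None."""
    for timestamp, temperature in readings:
        if temperature > max_temperature:
            return timestamp, temperature
    return None


# Made-up sensor stream: (seconds since start, temperature in °C).
stream = [(0, 71.5), (10, 72.0), (20, 73.1), (30, 96.4), (40, 97.0)]

anomaly = first_anomaly(stream, max_temperature=90.0)
if anomaly is not None:
    # In a real system, this is where the shutdown would be triggered.
    print(f"shut down: {anomaly[1]} °C at t={anomaly[0]}s")
```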

Classification

Forecasting is used for predicting future values. Classification, on the other hand, tries to find patterns in the data to determine the class of the time series at hand. To clarify the difference, let's see a few problems that can be solved with time series classification:

  • Classify a heart rate recorded by a smart watch as normal or abnormal.
  • Detect movement in a video surveillance feed and classify it. For example, differentiate between a person walking and a person running.
  • Classify a stock price as rising, falling, or stagnant. Note that this is different from predicting future prices, but the identified class can be used for creating a forecast.
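To make the last example concrete, here is a deliberately simple rule-based sketch that labels a price series by its relative change from the first to the last observation. Practical time series classifiers are usually trained models; the function name, the 1% tolerance, and the prices are assumptions chosen for illustration.

```python
def classify_price_series(prices, tolerance=0.01):
    """Label a series by its relative change from first to last observation."""
    if len(prices) < 2 or prices[0] == 0:
        raise ValueError("need at least two prices and a non-zero first price")
    change = (prices[-1] - prices[0]) / prices[0]
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "stagnant"


print(classify_price_series([100.0, 101.5, 103.0]))  # rising
print(classify_price_series([100.0, 98.0, 95.0]))    # falling
print(classify_price_series([100.0, 100.4, 100.2]))  # stagnant
```

Note that the function assigns a class to data that was already observed; it does not predict the next price.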

How to Store and Query Time Series Data

When modeling data for storage, it's important to answer a few questions:

  • What is the structure of the data?
  • What is the read/write ratio of the data?
  • How is data interconnected? What’s the cardinality of every relationship?
  • How is the data going to be used?

Time series data has the following characteristics:

  • The measurements are small storage-wise and sequential—often ordered in time.
  • In many cases, large volumes of measurements are recorded in a short amount of time.
  • For data analysis and visualization, it's essential to ensure rapid retrieval based on a period.

There are several optimized storage solutions for time series data. MongoDB supports time series data with native time series collections, real-time analytics features, and automatic query optimization. To learn more about these features, check out the dedicated Time Series page.

While there are several storage options for time series data, the MongoDB data platform is an integrated and optimized solution. Natively supported time series collections provide minimal storage costs and efficient queries. MongoDB uses columnar compression to reduce the storage footprint of time series collections, allowing you to store more data for longer at a lower cost. The MongoDB Query API is a powerful tool for analyzing your data. You can run analytical queries with window functions or use temporal operators to retrieve the most recent or the oldest data in your collection.
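As a rough sketch of what this looks like in practice, the Python snippet below builds the options for a time series collection and an aggregation pipeline that uses the `$setWindowFields` stage. It only constructs plain dictionaries, so it runs without a database. The field names `timestamp`, `symbol`, and `close`, the output field `movingAverage`, and the helper function are assumptions made for this example.

```python
# Options for a time series collection (MongoDB 5.0+). The field names
# "timestamp" and "symbol" are assumptions made for this sketch.
timeseries_options = {
    "timeField": "timestamp",  # required: the time of each measurement
    "metaField": "symbol",     # optional: identifies the series
    "granularity": "hours",    # optional: "seconds", "minutes", or "hours"
}


def moving_average_pipeline(field, window_size):
    """Build a $setWindowFields stage averaging `field` over recent documents."""
    return [
        {
            "$setWindowFields": {
                "partitionBy": "$symbol",
                "sortBy": {"timestamp": 1},
                "output": {
                    "movingAverage": {
                        "$avg": f"${field}",
                        # Current document plus the (window_size - 1) before it.
                        "window": {"documents": [-(window_size - 1), 0]},
                    }
                },
            }
        }
    ]


pipeline = moving_average_pipeline("close", 3)
print(pipeline[0]["$setWindowFields"]["output"]["movingAverage"]["window"])
# {'documents': [-2, 0]}
```

With PyMongo and a running deployment, these values could be passed as `db.create_collection("stocks", timeseries=timeseries_options)` and `db.stocks.aggregate(pipeline)`, where `db` and the collection name `stocks` are hypothetical. For the first documents of each partition, the window contains fewer than three documents, so the average is taken over whatever is available.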

Another challenge that arises when working with time series data is archiving old data. In many cases, analyzing recent data to extract information is crucial. Afterward, that data can be archived to a lower-cost storage solution. There are a number of specialized solutions for archiving time series data. For example, MongoDB Atlas Online Archive lets you set up automated archival of aged data while keeping the archived data queryable, which is usually hard to achieve with cold-storage solutions, and it minimizes the storage cost. The ability to combine current data with historical (recent or archived) data leads to a better, more complete analysis.

Conclusion

Time series data has a wide range of applications, from discovering trends and making forecasts to application monitoring and classification. As we saw in this article, it can be found everywhere. Collecting and analyzing data to make better-informed decisions is a key part of the business process. Time series analysis can be a differentiator for your business regardless of the industry.

But the nature of time series data calls for a specialized storage solution. The integrated MongoDB data platform is a great choice for any business that needs to store and analyze time series data. From archiving old data while keeping it queryable in MongoDB Atlas Online Archive to using columnar compression in time series collections, MongoDB has a wide range of features that can help you with your time series data needs. To learn more about them, check out the dedicated MongoDB Time Series Data article.