Data is our present and our future. Time series data, specifically, can be incredibly revealing. This sort of data mining allows you to gather and compare one or more variables at regular intervals. This means you can perform exploratory analysis and create predictive modeling to take past data pattern recognition and use it to get a glimpse into the future. But what is time series data, exactly? How can it be visualized? What are some common examples, and how can you better store and query time series data? Let's dive in!
Time series data is a collection of data points that are registered at regular intervals. Each data point is a key-value pair. The key is a point in time, and the value is the observation at that time. In practice, the value can be a collection of observations instead of a single chosen data point. For example, a sensor can be reporting both the temperature and the wind pressure at the current time.
But why is time series data useful? Time series analysis is a technique for extracting meaningful characteristics from pooled data. Time series forecasting models use different statistical models to predict a future value. This allows organizations to understand trends in data over time, extract meaningful statistics, and better analyze data, thus helping them to make smarter decisions.
The most straightforward way to analyze time series data is by plotting it and observing the characteristics of the data collected. Run charts are line charts that show the data points as a function of time. The x-axis is the time, and the y-axis is the observed value.
For our examples of these forecasting methods, we'll be storing the data in a time series collection on a MongoDB Atlas instance. We're plotting it with MongoDB Charts to better understand the historical data and set us up for better trend analysis.
Time series data can be found in a variety of industries — from financial markets to sports to climate change. Let's see some examples of time series data and time series analysis as advanced techniques of forecasting methods.
Time series data is essential to financial markets and stock market analysis. The market is a system with complex behavior, and forecasting and collecting data from it can be a hard task. Various models and algorithms can be used to forecast such systems and conduct trend analysis — a linear model, non-linear, neural networks, etc.
Stock prices change over time. The price can be recorded over time. The recorded time series data can be analyzed to find trends and predict future activity. The following chart visualizes time series data of the closing price of a stock.
The x-axis is the week of the year, and the y-axis is the average closing price of the stock for that week. The time series data source is a collection of documents, each document representing the stock price on a given day.
Financial data and stock prices are analyzed and forecasted with statistics called indicators. Common technical time series analysis indicators include the relative strength index (RSI) and the moving average convergence-divergence indicator (MACD). We'll talk more about the moving average in a minute.
You can easily calculate and visualize RSI, MACD, and other indicators using window functions. Window functions perform operations on a span of documents in the collection (a “window”).
For example, you can get the first three observations in the same time period and calculate their average. Then, you calculate the average of the second, third, and fourth observations.
After repeating the process for all observations in the period, you're going to have a list of averages, also known as a moving average model. Moving average is another financial indicator used for “smoothing out” price data (also called the exponential smoothing technique).
To learn more about calculating indicators with the help of window functions, check out the blog post series on Currency Analysis with Time Series Collections.
Sports are another great source of time series data and time series analysis. For example, the number of people attending a sports game can be recorded over time. The recorded time series data can be analyzed to find trends and correlate with other data.
But there's one sport where the scientific and objective time series analysis of the game is so important that it has its own name. Sabermetrics is the analytical time series analysis of the game of baseball.
In the critically-acclaimed Hollywood film Moneyball, the Oakland Athletics baseball team uses sophisticated sabermetrics models for scouting and analyzing baseball players. The film showcases the importance of collecting data and time series analysis for modern sports teams.
Sabermetrics and scouting sports talent are broad topics that are quite beyond the scope of this article. Instead, let's take a look at a much simpler example. Coincidentally, we'll be looking at data points recorded from the Oakland Athletics' rival in the Bay Bridge Series — the San Francisco Giants. The following chart visualizes the time series analysis of attendees per game. We can observe that the number of attendees is increasing over time.
Time series data is widely collected and analyzed in the sports world — for marketing, sports betting, team performance, and more.
Climate change is a big topic in the world. The earth's climate is changing over time and has been tremendously accelerating, and time series analysis can help to examine the impact. The risks can be assessed and predicted based on recorded weather data. The recorded data can be analyzed to find trends, and correlate with other data from various time intervals.
The following line chart visualizes the CO2 emission rate of the Earth's atmosphere over the past 60 years.
Univariate time series data follows a single target variable over time. For example, the closing price of a stock is a univariate time series data where the price is the variable. In this case, time series data analysis looks at this one independent variable over time to identify trends or patterns.
Multivariate time series data is a collection of observed variables (cross sectional data) that are related to each other. For example, the time series collection of the current weather may be multivariate time series data where the pooled historical data variables are temperature, humidity, wind speed, etc.
The cross sectional data variables are not only related to their previous values, but also to one another. This dependency should be considered when forecasting future trends and values.
When visualized as a line chart, time series analysis may reveal different characteristics. In the following sections, we'll explore three of the most common characteristics of time series data:
Trends are the tendency of the data values to increase or decrease over time. Trends can be upward or downward.
Trends can also be local or global. Local trends are the tendency of the data values to increase or decrease in a specific period. Global trends are the tendency of the data values to increase or decrease over the entire time frame of the data set.
For example, the stock market shows downward trends during times of recession. However, the historical values show that the stock market is going up, proving a global upward trend. Historical trends use past performance to help you better predict future events.
Seasonality is the tendency of the raw data to repeat a pattern over time.
The seasonal cycle is connected to the time intervals of the data. The observed pattern may be repeated every week, every month, every quarter, etc. For example, consumer consumption is a yearly seasonal cycle — every year, personal spendings increase leading up to the winter holidays.
Outliers in a time series data set are data fluctuations that cannot be explained by trend or seasonality. These fluctuations are inconsistent with the other data points and associated patterns. They can be caused by errors or events that are not predictable.
When conducting a time series analysis, you might find different types of outliers. For example, an outlier can manifest once — with a few data points in a short and specified period of time being tremendously different from the surrounding data — or it can be a repeating pattern of fluctuations over a given period.
The former type is known as an additive outlier and the latter is a seasonal additive outlier.
Outliers can influence the statistical time series analysis of data and thus, the statistical significance of time series data. Outlier detection and removal is an important part of statistical time series analysis.
We already know that time series data can be used for forecasting. But what about other use cases?
Time series data can be collected and used for real-time monitoring in various industries — for example, in application monitoring where log data is collected to track performance, availability, and use of resources.
Another example of time series data can be seen in the manufacturing industry. Machine stats can be monitored with multiple sensors that generate time series data. For example, the temperature of a machine can be recorded over time. In case of a temperature anomaly, the machine can be shut down to prevent damage. This growing sector of using sensor data is known as industrial IoT (IIoT) monitoring.
Forecasting is used for predicting future values in time series data. Classification, on the other hand, tries to find patterns in the data to determine the class of the time series at hand. To clarify the difference, let's see a few problems that can be solved with time series classification:
Smart watch monitoring that classifies a heart rate as normal or abnormal.
Detect movement in a video surveillance feed and classify it. For example, differentiate between a person walking and a person running.
Classify a stock price as rising, falling, or stagnant. Note that this is different from predicting future prices, but the identified class can be used for creating a forecast for series analysis.
When modeling data for storage and querying time series data, it's important to answer a few questions:
Time series data has the following characteristics:
The measurements and data points are small storage-wise and sequential — often ordered in time.
In many cases, large volumes of measurements are recorded in a short amount of time.
For time series analysis and visualization, it's essential to ensure rapid retrieval of data points based on a period.
There are several optimized storage solutions for time series data. MongoDB supports time series data with native time series collections, real-time series analysis features, and automatic query optimization. To learn more about these features and how they improve your process of analyzing time series data, check out the dedicated Time Series page.
While there are several storage options for time series data, the MongoDB data platform is an integrated and optimized solution. The natively supported time series collections provide users with minimal storage costs and efficient queries. MongoDB uses best-in-class columnar compression algorithms to reduce the storage footprint for time series collections, allowing you to store more data for longer at a lower cost.
The MongoDB Query API is a powerful tool for performing time series analysis. You can run analytical queries with window functions or use temporal operators to retrieve the most recent or the oldest data in your collection. To learn more about time series analysis and these additional features, check out the dedicated Time Series page.
Another challenge that arises when working with time series data is archiving old data. In many cases, analyzing recent data to extract information is crucial. Then, that time series data can be archived in a lower-cost storage solution. Again, there are a number of specialized solutions for archiving time series data.
For example, a key feature of MongoDB Atlas Online Archive is that it allows you to set up automated archival of aged data, while also providing the ability to query the archived data. This is usually hard to achieve with cold storage solutions. MongoDB Atlas Online Archive keeps your time series data queryable while also minimizing the storage cost.
The ability to converge current data with historical (recent or archived past values) data leads to a better, more complete time series analysis of data points.
Time series data has a wide range of applications — from discovering trends and making forecasts to application monitoring and classification. As we saw in this article, time series data can be found everywhere. Collecting these data points and conducting time series analysis to make better-informed decisions is a key part of the business process.
Time series analysis can be a differentiator for your business regardless of the industry. Companies that aren't utilizing it will rapidly fall behind the competition.
By now, you hopefully understand the importance of time series data. Collecting and implementing it is another story entirely.
The nature of time series data requires a specialized storage solution. The integrated MongoDB data platform is a great choice for any business that needs to store and analyze time series data. From archiving old data while keeping it queryable in MongoDB Atlas Online Archive to using state-of-the-art compression algorithms in time series collections, MongoDB has a wide range of features that can help you with your time series analysis needs.
To learn more about elevating your time series analysis, check out the dedicated MongoDB Time Series Data article.