Currency Analysis with Time Series Collections #1 — Generating Candlestick Charts Data
Technical analysis is a methodology used in finance to provide price forecasts for financial assets based on historical market data.
When it comes to analyzing market data, you need the right toolset: the volume of data is large, so storing, accessing, and quickly processing it becomes harder.
Financial asset price data is an example of time-series data. MongoDB 5.0 comes with a few important features to facilitate time-series data processing:
- Time Series Collections: This specialized MongoDB collection makes it incredibly simple to store and process time-series data with automatic bucketing capabilities.
- Window Functions: These perform operations on a specified span of documents in a collection, known as a window, and return results based on the chosen window operator.
This three-part series explains how you can build a currency analysis platform where you can apply well-known financial analysis techniques such as SMA, EMA, MACD, and RSI. While you can read through this article series and grasp the main concepts, you can also get your hands dirty and run the entire demo toolkit yourself. All the code is available in the GitHub repository.
We want to save the latest price of every currency in MongoDB in close to real time. Depending on the currency data provider, updates can arrive at anywhere from millisecond to minute granularity. We insert the data as we receive it from the provider, using the following simple data model:
Each document has only three fields:
- time: the time at which the price update for the symbol is received.
- symbol: the currency symbol, such as "BTC-USD". There can be hundreds of different symbols.
- price: the numeric value of the currency at that time.
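To make the model concrete, here is a minimal Python sketch of one such document. The helper name make_tick and the sample price are hypothetical; only the three field names come from our data model. Keeping time as a real datetime rather than a string matters, because a time series collection requires its timeField to hold a BSON date.

```python
from datetime import datetime, timezone


def make_tick(symbol: str, price: float, received_at: datetime) -> dict:
    """Build one price-update document with the article's three fields.

    make_tick is a hypothetical helper; only the field names come from the article.
    """
    if received_at.tzinfo is None:
        # Treat naive datetimes as UTC so every stored time is unambiguous.
        received_at = received_at.replace(tzinfo=timezone.utc)
    return {
        "time": received_at,    # stored as a BSON date; must be a datetime, not a string
        "symbol": symbol,       # e.g. "BTC-USD"
        "price": float(price),  # numeric value of the currency at that time
    }


# Made-up price, for illustration only.
doc = make_tick("BTC-USD", 29000.01, datetime(2021, 1, 1, 17, 30, 12))
print(doc)
```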
Coinbase, one of the biggest cryptocurrency exchange platforms, provides a WebSocket API for consuming real-time cryptocurrency price updates. We will connect to Coinbase through a WebSocket, retrieve the data in real time, and insert it into MongoDB. To make the insert operations more efficient, we can use bulk inserts.
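The streaming side can be sketched as two small pieces: a parser that turns a WebSocket message into our three-field document, and a buffer that hands documents to a bulk insert in batches. This is a sketch under assumptions. The message shape (a "ticker" type with product_id, a string price, and an ISO 8601 time) follows Coinbase's ticker channel as we understand it, and the names parse_ticker and BulkBuffer are hypothetical. The WebSocket connection itself is left out so the code runs offline.

```python
import json
from datetime import datetime
from typing import Callable, List, Optional


def parse_ticker(raw: str) -> Optional[dict]:
    """Turn one WebSocket message into a {time, symbol, price} document.

    Assumes a Coinbase-style "ticker" message; other message types
    (subscription confirmations, heartbeats, ...) are skipped.
    """
    msg = json.loads(raw)
    if msg.get("type") != "ticker":
        return None
    return {
        # "Z" is rewritten so datetime.fromisoformat accepts it on older Pythons.
        "time": datetime.fromisoformat(msg["time"].replace("Z", "+00:00")),
        "symbol": msg["product_id"],
        "price": float(msg["price"]),  # the price arrives as a string
    }


class BulkBuffer:
    """Collect documents and write them in batches instead of one insert per tick."""

    def __init__(self, flush_fn: Callable[[List[dict]], object], batch_size: int = 100):
        self.flush_fn = flush_fn  # e.g. lambda docs: collection.insert_many(docs, ordered=False)
        self.batch_size = batch_size
        self.pending: List[dict] = []

    def add(self, doc: dict) -> None:
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            batch, self.pending = self.pending, []
            self.flush_fn(batch)


# Made-up messages standing in for the live feed.
messages = [
    '{"type": "subscriptions", "channels": []}',
    '{"type": "ticker", "product_id": "BTC-USD", "price": "29000.01", "time": "2021-01-01T17:30:01.000000Z"}',
    '{"type": "ticker", "product_id": "ETH-USD", "price": "730.50", "time": "2021-01-01T17:30:01.500000Z"}',
    '{"type": "ticker", "product_id": "BTC-USD", "price": "29001.10", "time": "2021-01-01T17:30:02.000000Z"}',
]

written = []  # each element is one batch that would go to insert_many
buffer = BulkBuffer(written.append, batch_size=2)
for raw in messages:
    doc = parse_ticker(raw)
    if doc is not None:
        buffer.add(doc)
buffer.flush()  # write whatever is left over
print([len(batch) for batch in written])
```

In a real feed you would also flush on a timer, so that a quiet symbol's last few ticks don't wait indefinitely for the batch to fill. With PyMongo, insert_many(batch, ordered=False) lets the rest of a batch succeed even if one document fails.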
Even though our data source in this post is a cryptocurrency exchange, this article and the demo toolkit are applicable to any exchange platform that has time, symbol, and price information.
The MongoDB document model provides a lot of flexibility in how you model data. That flexibility is incredibly powerful, but the power needs to be harnessed according to your application’s data access patterns: schema design in MongoDB has a tremendous impact on the performance of your application.
The bucket pattern is a MongoDB schema design pattern that groups raw data from multiple documents into one document, rather than keeping a separate document for each and every raw data point. This brings performance benefits in the form of smaller indexes and faster reads and writes. Additionally, grouping the data into buckets makes it easier to organize specific groups of data, which in turn makes it easier to discover historical trends or produce forecasts.
However, before MongoDB 5.0, taking advantage of bucketing required application code to be aware of the buckets and engineers to make conscious upfront schema decisions, which added overhead to developing efficient time series solutions in MongoDB.
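For contrast, here is roughly what hand-rolled bucketing looks like without Time Series collections. The application has to build an upsert that appends each tick to the current bucket for its symbol and hour, and it has to open a new bucket once the current one is full. This Python sketch is illustrative only. The bucket layout (hour, count, ticks, low, high), the cap of 200 ticks, and the name bucket_upsert are assumptions, not a MongoDB-defined format.

```python
from datetime import datetime, timezone

MAX_TICKS_PER_BUCKET = 200  # hypothetical cap; real designs tune this


def bucket_upsert(doc: dict) -> tuple:
    """Return (filter, update) for a hand-rolled bucket pattern upsert.

    One bucket document holds up to MAX_TICKS_PER_BUCKET ticks of one symbol
    within one hour. The application, not the database, must know this layout.
    """
    hour = doc["time"].replace(minute=0, second=0, microsecond=0)
    bucket_filter = {
        "symbol": doc["symbol"],
        "hour": hour,
        # Range condition: a full bucket stops matching, so the upsert creates a new one.
        "count": {"$lt": MAX_TICKS_PER_BUCKET},
    }
    update = {
        "$push": {"ticks": {"time": doc["time"], "price": doc["price"]}},
        "$inc": {"count": 1},
        "$min": {"low": doc["price"]},   # running low for the bucket
        "$max": {"high": doc["price"]},  # running high for the bucket
    }
    return bucket_filter, update


tick = {"symbol": "BTC-USD", "time": datetime(2021, 1, 1, 17, 42, 7, tzinfo=timezone.utc), "price": 29000.0}
bucket_filter, update = bucket_upsert(tick)
# With PyMongo: collection.update_one(bucket_filter, update, upsert=True)
print(bucket_filter)
```

Because the count condition is a range rather than an equality, it isn't copied into a newly upserted bucket; only symbol and hour are, and $inc then sets count to 1. This is exactly the kind of bookkeeping that Time Series collections take off the application's hands.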
Time Series collections are a new collection type introduced in MongoDB 5.0. They automatically optimize the storage of time series data and make it easier, faster, and less expensive to work with time series data in MongoDB. There is a great blog post that covers MongoDB’s newly introduced Time Series collections in more detail, which you may want to read first or for additional information.
For our use case, we will create a Time Series collection as follows:
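Here is a minimal sketch of that step with PyMongo, assuming MongoDB 5.0 or later. The collection name ticker is hypothetical, and granularity "seconds" is an optional hint we add because prices arrive roughly every second. timeField and metaField are the two settings the rest of this section relies on.

```python
# Options for the time series collection described in this section.
TICKER_COLLECTION = "ticker"  # hypothetical collection name

TIMESERIES_OPTIONS = {
    "timeField": "time",       # field holding the BSON date of each price update
    "metaField": "symbol",     # series identifier; one symbol's data is bucketed together
    "granularity": "seconds",  # optional hint (our assumption) matching per-second updates
}


def create_ticker_collection(db):
    """Create the time series collection; `db` is a PyMongo Database (MongoDB 5.0+)."""
    return db.create_collection(TICKER_COLLECTION, timeseries=TIMESERIES_OPTIONS)


# Usage with a live server (assumed connection string):
# from pymongo import MongoClient
# create_ticker_collection(MongoClient("mongodb://localhost:27017")["currency"])
```

In the mongo shell, the same timeseries document is passed as the options argument of db.createCollection.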
While defining the time series collection, we set its timeField to "time" and its metaField to "symbol". Therefore, a particular symbol’s data for a period will be stored together in the time series collection.
The application code will make a simple insert operation as it does in a regular collection:
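A sketch of that insert in Python follows. The helper insert_tick is hypothetical, and _ListCollection is a throwaway in-memory stand-in so the example runs without a server. With PyMongo you would pass a real Collection, whose insert_one has the same call shape.

```python
from datetime import datetime, timezone
from typing import Optional


class _ListCollection:
    """In-memory stand-in for a PyMongo Collection, only so this sketch runs offline."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


def insert_tick(collection, symbol: str, price: float, when: Optional[datetime] = None) -> dict:
    """Insert one price update; the call is the same as for a regular collection."""
    doc = {
        "time": when or datetime.now(timezone.utc),  # default to "now" in UTC
        "symbol": symbol,
        "price": price,
    }
    collection.insert_one(doc)
    return doc


ticks = _ListCollection()
insert_tick(ticks, "BTC-USD", 29000.0, datetime(2021, 1, 1, 17, 30, 1, tzinfo=timezone.utc))
insert_tick(ticks, "BTC-USD", 29001.5)  # timestamped with the current UTC time
print(len(ticks.docs))
```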
We read the data in the same way we would from any other MongoDB collection:
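For example, a filter for one symbol's most recent ticks is an ordinary query document. The helper name recent_prices_query and the ten-minute window are illustrative choices, not part of the toolkit.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_prices_query(symbol: str, minutes: int = 10, now: Optional[datetime] = None) -> dict:
    """Filter for one symbol's ticks in the last `minutes` minutes."""
    now = now or datetime.now(timezone.utc)
    return {"symbol": symbol, "time": {"$gte": now - timedelta(minutes=minutes)}}


# With PyMongo (assumed `collection` handle to the time series collection):
# for doc in collection.find(recent_prices_query("BTC-USD")).sort("time", 1):
#     print(doc["time"], doc["price"])
print(recent_prices_query("BTC-USD", now=datetime(2021, 1, 1, 17, 40, tzinfo=timezone.utc)))
```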
What differs is that MongoDB performs the underlying storage optimization specific to time series data. For example, "BTC-USD" is a digital currency, and if you insert its price every second, each insert looks and feels like a separate document when you query it. Under the hood, however, MongoDB keeps the same symbol’s data together for faster and more efficient processing. This gives us the advantages of the bucket pattern, in terms of index size savings and read/write performance, without changing the way we work with the data.
We have already inserted hours of data for different currencies. A particular currency’s data is stored together, thanks to the Time Series collection. Now it’s time to start analyzing the currency data.
Now, instead of analyzing second-level data individually, we will group the data into five-minute intervals and then display it on candlestick charts. Candlestick charts in technical analysis represent the movement in prices over a period of time.
As an example, consider the following candlestick. It represents one time interval, e.g., the five minutes between 20210101-17:30:00 and 20210101-17:35:00, and it’s labeled with the start date, 20210101-17:30:00.
It has four metrics: high, low, open, and close. High is the highest price, low is the lowest price, open is the first price, and close is the last price of the currency in that interval.
For our currency dataset, we need to group the data into five-minute intervals such as 2021-01-01T01:00:00, 2021-01-01T01:05:00, and so on, and every interval group needs four metrics: high, low, open, and close price. Examples of interval data are as follows:
However, our Time Series collection currently holds only second-level data for each ticker, because we push data every second. We need to group the data, but how can we do this?
In addition to Time Series collections, MongoDB 5.0 introduced a new aggregation operator, $dateTrunc. This powerful operator can do many things, but at its core it truncates a date down to the start of a given time unit (such as a minute or an hour), optionally counting that unit in multiples through its binSize parameter. In our scenario, we want to group currency data into five-minute intervals, so we can set the $dateTrunc operator parameters accordingly:
To set the high, low, open, and close prices for each group (each candlestick), we can use other MongoDB operators that were already available before MongoDB 5.0:
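To make the semantics of these operators concrete, here is a plain-Python reference that mimics what $dateTrunc (unit "minute", binSize 5) and the $max, $min, $first, and $last accumulators compute for UTC timestamps. It is a sketch for sanity-checking aggregation output, not how MongoDB executes the query. The names truncate_to_bin and candles and the sample ticks are hypothetical.

```python
from datetime import datetime, timezone

BIN_SECONDS = 5 * 60  # five-minute candles


def truncate_to_bin(ts: datetime, bin_seconds: int = BIN_SECONDS) -> datetime:
    """Floor a timezone-aware timestamp to the start of its UTC bin.

    $dateTrunc counts bins from 2000-01-01T00:00:00Z, which lies a whole number of
    five-minute steps after the Unix epoch, so flooring epoch seconds gives the same bins.
    """
    seconds = int(ts.timestamp())
    return datetime.fromtimestamp(seconds - seconds % bin_seconds, tz=timezone.utc)


def candles(ticks):
    """Group (time, price) ticks into OHLC candles, oldest first."""
    out = {}
    for ts, price in sorted(ticks, key=lambda tick: tick[0]):  # time order, as $first/$last need
        start = truncate_to_bin(ts)  # plays the role of the $dateTrunc group key
        candle = out.get(start)
        if candle is None:
            # First tick of the interval: it is the open ($first) and the initial high/low/close.
            out[start] = {"time": start, "open": price, "high": price, "low": price, "close": price}
        else:
            candle["high"] = max(candle["high"], price)  # $max
            candle["low"] = min(candle["low"], price)    # $min
            candle["close"] = price                      # $last: the latest price wins
    return list(out.values())  # dicts keep insertion order, so bins stay ascending


def at(minute: int, second: int) -> datetime:
    """Made-up sample timestamp on 2021-01-01 at 17:mm:ss UTC."""
    return datetime(2021, 1, 1, 17, minute, second, tzinfo=timezone.utc)


sample = [(at(30, 5), 100.0), (at(31, 0), 103.0), (at(34, 59), 99.0), (at(35, 0), 101.0), (at(32, 0), 98.0)]
for c in candles(sample):
    print(c)
```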
After grouping the data, we need to sort it by time to analyze it properly. This way, the most recent data (represented by a candlestick) appears at the right-most end of the chart.
Putting this together, our entire aggregation query will look like this:
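Here is one way to express the whole pipeline from Python, as a sketch assuming PyMongo and the field names above. The function name candlestick_pipeline is hypothetical. Compared with grouping the raw documents directly, this version adds a $sort on time before the $group stage, because $first and $last only return the open and close prices reliably when their input is in time order. The pipeline in the repository may differ.

```python
def candlestick_pipeline(symbol: str, minutes: int = 5) -> list:
    """Aggregation pipeline that turns raw ticks into OHLC candles for one symbol."""
    return [
        # Only the requested currency.
        {"$match": {"symbol": symbol}},
        # Time order, so $first/$last really are the open/close prices.
        {"$sort": {"time": 1}},
        {
            "$group": {
                "_id": {
                    "symbol": "$symbol",
                    # Floor each timestamp to the start of its `minutes`-wide interval.
                    "time": {"$dateTrunc": {"date": "$time", "unit": "minute", "binSize": minutes}},
                },
                "high": {"$max": "$price"},
                "low": {"$min": "$price"},
                "open": {"$first": "$price"},
                "close": {"$last": "$price"},
            }
        },
        # Oldest candle first, so the newest ends up right-most on the chart.
        {"$sort": {"_id.time": 1}},
    ]


# With PyMongo (assumed `collection` handle):
# for candle in collection.aggregate(candlestick_pipeline("BTC-USD")):
#     print(candle["_id"]["time"], candle["open"], candle["high"], candle["low"], candle["close"])
print(len(candlestick_pipeline("BTC-USD")))
```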
After grouping the data into five-minute intervals, we can visualize it in a candlestick chart as follows:
We are using an open source visualization tool to display the five-minute grouped BTC-USD data. Each candlestick in the chart represents a five-minute interval and has four metrics: high, low, open, and close price.
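Charting libraries usually want each candle as a flat row rather than the nested _id produced by $group. Here is a small sketch that assumes a [time, open, high, low, close] row order. That order is common but not universal, so check what your visualization tool expects. The name to_chart_rows and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def to_chart_rows(candles: list) -> list:
    """Flatten aggregation output into [time, open, high, low, close] rows."""
    return [[c["_id"]["time"], c["open"], c["high"], c["low"], c["close"]] for c in candles]


# Shape of one document returned by the candlestick aggregation (values made up).
result = [
    {"_id": {"symbol": "BTC-USD", "time": datetime(2021, 1, 1, 17, 30, tzinfo=timezone.utc)},
     "high": 103.0, "low": 98.0, "open": 100.0, "close": 99.0},
]
print(to_chart_rows(result))
```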
With the introduction of Time Series collections and advanced aggregation operators for date calculations, MongoDB 5.0 makes currency analysis much easier.
After you’ve grouped the data for the selected intervals, you can let MongoDB remove old data by setting the expireAfterSeconds parameter in the collection options. MongoDB then automatically removes data older than the specified number of seconds.
Another option is to archive raw data to cold storage for further analysis. Fortunately, MongoDB Atlas has an automatic archiving capability that offloads old data from a MongoDB Atlas cluster to cold cloud object storage, such as Amazon S3 or Microsoft Azure Blob Storage. To use it, you set archiving rules on the time series collection, and old data is automatically offloaded to cold storage. Online Archive will be available for time series collections very soon.
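Here is a minimal sketch of both ways to set expireAfterSeconds with PyMongo: at creation time, or later with the collMod command on an existing time series collection. The one-week retention, the collection name ticker, and the function names are assumptions for illustration.

```python
RAW_DATA_TTL_SECONDS = 7 * 24 * 60 * 60  # hypothetical retention: keep one week of raw ticks


def create_ticker_collection_with_ttl(db, name: str = "ticker"):
    """Create the time series collection so MongoDB expires raw ticks automatically."""
    return db.create_collection(
        name,
        timeseries={"timeField": "time", "metaField": "symbol"},
        expireAfterSeconds=RAW_DATA_TTL_SECONDS,
    )


def set_ticker_ttl(db, name: str = "ticker"):
    """Change retention on an existing time series collection via the collMod command."""
    return db.command("collMod", name, expireAfterSeconds=RAW_DATA_TTL_SECONDS)


# With a live PyMongo Database handle `db` (assumed):
# set_ticker_ttl(db)
print(RAW_DATA_TTL_SECONDS)
```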
Is the currency data already in Kafka topics? That’s perfectly fine. You can easily transfer data from Kafka topics to MongoDB with the MongoDB Sink Connector for Kafka. Please check out this article for further details on integrating Kafka topics with MongoDB Time Series collections.
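As a sketch, a sink connector registration might look like the following Python dict. You could serialize it to JSON and POST it to the Kafka Connect REST API's /connectors endpoint. The connector name, topic, connection URI, database, and collection are hypothetical. The timeseries.* properties assume MongoDB Kafka Connector 1.6 or later, so check the property names against the documentation for your connector version.

```python
import json

# Hypothetical names throughout: topic "crypto.ticker", database "currency", collection "ticker".
sink_config = {
    "name": "mongodb-ticker-sink",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": "crypto.ticker",
        "connection.uri": "mongodb://localhost:27017",  # assumed local deployment
        "database": "currency",
        "collection": "ticker",
        # Time series collection settings (verify against your connector version's docs).
        "timeseries.timefield": "time",
        "timeseries.metafield": "symbol",
        # Convert string timestamps in the messages into BSON dates.
        "timeseries.timefield.auto.convert": "true",
        # Plain JSON messages without embedded schemas.
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}

# Request body for POST /connectors on the Kafka Connect REST API.
print(json.dumps(sink_config, indent=2))
```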
In the following posts, we’ll discuss how well-known financial technical indicators can be calculated with window functions on time series collections.