In vetting MongoDB as a Time Series solution I wanted to try a simple data aggregation on financial data. My documents have the following schema:
```js
{
  "time": <BSON Time Field>,
  "metadata": {
    "exchange": "<Exchange Name>",
    "symbol": "<Market Symbol>",
    "timestamp": "<Microsecond Timestamp>",
    "tradeId": "<String TradeID>"
  },
  "side": "<buy or sell>",
  "price": 12345,
  "amount": 1
}
```
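For concreteness, a single trade following this schema would look something like the document below (all values hypothetical):

```js
// One hypothetical trade document matching the schema above.
const sampleTrade = {
  time: new Date("2022-06-01T12:34:56.789Z"), // BSON Date (the timeField)
  metadata: {
    exchange: "ftx",
    symbol: "BTC/USD:USD",
    timestamp: "1654086896789123", // microsecond timestamp, kept as a string
    tradeId: "1234567890"
  },
  side: "buy",
  price: 31250.5,
  amount: 0.01
};
```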
I created my collection in the following manner:
```js
db.createCollection("trades", {
  timeseries: {
    timeField: "time",
    metaField: "metadata"
  }
});
```
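For reference, I left the bucketing granularity at MongoDB's default of `"seconds"`; it could also have been set explicitly when creating the collection, along these lines (mongosh sketch, not something I have benchmarked):

```js
// mongosh: same collection, with the bucketing granularity made explicit.
// "seconds" is the default; "minutes" or "hours" pack more documents per bucket.
db.createCollection("trades", {
  timeseries: {
    timeField: "time",
    metaField: "metadata",
    granularity: "seconds"
  }
});
```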
I tried to follow the approach taken in the article “Currency Analysis with Time Series Collections”, adapting it to my data model, but ran into poor performance on what seems like a trivial OHLC aggregation.
My aggregation:
```js
db.trades.aggregate([
  {
    $match: {
      "metadata.exchange": "ftx",
      "metadata.symbol": "BTC/USD:USD",
    },
  },
  {
    $group: {
      _id: {
        exchange: "$metadata.exchange",
        symbol: "$metadata.symbol",
        time: {
          $dateTrunc: {
            date: "$time",
            unit: "minute",
            binSize: 5,
          },
        },
      },
      high: { $max: "$price" },
      low: { $min: "$price" },
      open: { $first: "$price" },
      close: { $last: "$price" },
    },
  },
  {
    $sort: {
      "_id.time": 1,
    },
  },
]);
```
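To make the intended semantics explicit, the `$dateTrunc` + `$group` stages correspond to this plain-JavaScript sketch over in-memory trades (a hypothetical helper, not part of the actual pipeline; note that `$first`/`$last` depend on input order, just as the comments say):

```js
// Plain-JS sketch of the 5-minute OHLC grouping done by the pipeline above.
// Trades are assumed pre-filtered (one exchange/symbol) and in time order.
function ohlc(trades, binMinutes = 5) {
  const binMs = binMinutes * 60 * 1000;
  const buckets = new Map();
  for (const t of trades) {
    // Equivalent of $dateTrunc { unit: "minute", binSize: 5 }.
    const bucketStart = Math.floor(t.time.getTime() / binMs) * binMs;
    let b = buckets.get(bucketStart);
    if (!b) {
      b = { time: new Date(bucketStart), open: t.price, high: t.price, low: t.price, close: t.price };
      buckets.set(bucketStart, b);
    } else {
      b.high = Math.max(b.high, t.price);
      b.low = Math.min(b.low, t.price);
      b.close = t.price; // $last relies on input order, as does the pipeline
    }
  }
  // Equivalent of the trailing $sort on _id.time.
  return [...buckets.values()].sort((a, b) => a.time - b.time);
}
```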
However, when I run this and inspect the explain output, it seems that no matter which indexes I create I cannot prevent a FETCH stage, and as a result the aggregation is terribly slow. The winning plan from one of my trials looks like this:
```js
{
  stage: 'FETCH',
  filter: { 'meta.symbol': { '$eq': 'BTC/USD:USD' } },
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: {
      'meta.exchange': 1,
      'control.min.price': 1,
      'control.max.price': 1
    },
    indexName: 'metadata.exchange_1_price_1',
    isMultiKey: false,
    multiKeyPaths: {
      'meta.exchange': [],
      'control.min.price': [],
      'control.max.price': []
    },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: {
      'meta.exchange': [ '["ftx", "ftx"]' ],
      'control.min.price': [ '[MinKey, MaxKey]' ],
      'control.max.price': [ '[MinKey, MaxKey]' ]
    }
  }
}
```
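For reference, the plan above was pulled from an explain run along these lines (mongosh sketch, pipeline abbreviated):

```js
// mongosh: inspect the winning plan for the aggregation.
db.trades.explain("executionStats").aggregate([
  { $match: { "metadata.exchange": "ftx", "metadata.symbol": "BTC/USD:USD" } },
  // ...remaining $group / $sort stages as shown above
]);
```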
(NOTE: I’ve created indexes on almost all permutations of the “used” fields of my data model, i.e. metadata.exchange, metadata.symbol, time, and price, and this seems to be the best plan I can get.)
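The indexes listed below were created with commands of this form, for example:

```js
// mongosh: one of the secondary indexes from the list below.
db.trades.createIndex({ "metadata.exchange": 1, "metadata.symbol": 1, price: 1 });
```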
Current Indexes
```js
[
  { v: 2, key: { price: 1, time: 1, 'metadata.exchange': 1 }, name: 'price_1_time_1_metadata.exchange_1' },
  { v: 2, key: { 'metadata.exchange': 1, price: 1 }, name: 'metadata.exchange_1_price_1' },
  { v: 2, key: { 'metadata.exchange': 1, 'metadata.symbol': 1, price: 1 }, name: 'metadata.exchange_1_metadata.symbol_1_price_1' },
  { v: 2, key: { 'metadata.symbol': 1, price: 1 }, name: 'metadata.symbol_1_price_1' },
  { v: 2, key: { 'metadata.exchange': 1, time: 1 }, name: 'metadata.exchange_1_time_1' },
  { v: 2, key: { 'metadata.exchange': 1, time: 1, price: 1 }, name: 'metadata.exchange_1_time_1_price_1' },
  { v: 2, key: { 'metadata.exchange': 1, price: 1, time: 1 }, name: 'metadata.exchange_1_price_1_time_1' },
  { v: 2, key: { 'metadata.symbol': 1, 'metadata.exchange': 1, price: 1 }, name: 'metadata.symbol_1_metadata.exchange_1_price_1' },
  { v: 2, key: { 'metadata.exchange': 1, 'metadata.symbol': 1 }, name: 'metadata.exchange_1_metadata.symbol_1' },
  { v: 2, key: { 'metadata.exchange': 1 }, name: 'metadata.exchange_1' },
  { v: 2, key: { 'metadata.exchange': 1, time: -1 }, name: 'metadata.exchange_1_time_-1' },
  { v: 2, key: { 'metadata.exchange': 1, price: 1, time: -1 }, name: 'metadata.exchange_1_price_1_time_-1' }
]
```

I’m currently doing this with only 486k documents in my collection, but I expect roughly 100 million documents for just a single symbol that I’m looking to track. This runs in Kubernetes on a sharded MongoDB deployment (7 shards with 2 replicas per shard, 2 mongos routers, and 1 config server with 1 backup), with no resource limits set on the shard servers. Is there any way I can get these queries to start coming back in a reasonable amount of time (sub-second for <1 million documents)?