I’m probably going to go a bit wide here, but bear with me. Let’s say I have a MongoDB 5 time series collection which is essentially:
{
  time,        // The timeField
  metadata: {  // The metaField
    key        // The "id" for this particular reading
  },
  value        // Some value, mainly numeric or boolean, or perhaps string
}
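For completeness, here is how I imagine such a collection would be set up (the options object below is my sketch; the collection name is the one used in the aggregation further down, and the granularity matches the per-second updates mentioned below):

```javascript
// Sketch of the time series collection options matching the document
// shape above. These option names follow MongoDB 5's createCollection
// "timeseries" settings.
const tsOptions = {
  timeseries: {
    timeField: "time",      // the timeField in the shape above
    metaField: "metadata",  // the metaField holding { key }
    granularity: "seconds"  // updates arrive roughly every second
  }
};

// In mongosh:
// db.createCollection("mytimeseriescollection", tsOptions);
```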
I have several questions about being somewhat efficient when the number of documents is in the billions. Let’s pretend I have a couple of keys: numeric_key1, numeric_key2 … numeric_keyN.
I’m going to assume the following indexes exist: time, metadata.key, and time + metadata.key. I can expect updates to come in every second for thousands of inputs, so granularity is in seconds.
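Spelled out as index key specs, I mean something like the following (the field order in the compound index is my own guess; I’ve put the metaField first since queries filter on key before the time range):

```javascript
// Key specs for the three assumed indexes. The field order in the
// compound index ({ "metadata.key": 1, time: 1 }) is an assumption.
const indexSpecs = [
  { time: 1 },                    // time alone
  { "metadata.key": 1 },          // key alone
  { "metadata.key": 1, time: 1 }  // key + time, for key-scoped time ranges
];

// In mongosh, each would be created with:
// indexSpecs.forEach(spec => db.mytimeseriescollection.createIndex(spec));
```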
- As long as all queries/aggregations match a specific key or set of keys, does splitting into different databases or different collections matter at all? I have a hunch: no. Assuming you drop the key matching, I’m guessing: yes.
- If doing aggregations to get e.g. 1-minute averages quite often, does it ever make sense to store the finished aggregate values in another collection for simplified lookups? Are there any mechanisms to achieve this?
- Being quite new to MongoDB and the aggregation framework, is there any obvious way to improve upon the following aggregation query?
db.mytimeseriescollection.aggregate([
  { $match: {
      "metadata.key": { $in: [ "numeric_key1", "numeric_key2", "numeric_keyN" ] },
      "time": { $gt: ISODate("2021-11-01T01:00:00.000+00:00") }
  } },
  { $project: {
      "value": 1,
      "key": "$metadata.key",
      "y": { $year: "$time" },
      "mon": { $month: "$time" },
      "dom": { $dayOfMonth: "$time" },
      "h": { $hour: "$time" },
      "min": { $minute: "$time" }
  } },
  { $group: {
      "_id": { year: "$y", month: "$mon", day: "$dom", hour: "$h", minute: "$min" },
      "value": { $avg: "$value" },
      "key": { $first: "$key" }
  } },
  { $sort: { "_id": 1 } }
])
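To make the second question above more concrete, here is a rough sketch of the kind of roll-up I imagine, writing 1-minute averages into a separate collection with $merge. The target collection name minute_averages is made up, and using $dateTrunc (new in MongoDB 5.0) instead of my year/month/day projection is just an idea:

```javascript
// Sketch of a roll-up pipeline that pre-aggregates 1-minute averages
// per key and upserts them into a separate (made-up) collection.
const rollupPipeline = [
  // Only roll up readings after some cutoff time
  { $match: { time: { $gt: new Date("2021-11-01T01:00:00Z") } } },
  // Group per key and per minute; $dateTrunc (MongoDB 5.0+) truncates
  // the timestamp down to the start of its minute
  { $group: {
      _id: {
        key: "$metadata.key",
        minute: { $dateTrunc: { date: "$time", unit: "minute" } }
      },
      value: { $avg: "$value" }
  } },
  // Upsert the results into another collection keyed on _id
  { $merge: { into: "minute_averages", whenMatched: "replace" } }
];

// In mongosh:
// db.mytimeseriescollection.aggregate(rollupPipeline);
```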
Thanks for taking the time, and any help given!