Questions related to efficient querying of avg in timeslot?

I’m probably going to go a bit wide here, but bear with me. Lets say I have a MongoDB 5 time series which is essentially:

  time, // The timeField
  metadata: {  // The metaField
    key // The "id" for this particular reading
  value // some value, mainly numeric or boolean or perhaps string

I have several questions about being somewhat efficient when the amount of documents are in the billions. Lets pretend I have a couple of keys, numeric_key1, numberic_key2numeric_keyN.

I’m going to assume the following indexes exist: time, metadata.key and time + metadata.key

And I can expect updates to come in every second for thousands of inputs, so grunularity is in seconds.

  1. As long as all queries/aggregations are matching a specific key or set of keys, does splitting into different databases or different collections matter at all? I have a hunch: no. Assuming you drop the key matching, I’m guessing: yes
  • If doing aggregations to get e.g. 1 minute averages quite often, does it ever make sense to store the finished aggregate values to another collection for simplified lookups? Are there any mechanism to achieve this?

  • Being quite new to MongoDB and the aggregation framework, is there any obvious way to improve upon the following aggregation query?

     { $match: { "metadata.key": { $in: [ "numeric_key1, numeric_key2, numeric_keyN" ] },
     "time": { $gt: ISODate("2021-11-01T01:00:00.000+00:00") } } },
     { $project: { "value": 1, "key": "$metadata.key", "y": { $year: "$time" }, "mon": { $month: "$time"  }, "dom": { $dayOfMonth: "$time" }, "h": { $hour: "$time" }, "min": { $minute: "$time" } } },
     { $group: { "_id": { year: "$y", month: "$mon", day: "$dom", hour: "$h", minute: "$min" }, "value": { $avg: "$value" }, "key": { $first: "$key" } } },
     { $sort: { "_id": 1 } }

Thanks for taking the time, and any help given!