I am confused by this example from the official documentation on Granularity for Time Series Data:
> For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, set `granularity` to `"minutes"`. Setting the `granularity` to `"hours"` groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to `"seconds"` leads to multiple buckets per polling interval, many of which might contain only a single document.
But just above that example, the docs say that setting `granularity` to `"seconds"` limits each bucket to a span of up to 1 hour. So shouldn't `"seconds"` be the appropriate setting here?
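To make my confusion concrete, here is a small Node.js sketch of the arithmetic as I understand it. The bucket span limits per `granularity` value (`"seconds"` → 1 hour, `"minutes"` → 24 hours, `"hours"` → 30 days) are my reading of the docs; the 300-second polling interval is from the weather-sensor example.

```javascript
// Assumed maximum bucket span per granularity setting, in seconds,
// as described in the MongoDB time series docs:
const maxSpanSeconds = { seconds: 3600, minutes: 86400, hours: 2592000 };

// One reading per sensor every 5 minutes (the docs' weather example).
const pollIntervalSeconds = 300;

// Upper bound on how many documents a single sensor can contribute
// to one bucket under each granularity setting.
for (const [granularity, span] of Object.entries(maxSpanSeconds)) {
  console.log(granularity, Math.floor(span / pollIntervalSeconds));
}
// seconds 12, minutes 288, hours 8640
```

So even with `"seconds"`, a bucket could hold up to 12 readings per sensor, which does not obviously match the docs' claim that `"seconds"` produces buckets with only a single document.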
I also find the second example for `bucketMaxSpanSeconds` contradictory:

> For the weather station example with 5 minute sensor intervals, you could fine-tune bucketing by setting the custom bucketing parameters to 300 seconds, instead of using a `granularity` of `"minutes"`.
So in the first example we avoid `"seconds"` (whose granularity bucket limit is 1 hour) and choose `"minutes"` (whose granularity bucket limit is 24 hours), but in the second example we set the bucket span to as little as 5 minutes. How do these two recommendations fit together?
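For reference, here is how I understand the two setups the docs are contrasting, as a mongosh sketch. The collection name `weather` and the field names `timestamp` / `sensorId` are my own placeholders, and my understanding (which may be wrong) is that `bucketMaxSpanSeconds` and `bucketRoundingSeconds` must be set together and equal, replacing `granularity`:

```javascript
// Option 1 (first example): preset granularity.
db.createCollection("weather", {
  timeseries: {
    timeField: "timestamp",
    metaField: "sensorId",
    granularity: "minutes"
  }
});

// Option 2 (second example): custom bucketing tuned to the
// 5-minute polling interval; granularity is omitted here.
db.createCollection("weather_custom", {
  timeseries: {
    timeField: "timestamp",
    metaField: "sensorId",
    bucketMaxSpanSeconds: 300,
    bucketRoundingSeconds: 300
  }
});
```

If that reading is right, the second example caps each bucket at a single polling interval, which seems to be exactly what the first example warns against with `"seconds"`.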