I am modelling a time series collection from an existing bucket style approach (~100GB worth of telemetry per year).
Basic scalar telemetry enters from edge devices and is then saved with the following information:
- Timestamp
- Name (Metric name, Parameter name)
- Value (int, float, boolean, string)
- Device (the sensor or edge device)
- Entity (the business object that the sensor belongs to)
db.createCollection("telemetry", {
timeseries: { timeField: "ts", metaField: "meta", granularity: "minutes" }
})
Should we add the name of the telemetry into the meta object?
E.g.
Option 1:
db.getCollection("telemetry").insert(
{
ts: ISODate("2022-01-01T02:28:43"),
meta: {
entity: ObjectId("5f4c605b4cf5037c9067ab22"),
device: ObjectId("5f4c605b4cf5037c9067ab31"),
name: "temperature"
},
value: 25.0
});
or
Option 2:
db.getCollection("telemetry").insert(
{
ts: ISODate("2022-01-01T02:28:43"),
meta: {
entity: ObjectId("5f4c605b4cf5037c9067ab22"),
device: ObjectId("5f4c605b4cf5037c9067ab31")
},
temperature: 25.0
});
We currently use the Option 1 approach in our hourly buckets (each bucket is a document).
Thanks
Hi @Jeremy_Carter ,
The metadata field is mainly used to distinguish the source or classification of the time-based metrics/data.
So how to structure it really depends on the application's use case and query requirements.
In your application, do you store attributes other than temperature (e.g. wind, humidity, etc.)?
If so, do you plot or present the values separately or as a group? For example, do you show all the values (wind, temperature, humidity) for a specific hour or day? Or do you have one graph per metric, or an interest in one specific aspect?
I am asking because the first method makes more sense if you query on a specific metric type, since it is already classified under the meta fields. If you need all the data for a specific point in time, I would go with approach 2.
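To illustrate the trade-off: with Option 1, getting every metric for a given moment needs a grouping step, whereas an Option 2 document already holds them together. Below is a minimal sketch in mongosh-style JavaScript, assuming the Option 1 layout from the question. The helper name `allMetricsPerTimestamp` and its parameters are hypothetical, and the pipeline is one possible way to do this rather than the only one.

```javascript
// Sketch only: builds an aggregation pipeline for the Option 1 layout
// (one document per metric, with the metric name stored in meta.name).
// allMetricsPerTimestamp is a hypothetical helper, not part of MongoDB.
function allMetricsPerTimestamp(device, from, to) {
  return [
    // Restrict to one device and a half-open time window [from, to).
    { $match: { "meta.device": device, ts: { $gte: from, $lt: to } } },
    // Collect every metric recorded at the same timestamp.
    { $group: { _id: "$ts", readings: { $push: { k: "$meta.name", v: "$value" } } } },
    // Turn [{k, v}, ...] into { temperature: 25.0, ... }, similar to an Option 2 document.
    { $project: { _id: 0, ts: "$_id", values: { $arrayToObject: "$readings" } } },
    // Return the grouped readings in chronological order.
    { $sort: { ts: 1 } }
  ];
}

// In mongosh this could be run as:
//   db.telemetry.aggregate(allMetricsPerTimestamp(deviceId, start, end))
```

With Option 2 the same result would be a plain find on the device and time window, at the cost of needing a known field name per metric.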
Thanks
Pavel
Firstly,
Thank you very much for your feedback. We have 40 types of sensors producing 10-50 values. We also support fully custom, user-created types. Because the metric is not always “temperature” or “humidity” but can be any of an unbounded set of custom metrics such as “relay-1-status” or “relay-2-status”, perhaps it is better to keep the metric name in the metadata.
The use case is industrial IoT if that helps with your feedback.
Is there a secondly?
How is the data queried and presented?
Data is queried by any of the following:
- The entity
- The device
- A combination of entity and device
- The name of the telemetry value (eg temperature)
and obviously a time window.
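The access patterns listed above could be sketched as a single filter builder. This is an illustrative example in mongosh-style JavaScript, assuming the Option 1 layout (`meta.entity`, `meta.device`, `meta.name`, `ts`); the function `buildTelemetryFilter` and its option names are hypothetical.

```javascript
// Sketch only: builds a find() filter for the Option 1 layout.
// buildTelemetryFilter and its option names are hypothetical.
function buildTelemetryFilter({ entity, device, name, from, to }) {
  // The time window is always part of the query: half-open [from, to).
  const filter = { ts: { $gte: from, $lt: to } };
  // Add only the predicates the caller supplied.
  if (entity !== undefined) filter["meta.entity"] = entity;
  if (device !== undefined) filter["meta.device"] = device;
  if (name !== undefined) filter["meta.name"] = name;
  return filter;
}

// Example: temperature readings for one device over one hour.
const exampleFilter = buildTelemetryFilter({
  device: "5f4c605b4cf5037c9067ab31", // would be an ObjectId in the real collection
  name: "temperature",
  from: new Date("2022-01-01T02:00:00Z"),
  to: new Date("2022-01-01T03:00:00Z")
});

// In mongosh this could be run as: db.telemetry.find(exampleFilter)
```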
Hi @Jeremy_Carter ,
Based on that description, Option 1 seems to be the better choice.
All predicates are on fields inside meta, so queries will be efficient once those fields are indexed.
I would recommend an index on each of the 3 fields, each including the time field, plus a compound index for the combination of the 2 fields that are queried together.
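One possible reading of that indexing advice, as a mongosh-style JavaScript sketch. It assumes the Option 1 field names from the question and that the 2 combined fields are entity and device; the names `telemetryIndexes` and `createTelemetryIndexes` are hypothetical, and the key order within each index is an assumption to be checked against real query plans.

```javascript
// Sketch only: candidate secondary indexes for the Option 1 layout.
// Each index pairs a meta field with the time field (ts).
const telemetryIndexes = [
  { "meta.entity": 1, ts: 1 },                  // queries by entity + time window
  { "meta.device": 1, ts: 1 },                  // queries by device + time window
  { "meta.name": 1, ts: 1 },                    // queries by metric name + time window
  { "meta.entity": 1, "meta.device": 1, ts: 1 } // queries by entity and device together
];

// Hypothetical helper: applies the index specs to the telemetry collection.
// `db` is the database handle that mongosh provides.
function createTelemetryIndexes(db) {
  const collection = db.getCollection("telemetry");
  return telemetryIndexes.map((keys) => collection.createIndex(keys));
}

// In mongosh this could be run as: createTelemetryIndexes(db)
```

Running typical queries with `explain()` afterwards would show whether each index is actually used.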
Thanks
Pavel