Modelling time series data

I am migrating an existing bucket-style schema (~100 GB of telemetry per year) to a time series collection.

Basic scalar telemetry arrives from edge devices and is saved with the following fields:

  • Timestamp
  • Name (Metric name, Parameter name)
  • Value (int, float, boolean, string)
  • Device (the sensor or edge device)
  • Entity (the business object that the sensor belongs to)

The collection is created as follows:

db.createCollection("telemetry", {
    timeseries: { timeField: "ts", metaField: "meta", granularity: "minutes" }
})

Should we put the telemetry name in the meta object?

For example:

Option 1:

db.getCollection("telemetry").insertOne(
{
	ts: ISODate("2022-01-01T02:28:43"),
	meta: {
		entity: ObjectId("5f4c605b4cf5037c9067ab22"),
		device: ObjectId("5f4c605b4cf5037c9067ab31"),
		name: "temperature"
	},
	value: 25.0
});

or

Option 2:

db.getCollection("telemetry").insertOne(
{
	ts: ISODate("2022-01-01T02:28:43"),
	meta: {
		entity: ObjectId("5f4c605b4cf5037c9067ab22"),
		device: ObjectId("5f4c605b4cf5037c9067ab31")
	},
	temperature: 25.0
});

We currently use the Option 1 approach in our hourly buckets (each bucket is one document).

Thanks
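Since the existing hourly buckets already follow Option 1, migrating mostly means flattening each bucket into one document per sample. A minimal sketch in JavaScript, assuming a hypothetical bucket shape `{ meta: { entity, device, name }, samples: [{ ts, value }] }` (the real bucket fields may differ) and a hypothetical helper name `bucketToMeasurements`:

```javascript
// ASSUMPTION: hypothetical shape of one existing hourly bucket (one metric
// per bucket, Option 1 style). Adjust field names to the real schema:
//   { meta: { entity, device, name }, samples: [{ ts, value }, ...] }

// Flatten one bucket into one time series document per sample, keeping
// the metric name inside meta (Option 1).
function bucketToMeasurements(bucket) {
  const { entity, device, name } = bucket.meta;
  return bucket.samples.map((sample) => ({
    ts: sample.ts,
    meta: { entity, device, name },
    value: sample.value,
  }));
}

// Example bucket; plain strings stand in for ObjectId values.
const bucket = {
  meta: { entity: "entity-A", device: "device-1", name: "temperature" },
  samples: [
    { ts: new Date("2022-01-01T02:28:43Z"), value: 25.0 },
    { ts: new Date("2022-01-01T02:29:43Z"), value: 25.4 },
  ],
};

const docs = bucketToMeasurements(bucket);
console.log(docs.length); // 2
```

The resulting array could then be written with `insertMany` on the time series collection; at ~100 GB, processing the buckets in batches keeps memory use bounded.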

Hi @Jeremy_Carter,

The metaField is mainly used to distinguish the source or classification of the time-based measurements.

So how you structure it really depends on your application's use case and query requirements.

Does your application store other attributes besides temperature (e.g. wind, humidity)?

If so, do you plot or present the values separately or as a group? For example, do you show all the values (wind, temperature, humidity) for a specific hour or day, or does each graph focus on a single metric?

I am asking because the first approach makes more sense if you query on a specific metric type, since it is already classified under the meta fields. If you need all the data for a specific point in time, I would go with approach 2.

Thanks
Pavel
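To make that trade-off concrete: with Option 1, reading every metric for one device at one timestamp means regrouping several documents into the single row that Option 2 stores directly. A minimal client-side sketch in JavaScript (the helper `pivotByDeviceAndTime` and the sample data are hypothetical). On the server, a similar result could be produced with a `$group` stage keyed on `meta.device` and `ts` that pushes `{ k, v }` pairs and converts them with `$arrayToObject`.

```javascript
// Pivot Option 1 (one metric per document) into one row per (device, ts),
// which is the shape Option 2 stores directly.
function pivotByDeviceAndTime(docs) {
  const rows = new Map(); // insertion-ordered: first-seen (device, ts) first
  for (const doc of docs) {
    const key = `${doc.meta.device}|${doc.ts.toISOString()}`;
    if (!rows.has(key)) {
      rows.set(key, { ts: doc.ts, device: doc.meta.device, values: {} });
    }
    rows.get(key).values[doc.meta.name] = doc.value;
  }
  return [...rows.values()];
}

// Hypothetical sample data; strings stand in for ObjectId values.
const ts = new Date("2022-01-01T02:28:43Z");
const narrow = [
  { ts, meta: { device: "device-1", name: "temperature" }, value: 25.0 },
  { ts, meta: { device: "device-1", name: "humidity" }, value: 40 },
  { ts, meta: { device: "device-2", name: "temperature" }, value: 21.5 },
];

const wide = pivotByDeviceAndTime(narrow);
console.log(wide.length); // 2
console.log(wide[0].values); // { temperature: 25, humidity: 40 }
```

The extra grouping step is the cost Option 1 pays for "all values at a point" queries, in exchange for cheap per-metric filtering.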

Firstly,

Thank you very much for your feedback. We have 40 types of sensors, each producing 10-50 values. We also support fully custom, user-created types. Because the metric is not always "temperature" or "humidity" but can be any user-defined name, such as "relay-1-status" or "relay-2-status", it is perhaps better to keep the metric name in the metadata.

The use case is industrial IoT, if that helps with your feedback.

Is there a secondly?

How is the data queried and presented?

Data is queried by:

  • The entity
  • The device
  • A combination of entity and device
  • The name of the telemetry value (e.g. temperature)

and, of course, a time window.
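With Option 1, each of these query shapes reduces to dot-notation predicates on the meta subfields plus a range on `ts`. A minimal sketch, assuming the Option 1 schema and a hypothetical helper `buildTelemetryFilter`; plain strings stand in for the `ObjectId` values a real query would use:

```javascript
// Build a find()/$match filter for the Option 1 schema from whichever
// meta predicates a query supplies. The time window is always required.
function buildTelemetryFilter({ entity, device, name, from, to }) {
  if (!(from instanceof Date) || !(to instanceof Date)) {
    throw new Error("a time window (from, to) is required");
  }
  const filter = { ts: { $gte: from, $lt: to } };
  // Dot notation matches individual meta subfields, so the filter does not
  // depend on the exact set or order of fields inside meta.
  if (entity !== undefined) filter["meta.entity"] = entity;
  if (device !== undefined) filter["meta.device"] = device;
  if (name !== undefined) filter["meta.name"] = name;
  return filter;
}

const filter = buildTelemetryFilter({
  device: "device-1", // in a real query: an ObjectId
  name: "temperature",
  from: new Date("2022-01-01T00:00:00Z"),
  to: new Date("2022-01-02T00:00:00Z"),
});
console.log(Object.keys(filter)); // [ 'ts', 'meta.device', 'meta.name' ]
```

The returned object could be passed to `find()` or used as a `$match` stage. Matching the whole `meta` object instead would require an exact match of every field and their order, which is why the sketch filters on subfields.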

Hi @Jeremy_Carter,

Based on your description, option 1 seems the better choice.

All of your predicates apart from the time window are on meta fields, which can be served efficiently once those fields are indexed.

I would recommend an index on each of the 3 meta fields, each compounded with the time field, plus a compound index on the 2 fields (entity and device) that are queried together.

Thanks
Pavel
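Spelled out, the index recommendation above gives four key patterns. A small sketch in JavaScript that generates them (the helper `recommendedIndexSpecs` is hypothetical). The time field is placed last in each spec because the time window is a range predicate, following MongoDB's general guideline of putting equality fields before range fields.

```javascript
// Build index key specs: one { "meta.<field>": 1, ts: 1 } per meta field
// queried on its own, plus one compound spec per group of meta fields
// queried together. The time field always goes last (range predicate).
function recommendedIndexSpecs(metaFields, combinations, timeField = "ts") {
  const spec = (fields) => {
    const keys = {};
    for (const f of fields) keys[`meta.${f}`] = 1;
    keys[timeField] = 1;
    return keys;
  };
  return [...metaFields.map((f) => spec([f])), ...combinations.map((c) => spec(c))];
}

const specs = recommendedIndexSpecs(
  ["entity", "device", "name"],
  [["entity", "device"]]
);
for (const keys of specs) console.log(JSON.stringify(keys));
// {"meta.entity":1,"ts":1}
// {"meta.device":1,"ts":1}
// {"meta.name":1,"ts":1}
// {"meta.entity":1,"meta.device":1,"ts":1}
```

In mongosh, each spec could then be applied with `db.getCollection("telemetry").createIndex(keys)`; time series collections accept secondary indexes on metaField subfields and the time field (MongoDB 5.0+).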