Docs Menu

Docs HomeMongoDB Manual

Best Practices for Time Series Collections

On this page

  • Optimize Inserts
  • Batch Document Writes
  • Use Consistent Field Order in Documents
  • Increase the Number of Clients
  • Optimize Compression
  • Omit Fields Containing Empty Objects and Arrays from Documents
  • Round Numeric Data to Few Decimal Places
  • Optimize Query Performance
  • Set Appropriate Bucket Granularity
  • Create Secondary Indexes
  • Query metaFields on Sub-Fields

This page describes best practices to improve performance and data usage for time series collections.

To optimize insert performance for time series collections, perform the following actions.

When inserting multiple documents:

For example, if you have two sensors, sensor A and sensor B, a batch containing multiple measurements from a single sensor incurs the cost of one insert, rather than one insert per measurement.

The following operation inserts six documents, but only incurs the cost of two inserts (one per batch), because the documents are ordered by sensor. The ordered parameter is set to false to improve performance:

db.temperatures.insertMany( [
{
"metadata": {
"sensor": "sensorA"
},
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"temperature": 10
},
{
"metadata": {
"sensor": "sensorA"
},
"timestamp": ISODate("2021-05-19T00:00:00.000Z"),
"temperature": 12
},
{
"metadata": {
"sensor": "sensorA"
},
"timestamp": ISODate("2021-05-20T00:00:00.000Z"),
"temperature": 13
},
{
"metadata": {
"sensor": "sensorB"
},
"timestamp": ISODate("2021-05-18T00:00:00.000Z"),
"temperature": 20
},
{
"metadata": {
"sensor": "sensorB"
},
"timestamp": ISODate("2021-05-19T00:00:00.000Z"),
"temperature": 25
},
{
"metadata": {
"sensor": "sensorB"
},
"timestamp": ISODate("2021-05-20T00:00:00.000Z"),
"temperature": 26
}
], {
"ordered": false
})

Using a consistent field order in your documents improves insert performance.

For example, inserting these documents achieves optimal insert performance:

{
"_id": ObjectId("6250a0ef02a1877734a9df57"),
"timestamp": ISODate("2020-01-23T00:00:00.441Z"),
"name": "sensor1",
"range": 1
},
{
"_id": ObjectId("6560a0ef02a1877734a9df66"),
"timestamp": ISODate("2020-01-23T01:00:00.441Z"),
"name": "sensor1",
"range": 5
}

In contrast, these documents do not achieve optimal insert performance, because their field orders differ:

{
"range": 1,
"_id": ObjectId("6250a0ef02a1877734a9df57"),
"name": "sensor1",
"timestamp": ISODate("2020-01-23T00:00:00.441Z")
},
{
"_id": ObjectId("6560a0ef02a1877734a9df66"),
"name": "sensor1",
"timestamp": ISODate("2020-01-23T01:00:00.441Z"),
"range": 5
}

Increasing the number of clients writing data to your collections can improve performance.

Important

Disable Retryable Writes

To write data with multiple clients, you must disable retryable writes. Retryable writes for time series collections do not combine writes from multiple clients.

To learn more about retryable writes and how to disable them, see retryable writes.

To optimize data compression for time series collections, perform the following actions.

To optimize compression, if your data contains empty objects or arrays, omit the empty fields from your documents.

For example, consider the following documents:

{
"timestamp": ISODate("2020-01-23T00:00:00.441Z"),
"coordinates": [1.0, 2.0]
},
{
"timestamp": ISODate("2020-01-23T00:00:10.441Z"),
"coordinates": []
},
{
"timestamp": ISODate("2020-01-23T00:00:20.441Z"),
"coordinates": [3.0, 5.0]
}

The alternation between coordinates fields with populated values and an empty array result in a schema change for the compressor. The schema change causes the second and third documents in the sequence remain uncompressed.

In contrast, the following documents where the empty array is omitted receive the benefit of optimal compression:

{
"timestamp": ISODate("2020-01-23T00:00:00.441Z"),
"coordinates": [1.0, 2.0]
},
{
"timestamp": ISODate("2020-01-23T00:00:10.441Z")
},
{
"timestamp": ISODate("2020-01-23T00:00:20.441Z"),
"coordinates": [3.0, 5.0]
}

Round numeric data to the precision required for your application. Rounding numeric data to fewer decimal places improves the compression ratio.

When you create a time series collection, MongoDB groups incoming time series data into buckets. By accurately setting granularity, you control how frequently data is bucketed based on the ingestion rate of your data.

Starting in MongoDB 6.3, you can use the custom bucketing parameters bucketMaxSpanSeconds and bucketRoundingSeconds to specify bucket boundaries and more precisely control how time series data is bucketed.

You can improve performance by setting the granularity or custom bucketing parameters to the best match for the time span between incoming measurements from the same data source. For example, if you are recording weather data from thousands of sensors but only record data from each sensor once per 5 minutes, you can either set granularity to "minutes" or set the custom bucketing parameters to 300 (seconds).

In this case, setting the granularity to hours groups up to a month's worth of data ingest events into a single bucket, resulting in longer traversal times and slower queries. Setting it to seconds leads to multiple buckets per polling interval, many of which might contain only a single document.

The following table shows the maximum time interval included in one bucket of data when using a given granularity value:

granularity
granularity bucket limit
seconds
1 hour
minutes
24 hours
hours
30 days

Tip

To improve query performance, create one or more secondary indexes on your timeField and metaField to support common query patterns. In versions 6.3 and higher, MongoDB creates a secondary index on the timeField and metaField automatically.

MongoDB reorders the metaFields of time-series collections, which may cause servers to store data in a different field order than applications. If metaFields are objects, queries on entire metaFields may produce inconsistent results because metaField order may vary between servers and applications. To optimize queries on time-series metaFields, query timeseries metaFields on scalar sub-fields rather than entire metaFields.

The following example creates a time series collection:

db.weather.insertMany( [
{
"metaField": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate( "2021-05-18T00:00:00.000Z" ),
"temp": 12
},
{
"metaField": { "sensorId": 5578, "type": "temperature" },
"timestamp": ISODate( "2021-05-18T04:00:00.000Z" ),
"temp": 11
}
] )

The following query on the sensorId and type scalar sub-fields returns the first document that matches the query criteria:

db.weather.findOne( {
"metaField.sensorId": 5578,
"metaField.type": "temperature"
} )

Example output:

{
_id: ObjectId("6572371964eb5ad43054d572"),
metaField: { sensorId: 5578, type: 'temperature' },
timestamp: ISODate( "2021-05-18T00:00:00.000Z" ),
temp: 12
}
←  Shard a Time Series CollectionTime Series Collection Reference →