We’ve been using MongoDB for our time series data since version 3, storing hourly data in daily documents and minute data in hourly documents (upserting data into both collections).
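Roughly, the old upserts look like this (a minimal pymongo sketch; the collection names, key fields, and values are illustrative, not our real schema):

```python
from datetime import datetime, timezone
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient()
db = client["metrics"]
entity_id = ObjectId("64a7580ba5d77200c1389439")

now = datetime(2024, 11, 1, 21, 7, tzinfo=timezone.utc)

# Minute data is upserted into an hourly document, keyed by the hour...
db.hourly.update_one(
    {"entity": entity_id, "hour": now.replace(minute=0, second=0, microsecond=0)},
    {"$set": {f"values.{now.minute}": 503}},
    upsert=True,
)

# ...and hourly data into a daily document, keyed by the day.
db.daily.update_one(
    {"entity": entity_id, "day": now.replace(hour=0, minute=0, second=0, microsecond=0)},
    {"$set": {f"values.{now.hour}": 503}},
    upsert=True,
)
```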
We are now testing MongoDB’s time series collections. Read performance is pretty good, and it’s easy to aggregate the different data we need.
Write performance, however, does not seem to scale as well as with the non-time-series collection. We store 30 days of data; the time series collection is 3 TB, while the non-TS collection is only 1.43 TB.
We are doing bulk writes to the TS collection (1000 entries at a time). The documents look like this:
```json
{
  "time": { "$date": "2024-11-01T21:00:00.000Z" },
  "meta": {
    "entity": { "$oid": "64a7580ba5d77200c1389439" },
    "ttl": 30
  },
  "metric": "rx",
  "_id": { "$oid": "6724b4f8042831401c096d8a" },
  "component": "eth0",
  "plugin": "network",
  "value": 503,
  "value_max": 3057
}
```
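The inserts are roughly like this (simplified pymongo sketch; the collection name and the document-building helper are illustrative):

```python
from datetime import datetime, timezone
from bson import ObjectId
from pymongo import MongoClient, InsertOne

client = MongoClient()
coll = client["metrics"]["network_ts"]

def make_doc(ts, value, value_max):
    # One measurement shaped like the sample document above.
    return {
        "time": ts,
        "meta": {"entity": ObjectId("64a7580ba5d77200c1389439"), "ttl": 30},
        "metric": "rx",
        "component": "eth0",
        "plugin": "network",
        "value": value,
        "value_max": value_max,
    }

# Batch 1000 entries at a time and send them as one unordered bulk write,
# so a single failed insert does not abort the rest of the batch.
docs = [make_doc(datetime(2024, 11, 1, 21, 0, tzinfo=timezone.utc), 503, 3057)]
batch = [InsertOne(d) for d in docs]
if batch:
    coll.bulk_write(batch, ordered=False)
```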
We moved metric, component and plugin out of the meta field, since keeping them there would create far too many buckets and slow down queries.
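So only the fields under meta drive bucketing. The collection is created roughly like this (simplified pymongo sketch; the collection name and granularity here are not the exact production values):

```python
from pymongo import MongoClient

client = MongoClient()
db = client["metrics"]

db.create_collection(
    "network_ts",
    timeseries={
        "timeField": "time",
        "metaField": "meta",        # only meta.* influences bucket grouping
        "granularity": "minutes",
    },
    expireAfterSeconds=30 * 24 * 3600,  # 30 days of data, as described above
)
```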
What we are seeing on the time series database is that while CPU, memory, networking, etc. are at okay levels, the write queue (qw) seems to fill up from time to time, which completely blocks the collection. With this format the TS collection receives between 100k and 300k insert requests per second. The non-TS collection needs only about 7k upserts per second and seems much more stable, but slightly slower on reads.
Are we doing something wrong? How can we make sure we get a more stable insert experience?