My team is investigating using time-series collections in MongoDB 6. From our experiments, inserting the same dataset with the same insertion code, on the same hardware (3x replica set, via a mongos with a single shard, self-hosted), into a fresh collection each time, is about 60 times slower with a time-series collection than with a standard, unindexed collection. That’s not 60 percent more time, that’s 60 times as much time (an hour vs a minute). Obviously, we were expecting some loss of write speed in exchange for the promised improvements in query speed and storage size, but this is egregious, which leads us to believe we are doing something profoundly wrong.
The data in question is sensitive, so we cannot provide it, or our code, but we can share the following:
- Each document consists of a timestamp (which we provide as a native timestamp), a single numeric series id (with cardinality on the order of 100 to 1000, set as the metadata field), plus anywhere between ~10 and several dozen numeric measurement fields, with no arrays or nested documents.
- All documents with the same series id share identical sets of measurement fields, and fields do not change type between documents.
- The data we receive is locally ordered by time, but not guaranteed to be globally ordered (i.e. we receive “chunks” of sorted data, but chunks may be received out of order).
- No indexes are used except those created automatically by MongoDB.
- We are using the Java sync driver.
- We have tried batching anywhere between 100 and 100,000 documents before calling insertMany, with no noticeable change in total upload time.
- We have tried both single-threaded and multi-threaded inserts (up to 32 threads on a 16-core machine), with no noticeable change in total insert time (measured from log timestamps, counting only the time from when the first insertMany is called to when the last one returns). We have not tried parallelizing at the node level, but we have confirmed we are not bottlenecked by network I/O.
- We have tried disabling ordered writes, with either no noticeable change, or a slight increase in total upload time (which runs counter to our expectations, based on the documentation).
- No other workloads are accessing the cluster during test runs.
- mongostat shows “spikes” of 1,000-2,000 inserts in a single printout row, followed by several seconds to a few minutes of absolutely no activity.
- No errors or warnings are logged from the shard data servers, mongos servers, config servers, or the java driver.
- Our estimated working set for one full run (based on raw data size multiplied by a fudge factor) comfortably fits within a single server’s memory, several times over. We have confirmed via Prometheus metrics that we are not getting anywhere close to our CPU or memory limits, and we do not have swap enabled.
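Since we can’t share our actual code, here is a sanitized mongosh equivalent of how we create the collection. The collection and field names are invented for this post; the structure (a native date as timeField, the numeric series id as metaField, default granularity) matches what we described above:

```javascript
// Names are placeholders; our real names differ.
db.createCollection("measurements", {
  timeseries: {
    timeField: "ts",        // provided as a native BSON date
    metaField: "seriesId",  // numeric series id, cardinality ~100-1000
    granularity: "seconds"  // we have not tuned this; "seconds" is the default
  }
})
```

If there is a smarter choice of metaField or granularity for data shaped like ours, that would be exactly the kind of advice we are after.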
Any advice as to what we can investigate or change is greatly appreciated.