Insert performance

Hi all,

Thank you in advance for your reply.
I am using mongocxx r3.10.2 (client) and MongoDB v7.0.14 (server) to conduct benchmarks to better understand MongoDB’s performance characteristics when dealing with time-series data.

The testbed consists of 1 server machine and 8 client machines, all with the same underlying hardware:

  • Dual socket Intel Xeon Silver 4108 @1.80GHz
  • 64GB RAM
  • 100 Gbit Ethernet
  • 4x Samsung NVMe SSD 970 EVO Plus (250GB) as RAID 0 array

The problem I am observing: when launching 256 clients, each performing 1,000 sequential single-document inserts (one insert at a time), the observed round-trip latency is often well above 20,000 ms. When the same test is run with the inserted data organized in a random access pattern instead, the average latency is around 220 ms per insert.
In both cases, a single insert contains a timestamp plus 32 data points of 32 bytes each, i.e. 1,024 bytes of payload per insert in addition to the timestamp.

On the server’s end, the test database is configured as a time-series collection with granularity set to seconds.
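For completeness, the collection is created roughly like this (a sketch only; the `timestamp` timeField name is an assumption based on my schema, and the URI is a placeholder):

```cpp
#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

using bsoncxx::builder::basic::kvp;
using bsoncxx::builder::basic::make_document;

int main() {
    mongocxx::instance instance{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto db = client["BENCH_DB"];

    // Time-series collection keyed on the "timestamp" field,
    // with granularity "seconds".
    db.create_collection("BENCH_DB",
        make_document(kvp("timeseries",
            make_document(kvp("timeField", "timestamp"),
                          kvp("granularity", "seconds")))));
}
```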

On the client’s end, the code is as follows:

  // Init MongoDB client with a long socket timeout
  mongocxx::uri uri(fmt::format("mongodb://{}:{}/?socketTimeoutMS=86400000", SERVER_ADDR, SERVER_PORT));
  mongocxx::client mongoClient(uri);
  mongocxx::options::insert insertOptions;
  insertOptions.ordered(false);

  // Access DB and collection (both named BENCH_DB)
  const mongocxx::database mongoDB = mongoClient["BENCH_DB"];
  mongocxx::collection mongoCollection = mongoDB["BENCH_DB"];

  // Build the BSON document for the i-th measurement
  bsoncxx::builder::basic::document currDocument = makeSingleDocument(&insertData->at(i));

  // Dispatch insert and time the round trip
  start = std::chrono::high_resolution_clock::now();
  const auto insertResult = mongoCollection.insert_one(currDocument.view(), insertOptions);
  end = std::chrono::high_resolution_clock::now();

I’m not sure what causes such a large difference in performance, or how to alleviate the bottleneck, when the only difference is the underlying access pattern of the data. I suspect the slowdown in the SEQ pattern might be due to lock contention, since documents are essentially being inserted into the same bucket one after another, whereas in the RAND pattern inserts land in different buckets, so there is little to no contention. However, 20,000 ms of latency for a single insert to complete still seems quite strange to me.