Hello everyone,
We are running an Atlas M30 cluster for our main Meteor app.
We also have 4 plain Node.js workers that handle our huge daily data imports.
The thing is that some of these imports require up to 3 million upserts in our MongoDB.
I already read this thread: Update on 2 billion records will take 1.5 year
We are already doing these things:
- using unordered bulk operations
- setting writeConcern `{ w: 0 }` for HUGE imports
- creating chunks of (in our case) 200k upserts: run the bulkOp, await the result, reset the queue, run the next chunk
- improved all indexes, …
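For context, here is roughly what our chunked import loop looks like (a minimal sketch using the plain Node.js MongoDB driver; `collection`, the document shape, and `runImport` are illustrative names, not our actual code):

```javascript
// Split the queued documents into fixed-size chunks.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run the upserts chunk by chunk, awaiting each bulk op before
// starting the next one, as described above.
async function runImport(collection, docs) {
  for (const batch of chunk(docs, 200_000)) {
    const ops = batch.map((doc) => ({
      updateOne: {
        filter: { _id: doc._id },
        update: { $set: doc },
        upsert: true,
      },
    }));
    // Unordered bulk write with writeConcern { w: 0 }.
    await collection.bulkWrite(ops, {
      ordered: false,
      writeConcern: { w: 0 },
    });
  }
}
```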
In summary, our upserts are really quick! The issue is the LOAD this causes on the database.
During these times, even though the 3M upserts finish in, say, 10 to 15 minutes, the database sits at 100-200% CPU and the main app even has issues READING data from the cluster, basically making our app unusable for our users.
→ tl;dr:
How can I throttle the bulk operation a bit so it uses LESS CPU, making the whole process a bit slower but reducing the load on the DB?
Right now, whatever I optimize, it might make the process quicker, but the load on the DB remains incredibly high.
Thanks a lot in advance,
best, Patrick