We are using the MongoDB Node driver to invoke bulkWrite.
We have very minimal indexes.
Using an M20 cluster.
Each bulk write consists of a batch of 10,000 records, and the batches execute in series.
Is there any way to enhance the performance of this bulk write? Would switching to another MongoDB driver help here, given that the driver only invokes bulkWrite?
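For context, here is a minimal sketch of the kind of upsert batch being discussed, using the Node driver's bulkWrite. The filter field and document shape are illustrative assumptions; `ordered: false` lets the server continue past individual errors and avoid serializing on operation order, which often helps pure upsert workloads:

```javascript
// Sketch: upsert one batch of documents with an unordered bulkWrite.
// `collection` is an already-connected driver collection;
// matching on _id is an illustrative assumption.
async function upsertBatch(collection, docs) {
  const ops = docs.map((doc) => ({
    updateOne: {
      filter: { _id: doc._id }, // match on the unique key
      update: { $set: doc },
      upsert: true,
    },
  }));
  // ordered: false allows the server to keep going after an
  // individual failure instead of aborting the remaining operations.
  return collection.bulkWrite(ops, { ordered: false });
}
```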
It looks like you are trying to throttle the whole operation. Why only 10,000 documents? Why in series? From where (hardware and network) are you calling bulkWrite? 3.6 million documents of 1 KB each is a lot different from 3.6 million documents of 1 MB each. Have you tried different work-distribution strategies: fewer or more documents per bulkWrite, in parallel rather than in series? Knowing everything that you tried would save us a lot of time, because proposing something you already tried and already know does not work is a waste of time. And what is your use-case? Is it frequent that you must insert 3.6M documents? Or are you trying to stress-test your system by simulating 3.6M people doing 1 insert each?
Why only 10000 documents?
It was mentioned here that bulkWrite has a maximum of 100,000 operations per batch. And we noticed that a batch size of 100,000 performed poorly compared to 10,000.
Why in series?
My bad, I missed one major point: we are processing 10 batches in parallel.
I’ve shared below different configurations of parallel ops and batch size, along with the corresponding time taken to upsert 100,000 records (we reduced the number of records just to see how it behaves with varying configurations).
| MongoDB Cluster Tier | No. of Parallel Ops | Batch Size | Time to upsert 100,000 records (ms) |
| --- | --- | --- | --- |
| M20 | 10 | 2000 | 23049 |
| M20 | 10 | 1000 | 23725 |
| M20 | 10 | 2500 | 24443 |
| M20 | 5 | 1000 | 27469 |
| M20 | 5 | 2000 | 27578 |
| M20 | 10 | 5000 | 27611 |
| M20 | 5 | 2500 | 28427 |
| M20 | 5 | 10000 | 29561 |
| M20 | 10 | 10000 | 30674 |
| M20 | 5 | 5000 | 37241 |
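A sweep like the one above can be driven by a small harness along these lines (a sketch; `writeBatch` is a stand-in for the real bulkWrite-based writer, and the worker-pool shape is one of several reasonable ways to cap in-flight batches):

```javascript
// Split `docs` into fixed-size batches.
function chunk(docs, size) {
  const batches = [];
  for (let i = 0; i < docs.length; i += size) {
    batches.push(docs.slice(i, i + size));
  }
  return batches;
}

// Run batches with at most `parallelism` writes in flight at once.
// `writeBatch` stands in for the real bulkWrite-based writer.
async function runBatches(docs, { batchSize, parallelism }, writeBatch) {
  const batches = chunk(docs, batchSize);
  let next = 0;
  async function worker() {
    // Safe without locks: the index is claimed synchronously
    // before each await, and Node is single-threaded.
    while (next < batches.length) {
      const batch = batches[next++];
      await writeBatch(batch);
    }
  }
  const workers = Array.from(
    { length: Math.min(parallelism, batches.length) },
    () => worker()
  );
  await Promise.all(workers);
  return batches.length;
}
```

Timing each `runBatches` call (e.g. with `Date.now()` before and after) for each batch-size/parallelism pair reproduces the table.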
From where (hardware and network) are you calling bulkWrite?
I’m using an M1 MacBook Pro, although this bulk write is expected to be initiated from an AWS Lambda (Amazon Linux). We have tested this action on the Lambda and it behaves the same way there as well.
Here are the specifications for the MacBook:
- Model Name: MacBook Pro
- Model Identifier: MacBookPro17,1
- Total Number of Cores: 8 (4 performance and 4 efficiency)
- Memory: 16 GB
- Network: 200 Mbps
- Average size of each document: 428 B
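As a rough sanity check on the network side (back-of-envelope arithmetic, assuming the 428 B average document size and the 200 Mbps link stated above, and ignoring BSON/wire-protocol overhead and compression):

```javascript
// Raw payload size and minimum transfer time for the full
// 3.6M-document load over a 200 Mbps link.
const docCount = 3_600_000;
const avgDocBytes = 428;
const linkMbps = 200;

const totalBytes = docCount * avgDocBytes;    // 1,540,800,000 B
const totalGB = totalBytes / 1e9;             // ~1.54 GB
const linkBytesPerSec = (linkMbps * 1e6) / 8; // 25 MB/s
const minSeconds = totalBytes / linkBytesPerSec; // ~62 s

console.log(totalGB.toFixed(2), "GB,", minSeconds.toFixed(0), "s minimum");
```

So even with an ideal server, the full 3.6M-document load cannot finish in much under a minute from a 200 Mbps client.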
Have you tried different work-distribution strategies?
I’ve shared the different configs we have tried above.
And what is your use-case? Is it frequent that you must insert 3.6M documents?
This is not a very frequent action; it’s expected to happen maybe once or twice a day. We are building an LMS, and our clients would have 10K students and 300 contents (in each course). We need to insert 3 million records into the DB for an architectural use case (when the contents are created and published all at once). Although this is the worst-case scenario, we need our system to support this number.
This is most likely caused by the fact that you are doing performance testing on a cluster tier that is wrong for this purpose. See the following note about M20.
Do you have any metrics on how the cluster performs? CPU usage, disk I/O, RAM, etc.?
So basically, you are expecting good performance out of something configured for low traffic.
So before trying to optimize your code you have to make sure you are using the proper infrastructure.
We checked the metrics, and it did seem like CPU usage was touching 100% at times. We will upgrade to a higher-tier cluster and monitor the performance.
Also, we had some questions related to code-level optimisations that we could do once we opt for a higher tier.
Do you recommend processing in batches ?
How many batches should we ideally process in parallel ?
Would the MongoDB driver play any significant role with regard to a bulk write (I had read about the driver performance comparison here)? If yes, which driver client do you recommend?
Thanks for the input @steevej. This has been helpful.
I do not think that the driver would make a big difference during bulkWrite, as the bulk of the work is done on the server. About the comparison link you shared, one issue I see reading diagonally is:
The benchmark was ran in my personal laptop (i7–6th generation, 24 GB RAM), against a dockerized MongoDB 4.2 also running locally, so network times are negligible but on the other hand we have to take in account that the same machine has to cope with running the DB server and the client concurrently.
Yes, network times are negligible, but context switches increase. You have to test with an architecture that is close to what you want to implement. If running the client and the server on the same machine is not your use-case, then your results may differ.
It all depends on your use-cases. I think your approach of testing different numbers of batches vs. batch sizes is appropriate and should provide you with the correct configuration once you figure out and eliminate your bottlenecks, until you reach the performance you want. In your table there is no order-of-magnitude difference between the configurations, and the fact that you were hitting 100% CPU indicates that the bottleneck was indeed the M20. Only running the same tests on a bigger machine can give you insight into the best config.
There is no easy answer, except the easiest of them all: it depends.