I am using a Spark job to process a heavy data join and then periodically write the results back to MongoDB via the Spark connector.
However, every time this bulk write happens (~2000 records), whether maxBulkSize is set to 8 or left at the default of 512, the cluster's CPU utilization spikes to 150%.
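For reference, the write path looks roughly like this (a sketch, not my exact code: the URI, database, and collection names are placeholders, `joined_df` stands for the joined DataFrame, and the batch-size option name follows my config and may differ by connector version):

```python
# Sketch of the periodic bulk write described above.
# Placeholders: connection URI, database/collection names, joined_df.
(joined_df.write
    .format("mongodb")                                  # MongoDB Spark connector
    .mode("append")
    .option("connection.uri", "mongodb://<host>:27017")  # placeholder URI
    .option("database", "mydb")                          # placeholder
    .option("collection", "results")                     # placeholder
    .option("maxBulkSize", "8")                          # also tried the default 512
    .save())
```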
Am I doing something wrong here? What is the best practice for bulk writes with the Spark connector?
Thank you!