Upsert via bulkWrite: performance issue with 2 sparse indexes

Hi Experts,
I am using bulkWrite to upsert batches of documents into a collection that has 2 unique sparse indexes.
During testing, CPU usage was almost 100% and there were timeout errors even though the bulkWrite batch size was only 100.

However, I ran the same test on a collection with a single unique index, and there was no such performance issue at all. CPU usage stayed below 10%.

I am quite confused by this issue.
I would appreciate any feedback.

More information is provided below.

Platform information:
MongoDB Atlas M20: MongoDB 6.0.11
Node.js: version 20
mongodb driver: 6.1.0

Indexes on the collection:

[
    {
      key: { myId1: 1 },
      name: "myId1_1",
      unique: true,
      sparse: true
    },
    {
      key: { myId2: 1 },
      name: "myId1_2",
      unique: true,
      sparse: true
    }
  ]
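
For reference, here is a minimal sketch of how the two indexes above can be created with the Node.js driver (the URI, database, and collection names are placeholders, not my actual setup):

// Sketch only: create the two unique sparse indexes shown above.
// uri, dbName, and colName are placeholders.
const { MongoClient } = require("mongodb");

async function createSparseIndexes(uri, dbName, colName) {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    await client.db(dbName).collection(colName).createIndexes([
      { key: { myId1: 1 }, name: "myId1_1", unique: true, sparse: true },
      { key: { myId2: 1 }, name: "myId2_1", unique: true, sparse: true },
    ]);
  } finally {
    await client.close();
  }
}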

Here is the list for bulkWrite. The bulk size is 100, and the same indexed field (myId1) is used for the whole batch.

[
  {
    "updateOne": {
      "filter": { "myId1": "myId1_0" },
      "update": { "$setOnInsert": { "myId1": "myId1_0" }, "$set": { "att1": "100", "updated": 1697585425844 } },
      "upsert": true
    }
  },
  {
    "updateOne": {
      "filter": { "myId1": "myId1_1" },
      "update": { "$setOnInsert": { "myId1": "myId1_1" }, "$set": { "att1": "200", "updated": 1697585425844 } },
      "upsert": true
    }
  }
]

Node.js code for the bulkWrite:

await dbClient.db(dbName).collection(colName).bulkWrite(bulkWriteList, { ordered: false });
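
For completeness, a sketch of how a batch like the one above can be generated (the loop bounds and the att1/updated values are illustrative):

// Sketch: build a batch of 100 upsert operations keyed on myId1.
// The att1/updated values are illustrative, following the sample above.
function buildBatch(start, size) {
  const ops = [];
  for (let i = start; i < start + size; i++) {
    const id = `myId1_${i}`;
    ops.push({
      updateOne: {
        filter: { myId1: id },
        update: {
          $setOnInsert: { myId1: id },
          $set: { att1: String((i + 1) * 100), updated: Date.now() },
        },
        upsert: true,
      },
    });
  }
  return ops;
}

bulkWriteList in the call above is then simply buildBatch(0, 100).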

Hi @frand_li and welcome to the MongoDB community forums!

As mentioned in the MongoDB documentation for clusters, M10 and M20 clusters operate on burstable performance infrastructure, which uses a virtual (shared) core and dynamically allocates CPU credits according to the instance size.

To understand further, could you help me understand whether you are observing this issue constantly, or does it happen only under a specific condition?
Also, how large is the data set that you are trying to update using the bulk write operation?

The recommendation here would be to look at the CPU steal % and consider scaling up the cluster, as it may have exhausted its CPU credits, resulting in the high utilisation.

Also, I would recommend reaching out to MongoDB Support for deeper insight into the cluster.

Best Regards
Aasawari

Hi, Aasawari,
Thanks for your quick response.
The bulkWrite size was only 100, and the collection contained only 10,000 documents.
I tried to upsert 10,000 documents with a connection pool size of 5, which means there were up to 5 concurrent processes doing bulkWrite at the same time.
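
For clarity, the sending pattern was roughly the following sketch (the batch splitting and the limit of 5 match the description above; col is an already-connected collection handle):

// Sketch of the test pattern: split all operations into batches
// and keep at most "concurrency" bulkWrite calls in flight at once.
async function upsertAll(col, allOps, batchSize = 100, concurrency = 5) {
  const batches = [];
  for (let i = 0; i < allOps.length; i += batchSize) {
    batches.push(allOps.slice(i, i + batchSize));
  }
  // Start "concurrency" workers; each pulls the next batch off the queue.
  const workers = Array.from({ length: concurrency }, async () => {
    while (batches.length > 0) {
      await col.bulkWrite(batches.shift(), { ordered: false });
    }
  });
  await Promise.all(workers);
}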

If only one unique index is set on the collection, there is no performance problem at all, even when the bulkWrite size is set to 1000.

But once there are 2 unique sparse indexes created as above, or a compound index created as below, the performance issue occurred with timeout errors even when the bulk size was set to 100.

With the same data and the same sending pattern, the only difference is the indexes. It seemed as though the sparse or compound indexes didn't work at all; it should not be caused by a performance bottleneck on the M20.

Is there anything wrong with using the above 2 types of indexes?
Many thanks.

{ myId1: 1, myId2: 1 },
{
  unique: true,
  partialFilterExpression: {
    $or: [{ myId1: { $exists: false } }, { myId2: { $exists: false } }]
  }
}

Here are my new findings.
When the indexes were created as below, there is no performance problem bulk-writing documents keyed on myId1: all 1000 documents can be upserted instantly, even on an M10.
However, there is a timeout issue when bulk-writing only 500 documents keyed on the second unique field, i.e. myId2, even on an M60 server.

Could this be a bug? Or was it caused by misusing the compound index?

{ myId1: 1, myId2: 1 },
{
  unique: true,
  partialFilterExpression: {
    $or: [{ myId1: { $exists: true } }, { myId2: { $exists: true } }]
  }
}

BTW, there was a typo in my previous reply in the compound index JSON: true was mistakenly written as false:
$or: [{ myId1: { $exists: false } }, { myId2: { $exists: false } }].
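
In case it helps reproduce the issue, here is a sketch of how the corrected compound index is created and the shape of a myId2-keyed upsert (identifiers reused from the earlier snippets; the values are illustrative):

// Sketch: create the corrected compound unique partial index.
await dbClient.db(dbName).collection(colName).createIndex(
  { myId1: 1, myId2: 1 },
  {
    unique: true,
    partialFilterExpression: {
      $or: [{ myId1: { $exists: true } }, { myId2: { $exists: true } }],
    },
  }
);

// A myId2-keyed upsert of the shape that times out (illustrative values):
const op = {
  updateOne: {
    filter: { myId2: "myId2_0" },
    update: {
      $setOnInsert: { myId2: "myId2_0" },
      $set: { att1: "100", updated: Date.now() },
    },
    upsert: true,
  },
};
await dbClient.db(dbName).collection(colName).bulkWrite([op], { ordered: false });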

Hi @frand_li and thank you for getting back.

Based on the information provided earlier, I’m attempting to recreate the problem on a local setup. It would be greatly appreciated if you could provide additional details for a clearer understanding.

With respect to the defined indexes, it appears that the “myId1” and “myId2” field values should be unique. In the context of the bulkWrite operation, are you attempting to filter a single document per operation and then update its “att1” and “updated” values?

Furthermore, I would like to request the following information:

  1. Could you provide a sample document from the collection?
  2. More comprehensive details regarding the error message, including exact error logs, would be quite beneficial.
  3. If possible, please provide a concise, reproducible code snippet that can help me replicate the issue.

Regards
Aasawari