During chunk balancing, we are seeing high dirty pages and high dirty cache

We are currently experiencing an issue with chunk balancing. We have scheduled chunk balancing to run once a day during periods of low traffic. When chunk balancing takes place, we notice a significant increase in the number of dirty pages and in the dirty cache. Based on my understanding, this could be due to the movement of data chunks between the shards. However, it seems unusual because we are inserting less than 1 GB of data per day, yet we observe nearly 2 GB of dirty cache during the chunk balancing process. The resulting high disk utilization during chunk balancing negatively affects the performance of our application server.

To provide some context, we are operating a 3-shard cluster, and our shard key is a compound key composed of three fields: “a,” “b,” and “c.” Both “b” and “c” are UUIDs, while “a” is a string with only a few unique values. Could the selection of this shard key, with its specific combination of fields, be contributing to the problem we are experiencing?

Hey @Kiran_Sunkari,

Thank you for reaching out to the MongoDB Community forums.

Have you observed whether the balancing window is long enough to achieve a balanced state for the cluster every day? If not, it is possible that the balancer is unable to keep up with the balancing work and, as a result, the cluster will never reach a balanced state.
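As a rough way to reason about the window length, here is a minimal sketch that computes how many minutes a balancing window covers. It assumes the window is stored in the usual form, as an `activeWindow` sub-document with `start` and `stop` given as "HH:MM" strings in the `balancer` document of `config.settings`. The function name `windowMinutes` and the sample times are hypothetical and only for illustration.

```javascript
// Length in minutes of a balancer activeWindow given as "HH:MM" strings.
// windowMinutes is a hypothetical helper, not part of the MongoDB shell.
function windowMinutes(activeWindow) {
  const toMinutes = (hhmm) => {
    const [h, m] = hhmm.split(":").map(Number);
    return h * 60 + m;
  };
  const start = toMinutes(activeWindow.start);
  const stop = toMinutes(activeWindow.stop);
  // A window such as 23:00 to 02:00 wraps past midnight,
  // so normalize the difference into the range [0, 1440).
  return (stop - start + 24 * 60) % (24 * 60);
}

// Illustrative values only; substitute your own configured window.
console.log(windowMinutes({ start: "23:00", stop: "02:00" }));
```

In mongosh, you could pass in the stored window with something like `windowMinutes(db.getSiblingDB("config").settings.findOne({ _id: "balancer" }).activeWindow)`, assuming a window has been configured. Comparing that length with how long the migrations in one window actually take can hint at whether the balancer is able to finish its work each day.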

If you have a substantial amount of data to balance, it could lead to a significant accumulation of dirty pages and cache as the process involves writing to the disk, which can result in additional storage usage.

You mentioned that you are inserting less than 1 GB of data per day. Could you clarify if this is the data size per shard?

Could you provide details on where you see the 2 GB dirty cache number? Additionally, is this 2 GB per shard or for the entire deployment? Note that a large dirty cache doesn’t by itself imply any issues. Furthermore, as chunk balancing involves migrating data to different shards, dirtying the cache is a natural part of the process when there is a significant amount of data to be moved.
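If it helps to collect comparable numbers on each shard, the sketch below summarizes dirty cache usage from a `serverStatus`-style document. It reads the `wiredTiger.cache` counters "tracked dirty bytes in the cache", "bytes currently in the cache", and "maximum bytes configured", and assumes they come back as plain numbers. The helper name `dirtyCacheSummary` and the sample values are hypothetical, not measurements from your cluster.

```javascript
// Summarize WiredTiger dirty-cache usage from a serverStatus-style document.
// dirtyCacheSummary is a hypothetical helper; it assumes the counters are
// plain numbers, as they typically appear in serverStatus output.
function dirtyCacheSummary(serverStatus) {
  const cache = serverStatus.wiredTiger.cache;
  const dirtyBytes = cache["tracked dirty bytes in the cache"];
  const usedBytes = cache["bytes currently in the cache"];
  const maxBytes = cache["maximum bytes configured"];
  const GiB = 1024 ** 3;
  return {
    dirtyGiB: dirtyBytes / GiB,
    usedGiB: usedBytes / GiB,
    maxGiB: maxBytes / GiB,
    // Dirty data as a percentage of the configured cache size.
    dirtyPctOfMax: (dirtyBytes / maxBytes) * 100,
  };
}

// Made-up sample values, for illustration only.
const sampleStatus = {
  wiredTiger: {
    cache: {
      "tracked dirty bytes in the cache": 2 * 1024 ** 3,
      "bytes currently in the cache": 6 * 1024 ** 3,
      "maximum bytes configured": 16 * 1024 ** 3,
    },
  },
};
console.log(dirtyCacheSummary(sampleStatus));
```

When connected directly to a shard's primary with mongosh, you could call `dirtyCacheSummary(db.serverStatus())` instead of using the sample object. Looking at the dirty amount relative to the configured cache size, per shard, gives more context than the absolute 2 GB figure alone.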

To better understand your deployment environment, please provide the following additional information:

  • The MongoDB version you are using.
  • The deployment configuration of your MongoDB setup.
  • The type of shards being used (replica set or standalone).
  • Are the shards located on separate hardware or within different Docker environments on the same hardware?
  • Lastly, also share the output of sh.status().

Feel free to provide any further details related to your deployment so that we can assist you more effectively.

Best,
Kushagra