Question about Sharded Cluster Chunk Migration

I am using MongoDB 4.2.

I set up a workload in which the application continuously issues a large volume of CRUD operations, and I run chunk migrations periodically.

One day, CRUD operations on one of the shards were seriously delayed, which caused a service failure. I suspect the combination of heavy query load and chunk migration is the cause.

Q1. I looked at the log, and I think the entry below relates to this problem. What does this log message mean?

Date~~ W STORAGE [FlowControlRefresher] Flow control is engaged and the sustainer point is not moving. Please check the health of all secondaries.

Q2. At what level does chunk migration operate?

Q3. I currently use the primary read preference. If I change it to secondary, will that resolve the read delays during chunk migration? Is there another solution?

Hi @Kim_Hakseon,

Q1. I looked at the log, and I think the entry below relates to this problem. What does this log message mean?

Flow control is a feature in MongoDB 4.2+ (enabled by default) that attempts to limit write throughput for a replica set primary in order to keep the replication lag for majority-committed data under the configured flowControlTargetLagSeconds value (10 seconds by default). Normal flow resumes as the replica set catches up on the backlog of writes and advances the majority commit point.
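As a rough illustration of the lag check described above, here is a minimal Python sketch that takes a dict shaped like replSetGetStatus output and reports which secondaries are further behind the primary than the target lag. It is only a per-member approximation, not the majority commit point calculation that flow control itself performs. The helper names and the sample host names are hypothetical, and the sample data is hand-written rather than real server output.

```python
from datetime import datetime, timedelta

# Default value of the flowControlTargetLagSeconds server parameter.
FLOW_CONTROL_TARGET_LAG_SECONDS = 10


def secondary_lag_seconds(status):
    """Return {member name: seconds behind the primary} for each secondary.

    `status` is a dict shaped like replSetGetStatus output: a "members" list
    whose entries carry "name", "stateStr" and "optimeDate".
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def lagging_members(status, target=FLOW_CONTROL_TARGET_LAG_SECONDS):
    """Names of secondaries whose lag exceeds the target, sorted."""
    lag = secondary_lag_seconds(status)
    return sorted(name for name, seconds in lag.items() if seconds > target)


# Hand-written sample data with hypothetical hosts, not real server output.
now = datetime(2021, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "shard1-a:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "shard1-b:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=2)},
        {"name": "shard1-c:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=45)},
    ]
}

print(lagging_members(sample_status))
```

With a live deployment, the same dict could presumably be obtained by running the replSetGetStatus command against a member of the affected shard's replica set.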

Q2. At what level does chunk migration operate?

The Chunk Migration Procedure includes reading documents from a source shard, replicating them to a destination shard, and deleting those documents from the source shard at a critical step when the migration completes. MongoDB can perform parallel chunk migrations, but a shard can participate in at most one migration at a time.
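The "at most one migration per shard" rule can be sketched in a few lines of Python. Because each migration occupies a source and a destination shard, a cluster with n shards can run at most n/2 (rounded down) migrations at once. The function names and shard names below are hypothetical and only model the rule; they are not part of any MongoDB API.

```python
def max_parallel_migrations(shard_count):
    """Upper bound on simultaneous migrations when each one occupies two shards."""
    return shard_count // 2


def can_run_together(migrations):
    """True if no shard appears in more than one (source, destination) pair."""
    busy = set()
    for source, destination in migrations:
        if source == destination or source in busy or destination in busy:
            return False
        busy.update((source, destination))
    return True


# A 3-shard cluster can run only one migration at a time.
print(max_parallel_migrations(3))
# shard2 would be in two migrations at once, so this pair cannot overlap.
print(can_run_together([("shard1", "shard2"), ("shard2", "shard3")]))
# With four shards, two disjoint migrations can run in parallel.
print(can_run_together([("shard1", "shard2"), ("shard3", "shard4")]))
```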

If any of the shard replica sets in your deployment are lagging enough to trigger flow control limitations, this can also affect chunk migrations to/from the affected shard.

Q3. I currently use the primary read preference. If I change it to secondary, will that resolve the read delays during chunk migration? Is there another solution?

Flow control is related to writing data to a majority of replica set members. I expect a secondary read preference won’t help, as you’ll be pushing more work to secondaries that are already unable to keep up with replicating writes from the primary.

To manage the impact of chunk migrations you could:

  • Avoid using arbiters. Arbiters cannot acknowledge writes, and they introduce additional replication lag and memory pressure if there is a voting majority to sustain a primary without a majority of data-bearing replica set members available to acknowledge writes. See Replica set with 3 DB Nodes and 1 Arbiter - #8 by Stennie for more info.

  • Schedule the Balancing Window so chunk migrations happen during an off-peak period with less contention for other CRUD activity.

  • Enable Secondary Throttle so chunk migrations wait for acknowledgement (equivalent to w:2) for each document move.

  • Set Wait for Delete so the final delete phase of a migration is a blocking phase before the next chunk migration. The default is for deletions to happen asynchronously.

  • Tune or disable flow control.
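The balancer and flow control options above can be sketched as the documents you would send to the cluster. This Python example only builds plain dicts, so it runs without a server; the helper names are hypothetical, and the commented pymongo calls at the end are an assumption about how you might apply them, with a placeholder connection string.

```python
def balancer_window_update(start, stop):
    """Update for the config.settings document with _id "balancer".

    start and stop are "HH:MM" strings bounding the balancing window.
    """
    return {"$set": {"activeWindow": {"start": start, "stop": stop}}}


def migration_throttle_update(secondary_throttle=True, wait_for_delete=True):
    """Update for the same balancer document: throttle and wait-for-delete."""
    return {"$set": {"_secondaryThrottle": secondary_throttle,
                     "_waitForDelete": wait_for_delete}}


def flow_control_commands(enabled=True, target_lag_seconds=None):
    """setParameter commands (one per parameter) to tune or disable flow control."""
    commands = [{"setParameter": 1, "enableFlowControl": enabled}]
    if target_lag_seconds is not None:
        commands.append({"setParameter": 1,
                         "flowControlTargetLagSeconds": target_lag_seconds})
    return commands


print(balancer_window_update("01:00", "05:00"))
print(migration_throttle_update())
print(flow_control_commands(enabled=True, target_lag_seconds=20))

# Assumed usage with pymongo (not executed here; the URI is a placeholder):
#   from pymongo import MongoClient
#   mongos = MongoClient("mongodb://<mongos-host>:27017")
#   settings = mongos["config"]["settings"]
#   settings.update_one({"_id": "balancer"},
#                       balancer_window_update("01:00", "05:00"), upsert=True)
#   settings.update_one({"_id": "balancer"},
#                       migration_throttle_update(), upsert=True)
# Flow control is a mongod parameter, so those commands would go to the
# primary of the affected shard rather than to mongos:
#   for command in flow_control_commands(enabled=False):
#       shard_primary.admin.command(command)
```

The window times and the 20-second lag target are arbitrary illustration values; pick them based on your own off-peak period and replication behaviour.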

Regards,
Stennie

Hi @Stennie, and thank you :smiley:

May I ask some follow-up questions about your answers?

Q2-1. I expect a lock is taken during chunk migration. Is this lock at the collection level or the document level?
(I also think this is an intent lock.)

“MongoDB can perform parallel chunk migrations, but a shard can participate in at most one migration at a time.”
Q2-2. In a sharded cluster with 3 shards, does this mean only 2 shards can take part in a chunk migration at any one time?
(1 → 2, 2 → 3, or 3 → 1 is migrating, and the remaining shard is idle.)

Q3-1. I understand that write operations are throttled by flow control during chunk migration. If so, is there no relationship between chunk migration and read operations?
