Q. about Sharded Cluster Chunk Migration

Stennie_X · September 29, 2022, 2:42am

Q1. So I looked at the log, and I think the log below is the log for this phenomenon, so what is the meaning of this log?

Flow control is a feature in MongoDB 4.2+ (enabled by default) that attempts to limit write throughput for a replica set primary in order to keep the replication lag for majority committed data under the flowControlTargetLagSeconds. Normal flow resumes as the replica set catches up on the backlog of writes and advances the majority commit point.

Q2. To what level does the chunk migration take?

The Chunk Migration Procedure includes reading documents from a source shard, replicating them to a destination shard, and deleting those documents from the source shard at a critical step when the migration completes. MongoDB can perform parallel chunk migrations, but a shard can participate in at most one migration at a time.

If any of the shard replica sets in your deployment are lagging enough to trigger flow control limitations, this can also affect chunk migrations to/from the affected shard.

Q3. Currently, I am using the read preference as the primary, but if I change it to secondary, can I solve the delay for the read job during the chunk migration? And is there another solution?

Flow control is related to writing data to a majority of replica set members. I expect secondary read preferences won’t be helpful as you’ll be pushing more work to secondaries which already are unable to keep up with replicating writes from the primary.

To manage the impact of chunk migrations you could:

Avoid using arbiters. Arbiters cannot acknowledge writes and will introduce scenarios with additional replication lag and memory pressure with if there is a voting majority to sustain a primary without a majority of data bearing replica set members available to acknowledge writes. See Replica set with 3 DB Nodes and 1 Arbiter - #8 by Stennie_X for more info.
Schedule the Balancing Window so chunk migrations happen during an off-peak period with less contention for other CRUD activity.
Enable Secondary Throttle so chunk migrations wait for acknowledgement (equivalent to w:2) for each document move.
Set Wait for Delete so the final delete phase of a migration is a blocking phase before the next chunk migration. The default is for deletions to happen asychronously.
Tune or disable flow control.

Regards,
Stennie