Troubleshoot or Stop Long-Running Shard Migration

@MaBeuLux88_xxx, thanks so much for your attention while troubleshooting this!

We found that stepping down the primary config server seemed to make the shard balancer commands responsive again. Even so, the migration itself still never appeared to complete.
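
For anyone landing here later, this is a rough sketch of what that step looks like when scripted with PyMongo rather than typed into the shell. It is not exactly what we ran. The helper names, hostnames, and replica set name are all made up for illustration; the only real pieces are the `replSetStepDown` and `balancerStatus` admin commands.

```python
def step_down_config_primary(config_client, step_down_secs=60):
    """Ask the config server primary to step down for step_down_secs seconds.

    config_client is assumed to be a MongoClient connected to the config
    server replica set, so the command is routed to the current primary.
    """
    try:
        config_client.admin.command("replSetStepDown", step_down_secs)
    except Exception as exc:
        # Older servers close client connections on step-down, which PyMongo
        # reports as AutoReconnect. That is expected here; re-raise anything else.
        if type(exc).__name__ != "AutoReconnect":
            raise
    return True


def balancer_summary(mongos_client):
    """Return the balancer mode and whether a balancing round is in progress.

    mongos_client is assumed to be a MongoClient connected to a mongos.
    """
    status = mongos_client.admin.command("balancerStatus")
    return {
        "mode": status.get("mode"),
        "in_round": bool(status.get("inBalancerRound")),
    }


if __name__ == "__main__":
    from pymongo import MongoClient

    # Hypothetical addresses and replica set name; substitute your own.
    config = MongoClient(
        "mongodb://cfg1.example.net:27019,cfg2.example.net:27019/?replicaSet=configRepl"
    )
    mongos = MongoClient("mongodb://mongos1.example.net:27017")

    step_down_config_primary(config)
    print(balancer_summary(mongos))
```

As described above, getting the balancer commands to respond again did not mean the stuck migration finished, so treat this as a way to regain control rather than a fix.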

Ultimately, we scheduled some downtime and rebooted the entire cluster (all config machines, and all replica set members in all shards). This did resolve the issue, and we now see data migrating at a decent clip.

Amazingly, “have you tried turning it off and turning it back on” applies even here! :sweat_smile:

We hear you on updating the cluster. These were initially deployed back in the 3.6 days, and it’s been hard to prioritize updating them. But I’m sure it’d be better to do that proactively than under duress!
