How Enabling FCV on Sharded Clusters Works

Hello,

I’m trying to understand how setting the FCV on a sharded cluster works, taking version 4.0 as an example.
According to the documentation, setting a new FCV on a sharded cluster is done through a mongos instance.
I checked the code and found the following workflow:

  1. The FCV change is triggered on mongos: mongo/cluster_set_feature_compatibility_version_cmd.cpp at 91e3352a1aa717674575fce3cc6edb2f279a4479 · mongodb/mongo · GitHub
  2. mongos sends the command to the config servers: mongo/cluster_set_feature_compatibility_version_cmd.cpp at 91e3352a1aa717674575fce3cc6edb2f279a4479 · mongodb/mongo · GitHub
  3. The config replica set executes mongo/set_feature_compatibility_version_command.cpp at v4.0 · mongodb/mongo · GitHub
  4. It does that through the block dedicated to the config server: mongo/set_feature_compatibility_version_command.cpp at v4.0 · mongodb/mongo · GitHub
  5. Within this block, the config server updates to the new FCV and triggers upgradeChunksHistory for each collection.
  6. It then triggers the FCV upgrade on each shard: mongo/set_feature_compatibility_version_command.cpp at v4.0 · mongodb/mongo · GitHub
  7. Finally, it sets the FCV on the config replica set.

As I understand from the above, the FCV is set on the config replica set only after it has been set on all shards.
Locks, at least in version 4.0, seem to be taken only at the individual replica set level. Is that correct?
Is this the correct flow for a sharded cluster?
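To make the ordering in the question concrete, here is a toy Python model of the flow as read from the v4.0 source in steps 1 to 7. Everything in it (`ReplicaSet`, `ConfigServer`, `mongos_set_fcv`, the starting FCV of "3.6") is hypothetical: it does not connect to MongoDB and is not the server's implementation, it only records the order of events described above. The one real detail is the command document shape `{"setFeatureCompatibilityVersion": "4.0"}`, which is what a client sends through mongos (for example `db.adminCommand({setFeatureCompatibilityVersion: "4.0"})` in the mongo shell).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaSet:
    """Hypothetical stand-in for one replica set (a shard or the config servers)."""
    name: str
    fcv: str = "3.6"  # assumed starting FCV, for illustration only

    def apply_fcv(self, version: str, events: List[str]) -> None:
        # Each replica set updates only its own FCV; any locking is modelled
        # as local to this replica set (the reading the question asks about).
        self.fcv = version
        events.append(f"{self.name}: FCV={version}")


@dataclass
class ConfigServer(ReplicaSet):
    shards: List[ReplicaSet] = field(default_factory=list)
    collections: List[str] = field(default_factory=list)

    def set_feature_compatibility_version(self, cmd: dict, events: List[str]) -> None:
        version = cmd["setFeatureCompatibilityVersion"]
        # Step 5: the config server starts the upgrade and runs
        # upgradeChunksHistory once per sharded collection.
        events.append(f"{self.name}: start upgrade to {version}")
        for ns in self.collections:
            events.append(f"{self.name}: upgradeChunksHistory({ns})")
        # Step 6: the command is forwarded to every shard, one after another.
        for shard in self.shards:
            shard.apply_fcv(version, events)
        # Step 7: the config replica set records the new FCV last.
        self.apply_fcv(version, events)


def mongos_set_fcv(config: ConfigServer, version: str) -> List[str]:
    # Steps 1-2: mongos does no FCV work itself; it forwards the command
    # document to the config servers and returns the resulting event log.
    events: List[str] = [f"mongos: forward setFeatureCompatibilityVersion={version}"]
    config.set_feature_compatibility_version({"setFeatureCompatibilityVersion": version}, events)
    return events


if __name__ == "__main__":
    shards = [ReplicaSet("shard0"), ReplicaSet("shard1")]
    config = ConfigServer("configRS", shards=shards, collections=["db.a"])
    for line in mongos_set_fcv(config, "4.0"):
        print(line)
```

In this model the config replica set's final FCV write always comes after every shard's, which is the ordering the question is trying to confirm; the per-collection and per-shard loops also show where the amount of work grows with the number of collections and shards.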

Thank you,
Cristian

Hi Cristian, why is it important to know this? This is the kind of internal detail that is likely to change in future versions of MongoDB and shouldn’t make a difference to end users.

Hi ,

Sorry for the late reply. I’ll try to give some context on why we’re interested in how this works, especially for 4.0.
We have some very large clusters in PROD (over 20 shards, multiple databases and collections, and a config database with over 1M documents). We’ve observed an impact when setting the FCV to 4.0 on such a cluster: the whole cluster “freezes” while the cache chunks are refreshed on each shard.
I think that understanding this process would help us find a way to do this with minimal impact on the overall cluster. I hope to understand which factors can cause an impact when setting the FCV.

Thank you,
Cristian

Hi Cristian, you should raise a SERVER ticket on jira.mongodb.org. Our core engineering team may have some answers.

Thank you Joe.
I will do that.

Best Regards,
Cristian