@Stennie_X Thanks for the moderation. Actually, I thought the title was no longer relevant after my last post and would not help people find the actual issue with the force: true parameter. Could we update the title of this thread to the other title? The real issue is that using the force attribute does not really reset the version number of the replSetConfig. Thank you.
The force option forces a new configuration onto the member. Use this procedure only to recover from catastrophic interruptions. Do not use force every time you reconfigure. Also, do not use the force option in any automatic scripts and do not use force when there is still a primary.
When you use force: true, the version number in the replica set configuration increases significantly, by tens or hundreds of thousands. This is normal and designed to prevent set version collisions if you accidentally force re-configurations on both sides of a network partition and then the network partitioning ends.
It looks like there may be a missing check for a valid range of version values, but there must be something else going amiss if your starting config version is 2147480329. Can you provide some more context on this version number: was it the result of a previous forced increment (or repeated increments), or a manually provided value?
As noted earlier, the use case for force reconfig is for recovery from catastrophic issues where a majority of your replica set members are unavailable. This option should be used relatively rarely (if at all) in the lifetime of a replica set.
I would be curious to know how the version number got to be so high as well. As Stennie stated, the documentation says that the version number will go up by tens or hundreds of thousands when using the force option of a reconfig, but even so you'd have to do that over 21,000 times (if the version changed by 100,000 each time). That's a lot of reconfiguration.
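The arithmetic behind that estimate can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: it assumes the config version is limited to a signed 32-bit integer (2147483647) and that every forced reconfig adds a fixed amount, whereas the real increment varies. The function name `forcedReconfigsUntilLimit` is hypothetical and not part of any MongoDB API.

```javascript
// Assumption: the config version cannot exceed the signed 32-bit maximum.
const INT32_MAX = 2147483647;

// Hypothetical helper: how many forced reconfigs still fit below INT32_MAX
// if each one adds a fixed `increment` to the version.
function forcedReconfigsUntilLimit(startVersion, increment) {
  if (!Number.isInteger(startVersion) || !Number.isInteger(increment) || increment <= 0) {
    throw new RangeError("startVersion and increment must be integers, with increment > 0");
  }
  const headroom = INT32_MAX - startVersion;
  return headroom < 0 ? 0 : Math.floor(headroom / increment);
}

// Starting from version 1 with a fixed jump of 100,000 per forced reconfig.
console.log(forcedReconfigsUntilLimit(1, 100000)); // 21474

// Headroom left for the version reported in this thread.
console.log(INT32_MAX - 2147480329); // 3318

// With only 3318 left, not even one jump of 100,000 fits.
console.log(forcedReconfigsUntilLimit(2147480329, 100000)); // 0
```

Under these assumptions, a starting version of 2147480329 leaves so little headroom that a single forced reconfig would already push the version past the limit.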
There are ways to reset your replica set version without losing your data. Doing this however is a tricky proposition and requires care so as to not screw up your database. I would recommend doing this on a database storing data that you care about only after thoroughly testing it on test systems and making sure you have the process down.
@Stennie_X Yes, I do agree force should not be used. @Doug_Duncan I actually use a Kubernetes operator that updates the replica set config using this parameter, and I think it made a lot of updates to the config until it reached the maximum version number.
But, as I was able to replicate the issue directly on 5.0.10, I thought it was a good idea to report it.
I also found that using db.adminCommand({replSetReconfig:...}) does not seem to create the term field in the config, whereas using rs.reconfig did.
I was also expecting the version number to be reinitialized when the term is updated, but that is not the case. Could you give me more explanation of how term and version relate to each other, please?
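One possible way to picture the term/version relationship is the sketch below. It rests on an assumption that nobody in this thread has confirmed: that members order two configs by term first and by version second, and fall back to version alone when either config has no term field. The helper `isNewerConfig` is hypothetical and is not a MongoDB function.

```javascript
// Hypothetical helper: returns true when config `a` would be considered
// newer than config `b`.
// Assumption: term is compared first, then version; if either config lacks
// a term field, only the versions are compared.
function isNewerConfig(a, b) {
  const bothHaveTerm = Number.isInteger(a.term) && Number.isInteger(b.term);
  if (bothHaveTerm && a.term !== b.term) {
    return a.term > b.term;
  }
  return a.version > b.version;
}

// A higher term wins even with a lower version.
console.log(isNewerConfig({ version: 5, term: 3 }, { version: 9, term: 2 })); // true

// Same term: the higher version wins.
console.log(isNewerConfig({ version: 5, term: 3 }, { version: 9, term: 3 })); // false

// One config has no term: only the versions are compared.
console.log(isNewerConfig({ version: 10 }, { version: 9, term: 7 })); // true
```

If the ordering works this way, nothing requires the version to restart when the term changes, which would be consistent with the version not being reset.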
Interesting. I haven’t played around with any K8s operators for MongoDB for a while, so I didn’t realize that they might be forcing a reconfig. Still, it seems weird that it would have gotten that high, as during my testing of just manually running reconfigs with the force option I was only seeing things go up on the order of tens of thousands, which would take a hundred thousand updates or so to get past the limit.
The document you linked to for resolving the issue looks like it could work once you modify the command to update the version number and not the name. I’ve not done it that way, but there are generally multiple ways to do the same thing. Again, I would caution you to be very careful when doing this and to test thoroughly on a test system so you make sure you get the steps right. Also make sure you have a good backup of your database files and have the restore process down.