replSetReconfig force generates too high version number

Jean-Baptiste_PIN · August 2, 2022, 9:22am

I found that rs.reconfig behaves differently from db.adminCommand({replSetReconfig:}).

When using the latter, I got an error:

"errmsg" : "BSON field 'version' value must be < 2147483648, actual value '2147579250'",

But it works using rs.reconfig.

Do you know if we can reset the version to 1? I understand that rs.reconfig adds a term field but db.runCommand does not …

Any insights are more than welcome.

Thanks

Jean-Baptiste_PIN · August 2, 2022, 2:34pm

When using db.adminCommand({replSetReconfig: {...}, force: true}) I get an error whatever version number I put in:

{
	"ok" : 0,
	"errmsg" : "BSON field 'version' value must be < 2147483648, actual value '2147529374'",
	"code" : 51024,
	"codeName" : "Location51024",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1659450614, 1),
		"signature" : {
			"hash" : BinData(0,"LxJSIFgxTpIuQhwh+L1jRnFBzi8="),
			"keyId" : NumberLong("7124390870412951557")
		}
	},
	"operationTime" : Timestamp(1659450614, 1)
}

Apparently the force option automatically generates a higher version number, but it seems that the result is not capped to a valid value?

Any help is more than welcome.

Jean-Baptiste_PIN · August 4, 2022, 6:47am

@Stennie_X Thanks for the moderation. Actually, I thought the title was no longer relevant after my last post and would not help people find the actual issue with the force: true parameter. Could we update the title of this thread to the other title? The real issue is with using the force attribute, not really with resetting the version number of the replSetConfig. Thank you.

Stennie_X · August 4, 2022, 11:04am

Hi @Jean-Baptiste_PIN,

I updated the title as requested.

Can you provide more information to help reproduce this issue:

specific version of MongoDB server
O/S version
steps to reproduce (For example, does this happen the first time you force reconfigure? What was the version before forced reconfig?)

It would also be helpful to know a bit more about your use case for the force option.

Please note that this option is only intended for extreme scenarios where a majority of replica set members are unavailable:

The force option forces a new configuration onto the member. Use this procedure only to recover from catastrophic interruptions. Do not use force every time you reconfigure. Also, do not use the force option in any automatic scripts and do not use force when there is still a primary.

Regards,
Stennie

Jean-Baptiste_PIN · August 4, 2022, 11:55am

Here is an update with a fresh install of MongoDB on Docker (latest image, v5.0.10):

> db.version()
5.0.10
> rs.initiate()
{
	"info2" : "no configuration specified. Using a default configuration for the set",
	"me" : "localhost:27017",
	"ok" : 1
}
rs0:SECONDARY> db.adminCommand({replSetReconfig: { "_id" : "rs0", "version" : 2147480329, "term" : 1, "members" : [ { "_id" : 0, "host" : "localhost:27017", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : {  }, "secondaryDelaySecs" : NumberLong(0), "votes" : 1 } ], "protocolVersion" : NumberLong(1), "writeConcernMajorityJournalDefault" : true, "settings" : { "chainingAllowed" : true, "heartbeatIntervalMillis" : 2000, "heartbeatTimeoutSecs" : 10, "electionTimeoutMillis" : 10000, "catchUpTimeoutMillis" : -1, "catchUpTakeoverDelayMillis" : 30000, "getLastErrorModes" : {  }, "getLastErrorDefaults" : { "w" : 1, "wtimeout" : 0 }, "replicaSetId" : ObjectId("62ebb14889135fc6140d5283") } }, force: true})
{
	"ok" : 0,
	"errmsg" : "BSON field 'version' value must be < 2147483648, actual value '2147497198'",
	"code" : 51024,
	"codeName" : "Location51024",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1659613988, 15),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1659613988, 15)
}

Jean-Baptiste_PIN · August 4, 2022, 11:56am

Here are the Docker commands I used:

docker run -p27017:27017 --name mongo -d mongo /usr/bin/mongod --replSet rs0
docker exec -ti mongo bash

Stennie_X · August 5, 2022, 2:19am

Hi @Jean-Baptiste_PIN,

Thanks for the extra info.

The force option randomly increments the version number to try to avoid conflicts (per Reconfigure a Replica Set with Unavailable Members):

When you use force : true , the version number in the replica set configuration increases significantly, by tens or hundreds of thousands. This is normal and designed to prevent set version collisions if you accidentally force re-configurations on both sides of a network partition and then the network partitioning ends.

It looks like there may be a missing check for a valid range of version values, but something else must be going amiss if your starting config version is 2147480329. Can you provide some more context on this version number – was that the result of a previous forced increment (or repeated increments) or a manually provided value?
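To put numbers on this, here is a minimal sketch in plain JavaScript (Node.js) using the values from the 5.0.10 reproduction earlier in this thread. The helper name versionHeadroom is hypothetical and exists only for illustration; the limit is taken from the error message itself. It only does the arithmetic and does not model how the server picks the forced increment.

```javascript
// Upper bound quoted in the error message: 'version' must be < 2147483648 (2^31).
const VERSION_LIMIT = 2 ** 31;

// Hypothetical helper: room left before a config version reaches the limit.
function versionHeadroom(currentVersion) {
  return VERSION_LIMIT - currentVersion;
}

// Values copied from the reproduction on 5.0.10.
const submitted = 2147480329; // "version" passed to replSetReconfig
const rejected = 2147497198;  // "actual value" reported in errmsg

const headroom = versionHeadroom(submitted); // room left before the limit
const forcedJump = rejected - submitted;     // how far force moved the version

console.log(headroom, forcedJump, forcedJump > headroom);
// prints: 3319 16869 true
```

In that run the forced jump (16869) was larger than the remaining headroom (3319), which is why the command was rejected. In the shell, rs.conf().version returns the current value to compare against the limit.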

As noted earlier, the use case for force reconfig is for recovery from catastrophic issues where a majority of your replica set members are unavailable. This option should be used relatively rarely (if at all) in the lifetime of a replica set.

Regards,
Stennie

Doug_Duncan · August 5, 2022, 4:36am

I would be curious to know how the version number got to be so high as well. As Stennie stated, the documentation says that the version number will go up by tens or hundreds of thousands when using the force option of a reconfig, but even so you’d have to do that over 21,000 times (if the version changed by 100,000 each time). That’s a lot of reconfiguration.
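As a rough check of that estimate, here is a small sketch in plain JavaScript (Node.js). It assumes a fixed jump per forced reconfig and a starting version of 1, which is a simplification, since the real increment varies. The function name forcedReconfigsToLimit is hypothetical.

```javascript
// Limit from the error message: 'version' must be < 2147483648 (2^31).
const INT32_LIMIT = 2 ** 31;

// Hypothetical helper: number of forced reconfigs needed to push the version
// from `start` up to the limit, ASSUMING the same jump every time.
function forcedReconfigsToLimit(jump, start = 1) {
  return Math.ceil((INT32_LIMIT - start) / jump);
}

console.log(forcedReconfigsToLimit(100000)); // 21475
console.log(forcedReconfigsToLimit(10000));  // 214749
```

So at 100,000 per forced reconfig it takes roughly 21,500 of them to hit the limit, and roughly ten times as many at 10,000 per reconfig.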

There are ways to reset your replica set version without losing your data. Doing this however is a tricky proposition and requires care so as to not screw up your database. I would recommend doing this on a database storing data that you care about only after thoroughly testing it on test systems and making sure you have the process down.

Jean-Baptiste_PIN · August 5, 2022, 7:09am

@Stennie_X Yes I do agree force should not be used.
@Doug_Duncan I actually use a Kubernetes operator that updates the replica set config using this parameter, and I think it made enough updates to the config to reach the maximum version number.

But as I was able to replicate the issue directly on 5.0.10, I thought it was a good idea to report it.

I also found that db.adminCommand({replSetReconfig:...}) does not seem to create the term field in the config, whereas rs.reconfig does.

Also, I was thinking that the version number would be reinitialized when the term is updated, but that is not the case. Could you please explain the correlation between term and version in more detail?

I think I may be able to update the config by following something similar to these instructions (https://www.mongodb.com/docs/manual/tutorial/rename-unsharded-replica-set/)?

However, I’m reverting to another operator for sure.

Regards

Doug_Duncan · August 5, 2022, 12:53pm

Interesting. I haven’t played around with any K8s operators for MongoDB for a while, so I didn’t realize that they might be forcing a reconfig. Still, it seems weird that it would have gotten that high: during my testing of manually running reconfigs with the force option, I was only seeing the version go up on the order of tens of thousands, which would take a hundred thousand updates or so to get past the limit.

The document you linked to for resolving the issue looks like it could work once you modify the command to update the version number and not the name. I’ve not done it that way, but there are generally multiple ways to do the same thing. Again, I would caution you to be very careful when doing this and to test thoroughly on a test system to make sure you get the steps right. Also make sure you have a good backup of your database files and have the restore process down.

Best of luck.