I’m trying to upgrade some of our MongoDB hosts from CentOS 6 and MongoDB 3.0 to CentOS 7 and MongoDB 3.6.
We have one shard replica set and a 3-member set of config servers for the sharded cluster. These run on three hosts: two hold a full replica of the shard data, and the third is only an arbiter.
I’m testing the upgrade procedure on a separate set of virtual machines. I haven’t had a problem with our other upgrades, but those didn’t involve a multi-member shard replica set. They were standalone shards that were only later converted into single-member shard replica sets.
To test this, I shut down the config server on the arbiter host and copied its files to the three test VMs. Since the config servers are supposed to be identical copies, I assume it doesn’t matter which host the files come from. This appears to work: I set up mongos with the new config server names and was able to update the replica set hostnames.
For the shard replica set, I shut down the secondary shard member and copied its files to the two non-arbiter test hosts. On the test arbiter, I started mongod with the replica set name and an empty data directory.
I followed this guide (https://docs.mongodb.com/manual/tutorial/change-hostnames-in-a-replica-set/#replica-set-change-hostname-downtime) to change the replica set configuration hostnames and then started mongod with the replica set name on each of the two non-arbiter test hosts.
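To make the hostname change concrete, here is a minimal sketch of the kind of edit that guide describes. It is not a copy of the exact commands I ran. The set name shard0, all hostnames, and the helper rewriteMemberHosts are hypothetical and only illustrate the idea of rewriting each member's host field in the replica set config document.

```javascript
// Return a copy of a replica set config document with member hosts renamed.
// hostMap maps "oldhost:port" to "newhost:port"; unmapped hosts are left as-is.
function rewriteMemberHosts(config, hostMap) {
  const members = config.members.map(function (m) {
    const mapped = Object.prototype.hasOwnProperty.call(hostMap, m.host);
    return Object.assign({}, m, { host: mapped ? hostMap[m.host] : m.host });
  });
  return Object.assign({}, config, { members: members });
}

// Hypothetical document, shaped like the one stored in local.system.replset.
const exampleConfig = {
  _id: "shard0",
  version: 3,
  members: [
    { _id: 0, host: "old-a.example.net:27018" },
    { _id: 1, host: "old-b.example.net:27018" },
    { _id: 2, host: "old-arb.example.net:27018", arbiterOnly: true },
  ],
};

// Hypothetical old-to-new hostname mapping for the test VMs.
const exampleHostMap = {
  "old-a.example.net:27018": "test-a.example.net:27018",
  "old-b.example.net:27018": "test-b.example.net:27018",
  "old-arb.example.net:27018": "test-arb.example.net:27018",
};

const rewritten = rewriteMemberHosts(exampleConfig, exampleHostMap);
console.log(rewritten.members.map(function (m) { return m.host; }));

// In the mongo shell, with mongod started WITHOUT the replica set option as
// the guide describes, the equivalent edit is made on the "local" database:
//   cfg = db.system.replset.findOne({ _id: "shard0" })
//   ...change cfg.members[i].host...
//   db.system.replset.update({ _id: "shard0" }, cfg)
```

The helper copies the document rather than editing it in place, so the original config stays available for comparison if the change needs to be checked or reverted.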
The arbiter node seems okay, and I can run rs.status() from any of the three nodes, but the two data-bearing nodes are in the ‘RECOVERING’ state. I don’t know whether this is normal or whether they are stuck there. We are now far past the timestamp of the last entry in the oplog.
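In case it helps with diagnosis, this is a rough sketch of how the rs.status() output could be condensed to show whether any member is primary and how far each data-bearing member is behind the most recent one. The fields name, stateStr, and optimeDate are the ones rs.status() reports per member; the helper summarizeStatus and the sample values below are hypothetical, not my real output.

```javascript
// Condense an rs.status()-shaped document into state and lag per member.
// Lag is measured against the newest optimeDate among non-arbiter members.
function summarizeStatus(status) {
  const times = status.members
    .filter(function (m) { return m.stateStr !== "ARBITER" && m.optimeDate; })
    .map(function (m) { return m.optimeDate.getTime(); });
  const newest = times.length > 0 ? Math.max.apply(null, times) : null;

  return {
    hasPrimary: status.members.some(function (m) {
      return m.stateStr === "PRIMARY";
    }),
    members: status.members.map(function (m) {
      const comparable =
        m.stateStr !== "ARBITER" && m.optimeDate && newest !== null;
      return {
        name: m.name,
        stateStr: m.stateStr,
        // null for arbiters or members without an optime to compare
        lagSeconds: comparable ? (newest - m.optimeDate.getTime()) / 1000 : null,
      };
    }),
  };
}

// Hypothetical sample resembling the situation described above.
const sampleStatus = {
  set: "shard0",
  members: [
    { name: "test-a.example.net:27018", stateStr: "RECOVERING",
      optimeDate: new Date("2018-01-01T00:00:10Z") },
    { name: "test-b.example.net:27018", stateStr: "RECOVERING",
      optimeDate: new Date("2018-01-01T00:00:10Z") },
    { name: "test-arb.example.net:27018", stateStr: "ARBITER" },
  ],
};

console.log(JSON.stringify(summarizeStatus(sampleStatus), null, 2));

// In the mongo shell the same helper could be applied to live output:
//   printjson(summarizeStatus(rs.status()))
```

Note that this lag compares members only with each other. It says nothing about how old the copied data is relative to the current time, which is the gap I am worried about.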
Is there a step I have missed? Will I need to use more recent shard backup data? How can I get the members out of the ‘RECOVERING’ state? Can I force one of them to become primary? The docs suggest that if a member falls far enough behind, it may require manual intervention, but I don’t know what that intervention would be. They seem to suggest a full resync, but with no member being primary, I don’t know how I would do that.
Thanks.