Re-Adding single arbiter to RS crashes it, load through the roof

Hi all,

I had a PSA architecture set up and running with no problems. Then I was tasked with switching the arbiter and secondary around because of bandwidth costs between AZs in a single AWS region.

First I removed the arbiter, converted it to a secondary and added it back to the cluster. Now it’s PSS.
Then I removed the secondary I’m converting to an arbiter, removed the data, etc.

I then cleared out the data dir and started mongod on the new arbiter, which is the exact same instance class I was using before, t4g.small. More than enough. I get the typical init failure logs/checkpoint logs while it waits for replica set information. Totally normal.

When I go to add the arbiter back with rs.addArb(""), it stays in ARBITER and healthy for about 10s, then on the mongoA host, the load just completely skyrockets. No connections from anywhere but the PS in the replSet (netstat -natp). Like 2k+ sysload. The little box dies, and the cluster considers the arbiter unhealthy.

Is there something strange about re-adding an arbiter that was a secondary? Is reusing the hostname bad?

This seems pretty bonkers that the initialization fails. Adding the arbiter originally did not do anything of the sort and only increased CPU a bit. Obviously I don’t want to use something massive just to init the arbiter. Then I’ll have to downgrade it, and add/rm the arbiter again anyways.

Mongo self-hosted 4.4.23 for all three nodes. About 350GB data size.

Hi @Rebecca_Jean and welcome to the MongoDB community forums!

With reference to the above statement, could you help me with a few details:

  1. Could you post the actual commands you used in the overall process, both inside the mongo shell and in bash if this is a Linux deployment?

  2. To convert the secondary to an arbiter, are you following the official MongoDB documentation on Convert a Secondary to an Arbiter, or some other documentation? If the latter, could you please share it?

  3. Are all the nodes in the replica set provisioned identically? As mentioned you are using t4g.small for the arbiter, but when the old arbiter was converted to a secondary, was the instance type changed to match the primary?

  4. Do you have read concern majority disabled?

  5. What’s the state and configuration of the replica set now? Could you please post the current rs.status() and rs.conf() output, and whether you have read concern majority enabled or disabled? This would help me understand the deployment better.
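
If the full output is too long to post, a condensed summary would also help. Below is a minimal sketch of how the information from points 4 and 5 could be pulled together. The helper names `summarizeReplicaSet` and `majorityReadConcernEnabled` are made up for illustration and are not part of the shell. The sketch assumes the usual shape of the `rs.status()`, `rs.conf()` and `db.serverStatus()` documents (`members[].name`, `members[].host`, `storageEngine.supportsCommittedReads`).

```javascript
// Pure helpers: they only read plain objects shaped like the output of
// rs.status(), rs.conf() and db.serverStatus(), so they can be pasted into
// the mongo shell or run under Node.js with sample data.

// Hypothetical helper: joins each member's runtime state (from rs.status())
// with its configuration (from rs.conf()), matching on "host:port".
function summarizeReplicaSet(status, conf) {
  const confByHost = {};
  for (const m of conf.members) {
    confByHost[m.host] = m;
  }
  return status.members.map(function (m) {
    const c = confByHost[m.name] || {}; // empty if the member is not in conf
    return {
      host: m.name,
      state: m.stateStr,
      healthy: m.health === 1,
      arbiterOnly: c.arbiterOnly === true,
      votes: c.votes,
      priority: c.priority
    };
  });
}

// Hypothetical helper: reports whether read concern "majority" is enabled,
// based on the storageEngine.supportsCommittedReads field of serverStatus.
function majorityReadConcernEnabled(serverStatus) {
  return serverStatus.storageEngine.supportsCommittedReads === true;
}

// Possible usage in the mongo shell, connected to a data-bearing member:
//   printjson(summarizeReplicaSet(rs.status(), rs.conf()));
//   print(majorityReadConcernEnabled(db.serverStatus()));
```

The helpers only read the documents and do not change any replica set settings, so they should be safe to run against the live cluster.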