Why is my 4-node replica set setup not working?

I have a 4-node setup hosted on-prem across 2 host servers (DC1, DC2):
Primary (DC1)
Secondary (DC2)
Arbiter (DC2)
Arbiter (DC1)

We opted for this setup because we perform maintenance on our servers from time to time, which may require restarting or shutting down the main server in either DC1 or DC2.

Our previous setup had only 3 nodes:
1 Primary (DC1)
1 Secondary (DC2)
1 Arbiter (DC2)

This has a single point of failure: it relies on DC2. If we shut down DC2, only the Primary remains, and with just 1 node left no primary can be elected, since an election needs at least 2 voting members.

So we added another arbiter in DC1. Now when DC2 is down, 2 nodes remain (the Primary and the second Arbiter in DC1), and I expected the replica set to stay available. Instead, the Primary stepped down to Secondary and no Primary was elected. Why is this, and how can I make the setup more highly available?

You should have an odd number of voting nodes in a replica set.

In your current setup, whenever communication between DC1 and DC2 is down, neither DC1 nor DC2 can reach a majority, so both DCs become unusable. With 4 nodes you need 3 nodes to get a majority.

Is there a way to change that to 2 instead of 3 in my current setup? If there is, I can't seem to find it in the MongoDB documentation.

No, you cannot. It is mathematical: the majority of 3 voting nodes is 2, and the majority of 4 voting nodes is 3.
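
To make the arithmetic concrete, here is a minimal Python sketch. The function names are mine for illustration and are not part of any MongoDB API; it only assumes that a majority means strictly more than half of the voting members.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voting members can be lost while a majority remains."""
    return voting_members - majority(voting_members)


for n in range(1, 8):
    print(f"{n} voting members: majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 voting members raises the majority from 2 to 3 but still tolerates only 1 failure, which is why the extra arbiter did not help.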

With 2 DCs, if the one holding the majority of nodes is down, the replica set is not usable. In your original 3-node layout, if the communication between DC1 and DC2 is down, DC2 will have the majority and its data-bearing node will become primary. But your writes with write concern majority will fail, since the arbiter does not store data and so cannot acknowledge them.
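
The two layouts from this thread can be checked with a small simulation. This is only a sketch under stated assumptions: every member has exactly one vote, a DC that is down looks the same as one that is unreachable, and the number of acknowledgements needed for a majority write is modelled as the smaller of the voting majority and the count of data-bearing voting members (my simplified reading of how MongoDB counts it). `Member` and `after_outage` are hypothetical names, not MongoDB APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    dc: str
    data_bearing: bool  # arbiters vote but store no data


def majority(voting_members: int) -> int:
    return voting_members // 2 + 1


def after_outage(members, down_dc):
    """Return (can_elect_primary, can_ack_majority_write) with one DC down."""
    up = [m for m in members if m.dc != down_dc]
    data_up = sum(m.data_bearing for m in up)
    # A primary needs votes from a majority of ALL members and must hold data.
    can_elect = len(up) >= majority(len(members)) and data_up > 0
    # Assumption: majority writes need acks from this many data-bearing members.
    data_total = sum(m.data_bearing for m in members)
    needed_acks = min(majority(len(members)), data_total)
    return can_elect, can_elect and data_up >= needed_acks


three_node = [
    Member("primary", "DC1", True),
    Member("secondary", "DC2", True),
    Member("arbiter", "DC2", False),
]
four_node = three_node + [Member("arbiter2", "DC1", False)]

for label, layout in [("3-node", three_node), ("4-node", four_node)]:
    for dc in ("DC1", "DC2"):
        print(label, dc, "down ->", after_outage(layout, dc))
```

Under these assumptions, only the 3-node layout with DC1 lost can still elect a primary, and even then majority writes cannot be acknowledged. The 4-node layout cannot elect a primary when either DC is lost.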

Dear @sg_irz, it has been a few days since I provided input on your issue.

I spent time on your issue and I would appreciate a follow-up and perhaps closure.

Other users of the forum would also appreciate it, as they may have similar issues and would gain by knowing what worked and what did not.


We opted for a 3rd data center for DR purposes.
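
For anyone landing here with the same problem, a quick way to see why a third site helps is to count votes per data center. This is a hedged sketch: the helper name is hypothetical, and the three-DC layout below (one vote in each of DC1, DC2, and a hypothetical DC3) is an assumed example, not necessarily the exact layout used above.

```python
from typing import Mapping


def survives_any_single_dc_outage(votes_per_dc: Mapping[str, int]) -> bool:
    """True if losing any one DC still leaves a majority of all votes."""
    total = sum(votes_per_dc.values())
    needed = total // 2 + 1
    return all(total - votes >= needed for votes in votes_per_dc.values())


# Original 3-node layout: losing DC2 leaves 1 of 3 votes.
print(survives_any_single_dc_outage({"DC1": 1, "DC2": 2}))

# 4-node layout with a second arbiter: losing either DC leaves 2 of 4 votes.
print(survives_any_single_dc_outage({"DC1": 2, "DC2": 2}))

# Assumed three-DC layout: losing any one DC leaves 2 of 3 votes.
print(survives_any_single_dc_outage({"DC1": 1, "DC2": 1, "DC3": 1}))
```

Only the three-DC layout passes, because no single site holds enough votes to block a majority on its own.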
