How to force replicaset during actual failover

I need some clarification on replication fail-over steps.

Let's say I have a replica set of 3 nodes:
1 -> primary server with priority 1 (24 hours of oplog window)
2 -> secondary server with priority 1 (24 hours of oplog window)
3 -> secondary server with priority 0 (24 hours of oplog window)

Now let's say my datacenter1 goes down, so both the primary and the secondary server there are unreachable due to a network issue.

Because datacenter1 is unavailable and the application is critical, we decide to make the secondary server in datacenter2 the primary.

How do we do that? Do I have to stop it and start it as a standalone? Or is there a way to put it into read-write mode without restarting the database?

Let's say datacenter1 becomes available again after 4 hours. Since we have 24 hours of oplog, how do we sync datacenter1 without doing an initial sync?

Your replica set topology has 3 nodes, so when datacenter 1 is down, two nodes are down. The third node cannot become primary, because a majority of 3 is 2 and only 1 voting member is left (and in your case its priority is set to 0, so it can never become primary anyway). That means your app will not be able to write to the database.

When we create a replica set, we define its topology; we do not explicitly define which nodes are primary and secondary. Once the replica set is initiated, the members automatically elect one of the nodes as primary. This node handles the reads and writes coming from our app.

Now, when our primary node goes down, the remaining nodes will automatically select a primary node among themselves (if possible).
Similarly, when a node rejoins the cluster (in your case datacenter 1 nodes), they automatically sync their data with the current primary node.

I will have a stab at this and I will learn something if I am wrong :slight_smile:

So if you look at this chart

| Number of voting members | Majority | Failure tolerance |
|---|---|---|
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 1 |
| 4 | 3 | 1 |
| 5 | 3 | 2 |
| 6 | 4 | 2 |
| 7 | 4 | 3 |
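
The chart follows from one rule: the majority is more than half of the voting members, and the failure tolerance is whatever is left over. Here is a minimal Python sketch of that arithmetic (the function names are my own, not anything from MongoDB):

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    # Reproduce the rows of the chart for 1 to 7 voting members.
    for n in range(1, 8):
        print(n, majority(n), fault_tolerance(n))
```

Going from 3 to 4 voting members raises the majority from 2 to 3 but leaves the tolerance at 1, which is why an even count buys nothing.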

As mentioned previously, a replica set with 3 voting members needs 2 of them to elect a new primary.

So if you have 1 voting member in the fail-over data-center, it cannot become the primary and can only accept read operations.

If you move to 4 nodes, 2 in each data center, the majority required to elect a primary is now 3. So the 2nd data center still cannot elect one of its nodes as primary, since 2 isn’t 3 (see above).

Also, with 4 nodes an election can end in a tied vote, which means time is taken up by another election. Until a majority of 3 nodes agrees on the same primary, your replica set cannot accept writes from the application(s), which isn’t good.

Your fail-over data-center cluster is designed to be a duplicate of the primary data center. It isn’t there to make it an easy way of replicating data to your secondary data center. By the way, thanks for this, because I had been pondering the same problem myself. So any responses to this thread will provide me some guidance.

So what is the best way to get my oplog to another mongo replica set, to keep them in sync in case of failover?

That’s why the lecture mentions that it is always preferable to have an odd number of voting replica set members. An odd number reduces the chance of having to repeat the election process. Furthermore, 3 and 4 voting members provide the same fault tolerance.

There is a concept called the replication window, which is taught in one of the lectures. If one of your nodes goes down and later comes back online, it automatically compares its oplog with the oplog of one of the other nodes and catches up. This is how data consistency is achieved. However, this only works if the node comes back within the replication window (please refer to the lecture for what it represents).

If the node comes back after the replication window, it goes into recovery mode and has to be manually brought back in sync with the other nodes.

I know that an even number is not desirable; my point was that going to an even number of nodes wouldn’t help, since it would bump up the majority required.

Also, this isn’t about the replication window. I am fully aware of that, and it doesn’t solve the problem here.

You need a complete replica set in each of your 2 data centres, each of which can vote on its own primary. As you said, putting 1 node in the 2nd data centre doesn’t help: it just remains a secondary in the event of data centre 1 going down.

So you need a mechanism to transfer the “oplog” or whatever to data centre 2 from the primary replica set in data centre 1, which doesn’t have them tied together, from a voting/majority point of view.

However what I am hoping for, is that the replication window process can be applied to the process of updating the primary of the 2nd data centre replica set :slight_smile:

Remember when the 2nd data center kicks in due to outage in data centre 1, it needs to be able to step in immediately and seamlessly.

So your replica set has 3 nodes (2 nodes in data center 1 and 1 node in data center 2). Under normal circumstances, the primary node in data center 1 handles reads and writes and the other two nodes replicate data. When data center 1 goes down, the replica set cannot elect a primary, so it cannot accept writes.

To make node 3 in data center 2 operate as a primary node, my solution is as follows:
Change the topology of your replica set, either by removing the two nodes in data center 1 or by setting their votes to 0. Because there is no primary left to accept the change, the reconfiguration has to be forced on node 3 itself, and node 3 needs a priority greater than 0. After this, node 3 can act as the primary under the new configuration of the replica set.

Node 3 will have its oplog in sync with the other 2 until data center 1 goes down. After that, the oplog of node 3 will keep growing. When data center 1 comes back, you can change the topology again to make it a 3-node replica set, and the 2 nodes will automatically sync their oplogs.

This solution requires you to change the topology manually. I do not know if it can be done automatically.
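
The topology change described above can be sketched as a small Python helper that builds the new configuration document. This is only an illustration under assumptions: the host names and the `rs0` set name are hypothetical, and the helper only prepares the document, it does not talk to a server. On the surviving node the result would then be applied with a forced reconfig, for example `rs.reconfig(cfg, {force: true})` in the mongo shell.

```python
import copy


def build_failover_config(current_config: dict, surviving_hosts: set) -> dict:
    """Return a copy of a replica set config in which only surviving members vote.

    Unreachable members stay in the set but get votes=0 and priority=0
    (a non-voting member must have priority 0). Surviving members get
    votes=1 and, if their priority was 0, priority=1 so they can be elected.
    """
    new_config = copy.deepcopy(current_config)
    for member in new_config["members"]:
        if member["host"] in surviving_hosts:
            member["votes"] = 1
            if member.get("priority", 1) == 0:
                member["priority"] = 1
        else:
            member["votes"] = 0
            member["priority"] = 0
    # A reconfig expects a version higher than the current one.
    new_config["version"] = current_config["version"] + 1
    return new_config


if __name__ == "__main__":
    # Hypothetical config matching the thread: two nodes in DC1, one in DC2.
    config = {
        "_id": "rs0",
        "version": 3,
        "members": [
            {"_id": 0, "host": "dc1-a.example.net:27017", "priority": 1},
            {"_id": 1, "host": "dc1-b.example.net:27017", "priority": 1},
            {"_id": 2, "host": "dc2-a.example.net:27017", "priority": 0},
        ],
    }
    print(build_failover_config(config, {"dc2-a.example.net:27017"}))
```

Keeping the data center 1 members in the set with votes 0, rather than removing them, means that only their votes and priorities have to be restored once they are reachable again.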

Thanks for your response. The reason the datacenter2 node has priority 0 is that it is there for disaster recovery: I want to bring it up manually by changing its priority, and only if we can’t bring datacenter1 back within 4 hours do I want to fail over to datacenter2, expecting there will be data loss. So my question is: what is the best way to bring the datacenter2 node up as primary during a failover, and then sync datacenter1 later (once datacenter1 is available again, within the 24-hour oplog window)?

OK, I just assumed that Mongo had some other mechanism for cloning replica sets that I hadn’t read about in my studies, so my bad. Other databases have tools for replication, for fail-over cases.

Anyway:

The solution proposed in the documentation is to keep a single replica set and place a node in each of 3 data centers, which to be fair isn’t much of a solution if you only have 2 data centers.

So in principle this should work automatically. You could shove 1 node (or 2) in data centre 1 and 2, and put an arbiter node (Or some node, which can vote, but cannot be a primary, and doesn’t take any data) in a cloud service to break a tie. The cloud server becomes your 3rd data center. Just hope your network stays up in the failure :slight_smile:

I have to be honest, I am very surprised there isn’t a mechanism to clone a replica set without having to come up with some artificial config.

Anyway, there is a white paper below that I haven’t read, but it may prove useful. If you do find something, post the solution; I would be interested.

As the node in DC2 has a priority of 0, it will never become primary in the event of an election. If you have a node you wish to become primary in the event of a failover, the node should not have a priority of 0.

Also, even if this node had a priority of 1, it would not become primary if DC1 goes fully down, as there is no majority left in the replica set. The replica set will consist of a single secondary.

The only way to achieve a truly highly available replica set which can withstand a full data centre outage is to have three data centres or additional nodes in DC2.

When adding additional nodes, we need to keep in mind that a replica set should have an odd number of voting nodes, so that a clear majority is possible in the event of an election. With a 2 DC configuration such as this, we would need to add two voting nodes to DC2 so that DC2 holds a majority in the event of a failover.
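
To compare layouts like the ones discussed in this thread, here is a rough Python sketch that checks, for each data centre, whether the remaining voting members still form a majority if that data centre is lost. The function name and the data centre labels are my own, used only for illustration.

```python
def survivable_outages(voting_members_per_dc: dict) -> dict:
    """Map each data centre to True if a primary can still be elected without it."""
    total = sum(voting_members_per_dc.values())
    needed = total // 2 + 1  # majority of all voting members
    return {
        dc: (total - count) >= needed
        for dc, count in voting_members_per_dc.items()
    }


if __name__ == "__main__":
    # Original layout: 2 voting nodes in DC1, 1 in DC2.
    print(survivable_outages({"DC1": 2, "DC2": 1}))
    # Two more voting nodes added to DC2.
    print(survivable_outages({"DC1": 2, "DC2": 3}))
    # One voting node in each of three data centres.
    print(survivable_outages({"DC1": 1, "DC2": 1, "DC3": 1}))
```

With only two data centres, whichever one holds the majority becomes the one whose loss stops elections. Only the three data centre layout survives the loss of any single site.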

If you have 24 hours of oplog window, when you bring the nodes back up, they will find a common point in the oplog and begin to replicate. You will only need to resync if nodes have been down for longer than 24 hours. You should configure your oplog window with this in mind.
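
As a back-of-the-envelope check for the numbers in the question (4 hours of downtime against a 24-hour oplog window), here is a small Python sketch. It assumes the window really is a fixed duration. In practice the oplog is capped by size, so the window shrinks when the write load goes up, and the function name is my own.

```python
from datetime import timedelta


def can_catch_up_from_oplog(downtime: timedelta, oplog_window: timedelta) -> bool:
    """True if a returning member was down for less than the oplog window,
    so it can resume replication instead of needing a full resync."""
    return downtime < oplog_window


if __name__ == "__main__":
    window = timedelta(hours=24)
    print(can_catch_up_from_oplog(timedelta(hours=4), window))   # 4 h outage
    print(can_catch_up_from_oplog(timedelta(hours=30), window))  # 30 h outage
```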