Failed secondary replica can't sync

Justin_Sayre · April 22, 2021, 7:15pm

N00B here - I have a three-node cluster - primary, secondary and arbiter (voting only, no data). The secondary has fallen out of sync with the primary and I cannot catch it up. It appears the oplog size is not large enough. If I understand things correctly, I need to extend the oplog size on secondary before primary (with DB restarts). My actual question is: IF primary fails before the above steps are taken (assuming they are correct), with secondary being out of sync, what happens? Am I better off at this point removing the secondary from the cluster so that if the primary fails, the now-defunct secondary does not try to take over?

Thanks in advance.

kevinadi · April 28, 2021, 1:02am

Hi @Justin_Sayre welcome to the community!

If you’re just starting on your MongoDB journey, I highly recommend the free courses available at MongoDB University. They cover material from beginner to more advanced topics.

Having said that, I will go ahead and just jump into the deep end with your questions.

> The secondary has fallen out of sync with the primary and I cannot catch it up. It appears the oplog size is not large enough.

To determine your oplog’s length in time, you can use db.printReplicationInfo(). Note that the interpretation of this command assumes a steady state of writing, so it could show less time if your cluster is busy, or show more time if it’s less busy.
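To make the idea of an oplog's "length in time" concrete, here is a minimal Python sketch of the arithmetic involved. It assumes you have already read the oplog's first and last event times (for example from the db.printReplicationInfo() output). The function names and the timestamps are hypothetical, chosen to match the roughly 30-minute window described later in this thread.

```python
from datetime import datetime, timedelta


def oplog_window(first_event: datetime, last_event: datetime) -> timedelta:
    """Time span covered by the oplog: newest entry minus oldest entry."""
    if last_event < first_event:
        raise ValueError("last_event precedes first_event")
    return last_event - first_event


def can_catch_up(secondary_last_applied: datetime,
                 primary_first_event: datetime) -> bool:
    """A secondary can resume replication only if the primary's oplog
    still reaches back to the secondary's last applied operation."""
    return secondary_last_applied >= primary_first_event


# Hypothetical timestamps for illustration only.
first = datetime(2021, 4, 22, 18, 30)
last = datetime(2021, 4, 22, 19, 0)
window = oplog_window(first, last)

# A secondary that stopped applying at 17:00 has fallen off this oplog.
stale = not can_catch_up(datetime(2021, 4, 22, 17, 0), first)
```

With these sample values the window is 30 minutes, and the secondary that stopped at 17:00 can no longer catch up from the oplog alone.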

Once a secondary has fallen off the oplog, the only way to recover it is to do a resync.

> If I understand things correctly, I need to extend the oplog size on secondary before primary (with DB restarts)

That scenario is assuming the secondary is still functioning. I believe it is not in your case. Thus the procedure outlined in Change the Size of the Oplog doesn’t really apply to you since you’re gonna rebuild a new secondary anyway (with an appropriately sized oplog from the start). At this point, you just need to resize the primary’s oplog.

> Am I better off at this point removing the secondary from the cluster so that if the primary fails, the now-defunct secondary does not try to take over?

A defunct secondary (one whose status in the rs.status() output is anything other than SECONDARY) will never try to take over. Only a node with a status of SECONDARY can take over as primary, since that status means that it’s up, ready to take over, and following the primary’s writes closely.
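As a rough illustration of the check described above, the following Python sketch filters a dictionary shaped like the rs.status() output for members that report a state of SECONDARY. The host names and the RECOVERING state of the stale node are assumptions made for the example. Real electability also depends on configuration such as member priority, which this sketch does not inspect.

```python
from typing import List


def members_in_secondary_state(status: dict) -> List[str]:
    """Return names of members that are healthy and report SECONDARY.

    `status` is assumed to have the shape of rs.status() output:
    a "members" list whose entries carry "name", "stateStr" and "health".
    """
    return [
        m["name"]
        for m in status.get("members", [])
        if m.get("stateStr") == "SECONDARY" and m.get("health") == 1
    ]


# Hypothetical three-node set: primary, stale secondary, arbiter.
sample_status = {
    "members": [
        {"name": "db1.example.net:27017", "stateStr": "PRIMARY", "health": 1},
        {"name": "db2.example.net:27017", "stateStr": "RECOVERING", "health": 1},
        {"name": "arb.example.net:27017", "stateStr": "ARBITER", "health": 1},
    ]
}

candidates = members_in_secondary_state(sample_status)
```

In this sample no member is in the SECONDARY state, so the list is empty: if the primary failed, there would be nothing to take over.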

In your situation, I would assess what is the appropriate oplog size for your workload. This can be done using simulations of your production workload, so that this situation doesn’t repeat itself in the future.
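One way to turn such a workload assessment into a number is sketched below in Python. It assumes you have sampled how much oplog data was generated per fixed interval during a representative run, sizes for the peak rate, and applies a safety factor. The function names, the sample values and the 1.5 headroom factor are all illustrative assumptions, not MongoDB recommendations.

```python
import math
from typing import Sequence


def peak_rate_gb_per_hour(samples_gb: Sequence[float],
                          interval_hours: float) -> float:
    """Highest observed oplog growth rate across the sampled intervals."""
    if not samples_gb or interval_hours <= 0:
        raise ValueError("need at least one sample and a positive interval")
    return max(samples_gb) / interval_hours


def recommended_oplog_gb(samples_gb: Sequence[float],
                         interval_hours: float,
                         target_window_hours: float,
                         headroom: float = 1.5) -> int:
    """Oplog size (GB, rounded up) covering the target window at peak rate."""
    peak = peak_rate_gb_per_hour(samples_gb, interval_hours)
    return math.ceil(peak * target_window_hours * headroom)


# Hypothetical samples: GB of oplog written in each 30-minute interval.
samples = [0.5, 1.0, 0.75]
size_gb = recommended_oplog_gb(samples, interval_hours=0.5,
                               target_window_hours=8)
```

Sizing for the peak rather than the average is a deliberate choice here, since a secondary is most likely to fall behind during the busiest period.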

I would also consider deploying a primary-secondary-secondary setup instead of primary-secondary-arbiter. Having two secondaries vastly increases the chances of the whole set having zero downtime in the face of failures. It will also help with zero downtime maintenance, since with two secondaries you can do an effective rolling maintenance/upgrade.

Another point is, if your current setup has different hardware specs between primary/secondaries, I would consider making them all the same, since in a replica set, all nodes have an equal chance of becoming primary (assuming default setup of voting/priority). Unless you have very specific needs and reasons, I would not change the default voting/priority settings, since doing so will interfere with the High Availability guarantees a replica set gives you and make failure scenarios more complex.

Best regards,
Kevin

Justin_Sayre · April 28, 2021, 1:57pm

Hey Kevin, thanks SO much for replying. I am definitely going to check out the course. Unfortunately, I inherited a production setup running on 3.2, so there’s a lot of work to be done - and I have learned a lot, just since posting. The oplog is about 1GB, which covers roughly 30 minutes, and the DB is well over 35GB. I have already tried the resync procedures noted here, and both methods (deleting the db files/directory and copying the files from PRIMARY) failed. At this point, I assume that my only recourse is to take PRIMARY offline, resize its oplog, and then bring it back up. Assuming that works, will SECONDARY’s oplog get the new size, or do I need to resize it in the .conf settings?

Thanks again!

chris · April 28, 2021, 2:44pm

Hi @Justin_Sayre

Yes, for v3.2 that is correct. In v3.6+ this is an online operation.
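For readers on v3.6 or later, the online resize is done with the replSetResizeOplog admin command, run against each member whose oplog should change. The Python sketch below builds that command document and, optionally, sends it with pymongo. The helper names and the connection URI are hypothetical, the 990 MB lower bound is taken from the MongoDB documentation as best I recall it, and none of this applies to the 3.2 deployment discussed in this thread.

```python
MIN_OPLOG_MB = 990  # assumed documented minimum for replSetResizeOplog


def build_resize_oplog_command(size_mb: float) -> dict:
    """Build the admin command document; size is given in megabytes."""
    if size_mb < MIN_OPLOG_MB:
        raise ValueError(f"oplog size must be at least {MIN_OPLOG_MB} MB")
    # The size is sent as a double, matching how the shell sends numbers.
    return {"replSetResizeOplog": 1, "size": float(size_mb)}


def resize_oplog(uri: str, size_mb: float) -> dict:
    """Send the command to one member (hypothetical helper, needs pymongo)."""
    from pymongo import MongoClient  # imported lazily; not needed to build

    client = MongoClient(uri)
    return client.admin.command(build_resize_oplog_command(size_mb))


# Building the command needs no server connection.
cmd = build_resize_oplog_command(24 * 1024)
```

Calling resize_oplog("mongodb://db1.example.net:27017", 24 * 1024) would then apply the change to that one member. The oplog size is per member, so each node would need its own call.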

As your secondary is not synced, yes, update its conf; this is configured per replica. As for your primary, follow the procedure:

Justin_Sayre · April 29, 2021, 9:40am

Sorry about my formatting - this is basically my first time doing this.

In case anyone runs into this problem and this topic, I successfully re-synched my SECONDARY by taking the following steps:

1. I took SECONDARY offline.
2. I deleted the DB files and directory on SECONDARY.
3. I adjusted /etc/mongod.conf so that my oplog was much larger. I chose to go with 24GB, because what I was seeing on PRIMARY was 1GB = 30 minutes, and I wanted to give myself enough time for the sync, as the nodes were in different regions of the country.
4. I left SECONDARY offline until I resized PRIMARY.
5. I took my application offline for 15 minutes to quiesce the DB.
6. I took my PRIMARY offline and followed the “Change the Size of the Oplog” procedure noted by Kevin and Chris above.
7. I brought PRIMARY back online.
8. I brought SECONDARY back online - re-sync took about 3.5 hours.
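The sizing reasoning in the steps above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in this post (1GB of oplog per 30 minutes, a 24GB oplog, a 3.5-hour resync). It assumes the write rate stays constant, which a real workload may not do.

```python
def window_hours(oplog_gb: float, rate_gb_per_hour: float) -> float:
    """Hours of history an oplog of the given size holds at a steady rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return oplog_gb / rate_gb_per_hour


# Figures from the post: 1 GB of oplog covered about 30 minutes.
observed_rate = 1.0 / 0.5           # GB per hour
new_window = window_hours(24, observed_rate)
old_window = window_hours(1, observed_rate)

resync_hours = 3.5                  # reported initial sync duration
covers_resync = new_window > resync_hours
```

At the observed rate, 24GB holds about 12 hours of operations, comfortably more than the 3.5-hour resync, whereas the original 1GB held only half an hour.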

DB stats:
"collections" : 44,
"objects" : 3564420,
"avgObjSize" : 28231.381419136913,
"dataSize" : 100628500558,
"storageSize" : 36791537664,
"numExtents" : 0,
"indexes" : 64,
"indexSize" : 117452800

Please note - this cluster is three nodes, with PRIMARY, SECONDARY and ARBITER. I understand that this may not be best-practice, but it is what I inherited.
