MongoDB failure to resync a stale member of a replica set

I have a MongoDB 4.2 replica set with 3 nodes: primary, secondary, and arbiter. The primary occupies close to 250 GB of disk space, and the oplog size is 15 GB.

The secondary was down for a few hours. I tried recovering it by restarting, but it stayed in RECOVERING forever.

I tried an initial sync by deleting the files in the data path; it ran for 15 hours, the data path grew to 140 GB, and then it failed.

I then tried to copy the data files from the primary to seed the recovering secondary, following Resync a Member of a Replica Set — MongoDB Manual. This did not work either (the member went stale again).

In the latest docs (5.0) they mention using a new member ID. Does that apply to 4.2 as well? Changing the member ID throws an error, because the IP and port are the same for the node I am trying to recover.

Since this method was also unsuccessful, I am planning to recover the node using a different data path and port, so that the primary might consider it a new node. Once the secondary is up, I will change the port back to the one I want and restart. Will this work?

Please share any other suggestions for recovering a replica set node with a large data size like 250 GB.
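One thing worth checking in a situation like this (not mentioned in the thread, but a common cause): with a 15 GB oplog and an initial sync that takes ~15 hours, the sync source's oplog can roll over before the sync finishes, leaving the new member permanently stale. A minimal sketch of how to check and enlarge the oplog window, assuming a 4.2 legacy `mongo` shell on the primary; the 50 GB size below is purely illustrative:

```shell
# Check the current oplog window (the time span between the first and
# last oplog entries). It must comfortably exceed the initial-sync duration.
mongo --eval "rs.printReplicationInfo()"

# If the window is shorter than the sync takes, grow the oplog.
# The size is given in megabytes; 51200 MB = 50 GB is an example value,
# not a recommendation for this workload.
mongo --eval "db.adminCommand({ replSetResizeOplog: 1, size: 51200 })"
```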

Hi @bhargava_vn,
welcome to the community!

How many indexes do you have?
What kind of error do you have on the secondary?

Best regards

Hi @Fabio_Ramohitaj ,
Thanks for the reply
I use many indexes, do you need a count?
The total size of the indexes is 123177967616 bytes (roughly 115 GB).
Attaching the last snippet of the log with the error message:
error_log.txt (2.1 KB)

Hi @bhargava_vn,
I see this issue in the log:
initialSyncAttempts: [ { durationMillis: 50655350, status: "HostUnreachable: error fetching oplog during initial sync :: caused by :: error in fetcher batch callback :: caused by :: Error connecting to …", syncSource: ":270…" }
Could it be a network problem?
Check it out and let me know.
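A quick way to rule out connectivity problems between the recovering node and its sync source is a sketch like the following, run from the secondary. The host and port here are placeholders, not values from the thread (the actual address is truncated in the log):

```shell
# Basic network reachability to the sync source (placeholder IP):
ping -c 3 10.0.0.1

# Verify the mongod port answers and the server responds to a ping command
# (4.2 ships the legacy mongo shell):
mongo --host 10.0.0.1 --port 27017 --eval "db.adminCommand('ping')"
```

If the second command fails while the first succeeds, suspect a firewall rule or bindIp setting rather than the network itself.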

Best Regards

I did not have much time to troubleshoot, so I went ahead with the plan we thought would work (1 hour of downtime):

  1. Shut down the primary.
  2. Copy the data files from the primary node and place them in a new db path (different from the recovering node's db path).
  3. Change the log path.
  4. Start the mongod service on a different port (not the one used by the recovering node). The changes in db path, log path, and port were made hoping MongoDB would treat this as a new node, as an alternative to the new-member-ID approach mentioned in the 5.0 docs.
  5. Start the primary.
  6. Add it to the replica set using rs.add("IP:new port") on the primary.

This worked; I could see the secondary node coming up successfully.
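The steps above can be sketched roughly as follows. All paths, the hostname, and the port are placeholders I've made up for illustration, not values from the thread:

```shell
# 1. On the primary: stop mongod so the data files are consistent for copying.
sudo systemctl stop mongod

# 2-3. Copy the primary's data files into a fresh db path on the recovering
#      host (a path different from the recovering node's old db path).
rsync -av /var/lib/mongodb/ recovering-host:/var/lib/mongodb-new/

# 4. On the recovering host: start mongod with the new db path, a new log
#    path, and a port other than the one the stale member was using.
mongod --replSet rs0 \
       --dbpath /var/lib/mongodb-new \
       --logpath /var/log/mongodb/mongod-new.log \
       --port 27018 --fork

# 5. Restart the primary.
sudo systemctl start mongod

# 6. From a mongo shell on the primary, add the seeded node to the set:
mongo --eval 'rs.add("recovering-host:27018")'
```

Because the seeded node already holds a recent copy of the data, it only needs to replay the oplog entries written during the downtime window rather than perform a full initial sync.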

