Initial sync is failing

Hi,

We have an existing replica set with 2 nodes and 1 arbiter. We want to add one more node to it, so I added the new node from the primary using the "rs.add" command. However, after running for 20-30 hours, the initial sync fails with the error below:

2023-05-30T09:46:36.408-0400 E REPL     [replication-404] Initial sync attempt failed -- attempts left: 0 cause: NetworkInterfaceExceededTimeLimit: error fetching oplog during initial sync: Operation timed out, request was RemoteCommand 16829255 -- target:mongodb02:27017 db:local expDate:2023-05-30T09:46:35.997-0400 cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1685454318000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000 }
2023-05-30T09:46:36.408-0400 F REPL     [replication-404] The maximum number of retries have been exhausted for initial sync.
2023-05-30T09:46:36.458-0400 E REPL     [replication-404] Initial sync failed, shutting down now. Restart the server to attempt a new initial sync.
2023-05-30T09:46:36.458-0400 I -        [replication-404] Fatal assertion 40088 NetworkInterfaceExceededTimeLimit: error fetching oplog during initial sync: Operation timed out, request was RemoteCommand 16829255 -- target:mongodb02:27017 db:local expDate:2023-05-30T09:46:35.997-0400 cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1685454318000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000 } at src/mongo/db/repl/replication_coordinator_impl.cpp 635
2023-05-30T09:46:36.458-0400 I -        [replication-404]

Target/Newly added node: MongoDB shell version v3.4.24
Source/existing node: MongoDB shell version v3.4.9

I tried to find an RPM repo for 3.4.9 but could only find 3.4.24.

Size of the data on the existing node: 30-35 TB
Number of databases: 32
Number of collections: most of the DBs have fewer than 100 collections, but 3-6 DBs have close to 5k-10k collections, with more than a few million documents in them.

Limitations:

  1. I can try to convince management to upgrade MongoDB, but that is a long process to decide and implement.
  2. We cannot shut down the production server.

I am not an expert in MongoDB and am still learning from various sources, so I would ask the experts here to help me get this node added.

-Onkar

Hi @Onkarnath_Tiwary

Try increasing oplogInitialFindMaxSeconds:
db.adminCommand( { setParameter: 1, oplogInitialFindMaxSeconds: 600 } )

Links for your version are in the archive

MongoDB 3.4 has reached end of life. The currently supported versions are 4.4, 5.0 and 6.0. Planning an upgrade to 4.4 at a minimum will bring you up to a version that still receives bugfixes and updates.

https://learn.mongodb.com/ has many free courses to upskill with MongoDB

Thank you for the response, Chris. One clarification: I believe the command you suggested should be executed on the primary. Right?

I have taken MongoDB University courses, but experience only comes once you start working with it, and that is what I am lacking at this point. Thank you for the suggestion, though; I will keep doing that.

-Onkar

Might be a bit late for your situation by now.

I think this parameter should be set on the secondary that is doing the initial sync. You can set it in the configuration file or via the command line flag --setParameter so that it is applied when mongod starts.
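
As a rough sketch of the configuration-file route, assuming a YAML-format mongod.conf on the new member and reusing the 600-second value suggested earlier in the thread (the value is only a starting point, not a tuned recommendation):

```yaml
# mongod.conf on the member that is performing the initial sync.
# Keep the rest of your existing configuration as it is; only this
# section is added.
setParameter:
  # Maximum time in seconds to wait for the initial oplog find on the
  # sync source. 600 is the value from the earlier suggestion.
  oplogInitialFindMaxSeconds: 600
```

The command-line form would be along the lines of `mongod --config <your config file> --setParameter oplogInitialFindMaxSeconds=600`. Either way, the setting takes effect when mongod starts on the syncing member.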

Chris,

I was able to start the mongo daemon with oplogInitialFindMaxSeconds, but when I tried initialSyncTransientErrorRetryPeriodSeconds, I got an error. Anyway, I started mongod with oplogInitialFindMaxSeconds and restarted the initial sync. Let's see! I will post the result either way.

-Onkar

The issue seems to be resolved now. Data is replicating without any errors so far. Thank you for the help, Chris.
