Replica set member not syncing

Hi,
We have a MongoDB 4.2.6 replica set running with 3 members.
When we try to add a 4th member (freshly installed), it never completes its initial sync…
It stays in the STARTUP2 state while retrieving all the data from a secondary node ( db.adminCommand( { replSetGetStatus: 1 } ).initialSyncStatus.databases eventually shows every collection fully copied ), but it keeps restarting this process, up to 10 times, before aborting.
The connection between the nodes is a private 10 Gb/s link and seems to do the job while the data is being copied.
We have increased the oplog size to 128 GB (we read it could help); that did seem to help with retrieving all the data, but not with finishing the sync phase.
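For reference, a resize like that is usually done with the replSetResizeOplog command; a minimal sketch (the size argument is in megabytes, so 131072 MB = 128 GB):

// Run against the member whose oplog should grow (size is in MB)
db.adminCommand( { replSetResizeOplog: 1, size: 131072 } )
// Confirm the new cap (maxSize is reported in bytes)
db.getSiblingDB("local").oplog.rs.stats().maxSize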

Does anyone have any hints about what we could have missed?
Thanks a bundle.

Hi @Philippe_Longere

Often the log will show what the problem is when replication is not completing, so I would suggest checking it as your first step.

Usually rs.add() is the least complex way to add a new member; however, you may be interested in this other method for syncing a new member (seeding it by copying data files from an existing member).
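For what it's worth, a minimal rs.add() sketch (the hostname below is just a placeholder); adding the member with priority 0 and votes 0 keeps it out of elections while the initial sync runs, and you can restore those settings with rs.reconfig() afterwards:

// Run on the primary; "mongo-p4-priv:27017" is a hypothetical hostname
rs.add( { host: "mongo-p4-priv:27017", priority: 0, votes: 0 } )
// Once the member reaches SECONDARY, raise priority/votes again via rs.reconfig() if desired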

Hi @Philippe_Longere and welcome to the forums.

In addition to the information that @chris has provided, I would recommend checking your oplog size to make sure that it covers enough time for the initial sync to complete and for the new member to replay any changes made while the sync was running. It sounds like the oplog might be rolling over before the sync completes.

If the oplog is too small, the initial sync of data cannot complete: you will get errors and the new member will stay in the STARTUP2 state because it never catches up with all the data.
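A quick way to check that window, run against the primary and against whichever secondary is serving as the sync source (just a sketch):

rs.printReplicationInfo()
// prints the configured oplog size, the "log length start to end" (the time window the oplog covers),
// and the first/last oplog event times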

Hi Chris, thanks for the input 🙂

Here's a partial extract (it tried 10 times) of the initialSyncAttempts errors according to rs.status():
"initialSyncAttempts" : [
    {
        "durationMillis" : 964585,
        "status" : "OplogStartMissing: error fetching oplog during initial sync :: caused by :: Our last optime fetched: { ts: Timestamp(1589378774, 1118), t: 12 }. source's GTE: { ts: Timestamp(1589378778, 1501), t: 12 }",
        "syncSource" : "mongo-p2-priv:27017"
    },

The mongod log shows about the same:
Fatal assertion 40088 OplogStartMissing: error fetching oplog during initial sync :: caused by :: Our last optime fetched: { ts: Timestamp(1589387475, 1698), t: 12 }. source's GTE: { ts: Timestamp(1589387476, 1907), t: 12 } at src/mongo/db/repl/replication_coordinator_impl.cpp 743

Since this morning I have more than doubled the oplog size on the secondary that the sync reads from, to no avail…

We had a look at the snapshot solution in the documentation, but it seemed far-fetched because we cannot stop our application from running…

Best,
Philippe

Hello Philippe,
I want to clarify two things.
a. Usually you need to increase the oplog on all the available mongod members, as the initial sync can happen from any secondary or from the primary in the replica set (see the sketch below for checking what each member currently has).
b. For the snapshot solution, you can take the snapshot from a secondary, so you do not need to stop your application. In my experience, the initial sync process rarely works for large clusters with heavy writes.
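A rough way to check the current oplog on each member (connect to each one in turn; just a sketch):

use local
db.oplog.rs.stats().maxSize            // configured cap, in bytes
db.getReplicationInfo().timeDiffHours  // hours of writes this member's oplog currently holds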

Thanks
R

Hi Doug and Rohit, thank you both for your input (and welcome :)),

I now get that we have an oplog problem, which we'll try to tackle by increasing its size and by reducing the writes as much as possible during the sync.

We are also interested in learning a bit more about the snapshot solution, because the MongoDB documentation is a bit “light”, at least for non-experts like us.

I understand we should shut down a secondary, then copy its data files over to the new member; that's the easy part. One question that remains is: what files should be copied over from S1 to S2? In /var/lib/mongod we have collection-XXX.wt and index-XXX.wt files, WiredTiger* files, a journal folder and a few others.

Thanks again for your help, very much appreciated.
cheers,
Philippe

Stopping a secondary and rsyncing the files is the more straightforward way to seed. In my experience this has not been faster than rs.add().

Snapshots tend to be more of an advanced system topic than a MongoDB one, and may require reconfiguring/redeploying your server. Linux LVM, ZFS and storage appliances (NetApp) provide methods to take a snapshot and then access the snapshot files. This allows for a no-downtime copy of the files.

Cloud vendor snapshots work equally well.
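If the snapshot mechanism cannot capture the data files and journal atomically, a common precaution is to hold writes on that secondary for the duration of the snapshot; a minimal sketch:

// On the secondary being snapshotted
db.fsyncLock()      // flushes data to disk and blocks writes
// ... take the LVM/ZFS/cloud snapshot here ...
db.fsyncUnlock()    // release the lock once the snapshot exists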

But this seeding-from-data-files method, too, requires an adequately sized oplog to cover the duration of the copy plus the catch-up.

Copy the entire contents of that data directory to the target data directory (clear the target directory first).
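And shut the donor secondary down cleanly before copying, so the files on disk are consistent; a sketch:

use admin
db.shutdownServer()
// then copy the whole dbPath (e.g. /var/lib/mongod) to the new member's empty dbPath and start its mongod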

Hi again,
So we've been retrying the sync for the last 3 hours, and it looks like there is something wrong that we cannot put our finger on…
Here's what seems to work all right (in order):

  • db.adminCommand( { replSetGetStatus: 1 } ).initialSyncStatus.databases goes from nil to 100% copied
  • db.oplog.rs.stats().size goes from 0 to full (51972612053)
  • db.currentOp().operationTime starts going up (looks like oplog is being used)

But operationTime advances far too slowly (slower than real time: it moves forward by roughly 1 second every 2 seconds), and the process gets aborted after a while (presumably once something detects it is way behind and will never catch up)…

CPU/memory/drive speed cannot be an issue given the sizing of the host (which is dedicated to this task and seems rather quiet during the process), so we're wondering what could make the application of the oplog slower than expected? Or how can we investigate what is happening?

Thanks guys.

You need to go back to your logs again. It could still be oplog related.

In your earlier post you said you increased the oplog to 128 GB. Now you are saying it is full at ~48 GB. You need to ensure the oplog on your new node (on all nodes, actually) is the same size too.

Out of interest, what is the output of rs.printReplicationInfo()?
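It is also worth watching how far the new member's applied timestamp is from the newest entry in the sync source's oplog while it is catching up; a rough sketch:

// On the new member: the timestamp the applier has reached
db.currentOp().operationTime
// On the sync source: the newest entry in its oplog
db.getSiblingDB("local").oplog.rs.find().sort( { $natural: -1 } ).limit(1).next().ts
// If the gap between the two keeps growing, the member will never catch up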

Hey,
Indeed (we finally got to the bottom of it this morning), it was still oplog-related: there was (mostly) debug information being written very frequently to an obscure collection, and that somewhat invisible data was in fact filling up the oplog. My colleague found out about it (for anyone who hits the same kind of issue) by reading the output of:

use local
db.oplog.rs.find().limit(20)

which returns an extract of the oplog; it led us to the culprit because we could actually see this data.
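For anyone else digging into something similar, grouping the oplog entries by namespace also points at the heavy writer quickly (a sketch building on the query above; note that it scans the whole oplog, so it can take a while on a large one):

use local
db.oplog.rs.aggregate([
    { $group: { _id: { ns: "$ns", op: "$op" }, count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: 10 }
])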

Thanks everyone for your input, it made us look in the right direction :).
Take care.
Phil.
