Initial Sync taking extremely long

Hi,

I am trying to do an initial sync of 2 nodes that went down for a prolonged period while their disks were being fixed/replaced. The disks are 7200 RPM and I am syncing one node at a time, but even a single node is not completing in time, and I have to abort the process because incoming production traffic gets affected.

I started one node, with a size of 1.2TB, at 12:30pm and the initial sync only completed at 04:43. Once that was complete it started applying the oplog entries it had missed since the initial sync began, and at 08:00am it was still 13 hours behind.

The network has about 1ms max ping between the nodes. Are there any suggestions that could speed up the process without impacting production workloads?

Kindest regards
Gareth Furnell

Hi @Gareth_Furnell

Have you tried an initialSyncSourceReadPreference of secondaryPreferred or secondary?
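
If it helps, this is roughly how that parameter is applied; it is a startup-only setParameter, so the member doing the initial sync has to be restarted with it. A minimal sketch, assuming MongoDB 4.4 or newer and a hypothetical config file path:

    # On the member that will perform the initial sync (startup-only parameter,
    # so it must be passed when mongod starts)
    mongod --config /etc/mongod.conf \
           --setParameter initialSyncSourceReadPreference=secondaryPreferred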

Another option if you have filesystem snapshots available is seeding the data from a snapshot.
https://www.mongodb.com/docs/manual/tutorial/resync-replica-set-member/#sync-by-copying-data-files-from-another-member

Is 1.2TB the storage size or the data size?


Hi @chris, thanks for getting back.

Yes, one of my attempts was syncing from a secondary and it took the same extremely long amount of time. I also ultimately had to stop the operation because production reads were timing out on the node it was syncing from.

Unfortunately we do not have filesystem snapshots available to speed up the process. Looking through the documentation provided, is:

" Applications can continue to modify data while mongodump captures the output. For replica sets, mongodump provides the --oplog option to include in its output oplog entries that occur during the mongodump operation. This allows the corresponding mongorestore operation to replay the captured oplog. To restore a backup created with --oplog, use mongorestore with the --oplogReplay option."

a decent option to try in this case?
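
For reference, this is roughly what the quoted backup/restore procedure looks like on the command line; the host names and output directory below are only placeholders, and --oplog requires dumping the whole instance rather than a single database:

    # Dump all databases from a member, capturing oplog entries written during the dump
    mongodump --host secondary.example.net --port 27017 --oplog --out /backups/full-dump

    # Restore the dump and replay the captured oplog entries at the end
    mongorestore --host target.example.net --port 27017 --oplogReplay /backups/full-dump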

1.2TB is the size of the /data directory of the cluster at the moment.

Hi @Gareth_Furnell ,
I would say that mongodump is quite an expensive operation in terms of workload for the server. You might instead consider increasing the size of the oplog, to give the initial sync enough headroom to complete.
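
For example, the oplog can be resized online on each member (size is in megabytes; the 32000 here is only an illustrative value, not a recommendation):

    # Check the current oplog size and replication window
    mongosh --eval 'rs.printReplicationInfo()'

    # Resize the oplog on this member to roughly 32GB (repeat per member as needed)
    mongosh --eval 'db.adminCommand({ replSetResizeOplog: 1, size: 32000 })'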
Personally, a colleague and I had a node that we could not resynchronize, and we stopped the service in production to perform an rsync of 13TB of data. The duration of that operation was just under 3 days. If you don't need the data compacted by an initial sync and can tolerate a few days of downtime, that would be the best solution.
Obviously what will affect this type of operation is network latency.

Best Regards

mongodump cannot be used to seed a secondary.

A filesystem snapshot and copying the files to the target host should be faster. And with a good file sync program the copy can be stopped and resumed if its IO or bandwidth requirements impact the production load.
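
As an illustration only (the paths, host, and bandwidth cap are placeholders), an rsync along these lines can be throttled, interrupted, and resumed:

    # Copy the snapshot/data directory to the target host, keeping partial files so
    # the transfer can be resumed, and capping bandwidth (value in KB/s, arbitrary here)
    rsync -a --partial --progress --bwlimit=50000 \
        /snapshots/mongodb/ mongodb@target-host:/var/lib/mongodb/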

If the underlying storage does not support snapshots, consider using LVM on the host.
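
Roughly, and assuming the dbPath lives on an LVM logical volume (the volume group and LV names below are made up), an LVM snapshot flow could look like this:

    # Create a snapshot of the logical volume holding the dbPath
    # (the 100G is space reserved for changes made while the snapshot exists)
    lvcreate --size 100G --snapshot --name mongo_snap /dev/vg_data/lv_mongo

    # Mount it read-only (nouuid is needed for XFS, which rejects duplicate UUIDs)
    mkdir -p /mnt/mongo_snap
    mount -o ro,nouuid /dev/vg_data/mongo_snap /mnt/mongo_snap

    # Copy the files to the target host, then clean up the snapshot
    rsync -a --partial /mnt/mongo_snap/ mongodb@target-host:/var/lib/mongodb/
    umount /mnt/mongo_snap
    lvremove /dev/vg_data/mongo_snap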

I have another method but I don’t have a full writeup for it. Nor is it fully tested and is definitely not official.

A couple of prerequisites:

  • The data path must be on XFS. It should be, per the Production Notes.
  • The XFS volume must have reflinks enabled. xfs_info can show this (reflink=1).
  • The oplog must have enough headroom to cover the duration of the copy to the target host.
  • The XFS volume must have enough free space: 2x the oplog size plus some margin.

This method uses the XFS reflink functionality to create a copy of all the files, similar to an LVM snapshot. A rough command sketch follows the steps below.

  1. Stop a secondary, or quiesce it by using db.fsyncLock().
  2. Copy all files in that secondary's data directory to another directory (the sync directory) on the same filesystem, using cp with --reflink=always.
  3. Start the secondary, or run db.fsyncUnlock().
  4. Copy the files from the sync directory to the member that needs seeding.
  5. Delete the files from the sync directory.
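
Not tested end to end, and with made-up paths, but in commands the above would look something like:

    # 0. Confirm reflinks are enabled on the volume (look for reflink=1)
    xfs_info /var/lib/mongodb

    # 1. Quiesce the source secondary
    mongosh --eval 'db.fsyncLock()'

    # 2. Reflink-copy the data directory on the same XFS filesystem (near-instant, shares extents)
    cp -a --reflink=always /var/lib/mongodb /var/lib/mongodb-sync

    # 3. Resume writes on the source secondary
    mongosh --eval 'db.fsyncUnlock()'

    # 4. Copy the reflinked files to the member that needs seeding (resumable, throttled)
    rsync -a --partial --bwlimit=50000 /var/lib/mongodb-sync/ mongodb@target-host:/var/lib/mongodb/

    # 5. Clean up the sync directory
    rm -rf /var/lib/mongodb-sync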

Starting on the process today: increasing the oplog size to cover the time the data copy takes plus the oplog catch-up time.

Once that is complete, I am testing the LVM snapshots on the node that is down with the old data. Once I have the hang of that, I will run the process on a production secondary and copy the snapshot over to the node to be synced; once that is done, I'll bring the node up and wait for it to catch up on the data from the oplog.

If there are any more resources on LVM snapshots or this process, they would be greatly appreciated. Thanks so much for the comprehensive help @chris