Initial sync of a new replica set node with minimal impact

Hello community,

I’m running a fairly large, fairly old (v4.2) 3-node MongoDB replica set. By large I mean 1.1 TB of data plus another 900 GB of indexes.

I’d like to add a 4th node for DR purposes (non-voting, priority: 0), just to have real-time data in an offsite location.
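
For reference, a DR member like this is usually added from the mongo shell with something like rs.add({ host: "dr1.example.net:27017", priority: 0, votes: 0 }), where the hostname is hypothetical. The Python sketch below only builds the updated config document (assuming the dict shape that rs.conf() returns) so the rules are explicit: a non-voting member must also have priority 0, member _id values must be unique, and a reconfiguration needs a higher config version. It does not talk to a server.

```python
import copy


def add_dr_member(config: dict, host: str) -> dict:
    """Return a copy of a replica set config with a non-voting, priority-0 member appended.

    Assumes `config` looks like rs.conf() output: a dict with an integer
    "version" and a "members" list whose entries have "_id" and "host".
    """
    new_config = copy.deepcopy(config)
    members = new_config["members"]
    if any(m["host"] == host for m in members):
        raise ValueError(f"{host} is already a member")
    # Member _id values must be unique; take the next one above the current max.
    next_id = max((m["_id"] for m in members), default=-1) + 1
    # votes: 0 is only allowed together with priority: 0.
    members.append({"_id": next_id, "host": host, "priority": 0, "votes": 0})
    # A reconfiguration must carry a higher version than the current config.
    new_config["version"] += 1
    return new_config


# Hypothetical 3-node config and DR hostname.
current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        {"_id": 2, "host": "db3.example.net:27017"},
    ],
}
updated = add_dr_member(current, "dr1.example.net:27017")
print(updated["members"][-1])
# {'_id': 3, 'host': 'dr1.example.net:27017', 'priority': 0, 'votes': 0}
```

Because the member has no vote and no priority, it can never become primary and does not change the election majority of the existing three nodes.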

However, I fear that the initial sync of all this data would severely impact the performance of the current cluster.

What would be some recommendations for approaching this?

Some of the recommendations I saw mention:

  • restoring from a backup first and having the new node replicate only the changes made since then. But I guess that requires the oplog history to still cover that period, right?
  • having it sync from a secondary

Is there a native way to throttle the sync? I would rather have the initial sync take longer than hammer my cluster’s performance.

Thanks a lot.

Both methods above are good choices, but a more flexible approach for backup and recovery is to copy the data directory. Copying a 2 TB database from a secondary node should not take long.


Hey @dogs_Cute, thanks a lot for answering.

So I assume the breakdown is:

  • stop the mongod service on a secondary node
  • copy the files to the “to-be” replica node’s data directory
  • start the original node back up (it should begin catching up)
  • start the new node and add it to the replica set.
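
The copy step in that list can be sketched in Python as below. This is only an illustration under stated assumptions: mongod on the source secondary has already been shut down cleanly (so the files are consistent), the destination directory does not exist yet, and the paths are hypothetical. For 2 TB going to an offsite location, a resumable tool such as rsync is often more practical than a single Python process.

```python
import shutil
from pathlib import Path


def copy_data_dir(src: str, dst: str) -> int:
    """Copy an entire mongod dbPath to a new location.

    Assumes mongod using `src` is already stopped and that `dst` does not
    exist yet. Returns the number of files copied.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"source dbPath {src} does not exist")
    if dst_path.exists():
        raise FileExistsError(f"refusing to overwrite existing {dst}")
    # Copy everything (collection/index *.wt files, WiredTiger*, storage.bson,
    # journal/, ...) rather than trying to pick individual files.
    shutil.copytree(src_path, dst_path)
    return sum(1 for p in dst_path.rglob("*") if p.is_file())


# Hypothetical paths:
# copy_data_dir("/var/lib/mongodb", "/mnt/transfer/mongodb")
```

shutil.copytree keeps file contents and timestamps but not ownership, so on the new node the files still need to be owned by the user that runs mongod.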

I assume that as long as the oplog is fresh enough, meaning it still covers the time since the files were copied, the new node should be able to catch up as well.
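
One way to make “fresh enough” concrete: the oplog window is the span between its first and last entries (rs.printReplicationInfo() in the shell reports this as the log length from start to end), and it has to cover the gap between taking the copy and the new member starting to replicate. The sketch below is a simple check under assumptions: the window is measured on the node the new member will sync from, and the safety factor of 2 is a hypothetical margin because the window shrinks whenever write load goes up.

```python
from datetime import datetime, timedelta


def oplog_window(first_event: datetime, last_event: datetime) -> timedelta:
    """Time span currently covered by the oplog (first to last entry)."""
    return last_event - first_event


def can_catch_up(first_event: datetime, last_event: datetime,
                 gap: timedelta, safety_factor: float = 2.0) -> bool:
    """True if the oplog window covers `gap` with the given margin.

    gap: time between taking the file copy (or backup) and the new member
    starting to pull from its sync source. safety_factor is an assumed margin.
    """
    return oplog_window(first_event, last_event) >= gap * safety_factor


# Hypothetical values: the oplog spans two days, copy + transfer took 20 hours.
first = datetime(2024, 5, 1, 0, 0)
last = datetime(2024, 5, 3, 0, 0)
print(oplog_window(first, last))                       # 2 days, 0:00:00
print(can_catch_up(first, last, timedelta(hours=20)))  # True
```

If the check fails, the usual levers are a shorter transfer or a larger oplog on the sync source before starting.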

Did I get that right?

Thanks,

Yes. You just need to pick a time window with the least business traffic.
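
To size that window, a back-of-the-envelope estimate of the copy time helps, since the source secondary stays down for the whole copy. The sketch assumes decimal units (1 GB = 1000 MB) and a hypothetical sustained throughput of 200 MB/s across source disk reads, network, and target disk writes. Note that the on-disk size of the dbPath (for example, as reported by du) can differ from the logical data and index sizes because WiredTiger compresses data.

```python
def copy_hours(data_gb: float, index_gb: float, throughput_mb_s: float) -> float:
    """Rough wall-clock hours to copy data + index files at a sustained rate.

    Uses decimal units (1 GB = 1000 MB); throughput_mb_s is an assumption
    for the slowest link in the path (disk or network).
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    total_mb = (data_gb + index_gb) * 1000
    return total_mb / throughput_mb_s / 3600


# 1.1 TB of data + 900 GB of indexes at a hypothetical 200 MB/s
print(f"{copy_hours(1100, 900, 200):.1f} h")  # 2.8 h
```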

Hey @dogs_Cute ,

Quick follow-up question on the “copy the data directory” method, since I see mixed suggestions online.

What specific files in the source data directory should be copied ?

All *.wt files (every collection and index, or only some collections)? What about WiredTiger, WiredTiger.turtle, WiredTiger.wt, storage.bson, etc.?

Is there a specific rule or just copy everything ?

Thanks a lot.

Copy all the files in the data directory.
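
After copying everything, a cheap sanity check is to compare the two directory trees by relative path and file size before starting the new node. The sketch below is an assumption-laden illustration, not a MongoDB tool: matching sizes only catch truncated or missing files, not corrupted contents (a checksum comparison would be needed for that), and the directory paths you pass in are your own.

```python
from pathlib import Path


def snapshot(root: str) -> dict[str, int]:
    """Map every file under `root` (as a relative path) to its size in bytes."""
    base = Path(root)
    return {str(p.relative_to(base)): p.stat().st_size
            for p in base.rglob("*") if p.is_file()}


def diff_dirs(src: str, dst: str) -> list[str]:
    """List files missing from dst, extra in dst, or differing in size."""
    a, b = snapshot(src), snapshot(dst)
    problems = [f"missing: {f}" for f in sorted(a.keys() - b.keys())]
    problems += [f"extra: {f}" for f in sorted(b.keys() - a.keys())]
    problems += [f"size differs: {f}" for f in sorted(a.keys() & b.keys()) if a[f] != b[f]]
    return problems
```

An empty list means every file (including WiredTiger.turtle, WiredTiger.wt, storage.bson and the journal files) arrived with the expected size.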

Great discussion here. Thanks for laying out the challenge so clearly: a 1.1 TB dataset and concerns about performance impact during sync. I’ve also seen another topic in the MongoDB forums that digs deeper into the initial sync architecture, especially how the data-copy and oplog-pull phases work together. It might help in understanding how oplog size and timing affect sync behavior: More info on MongoDB initial sync architecture
