I’m running a fairly large, fairly old (v4.2) 3-node MongoDB replica set. When I say large, I mean 1.1 TB of data and another 900 GB for the indexes.
I’d like to add a 4th node for DR purposes (non-voting, priority 0), just to have the real-time data in an offsite location.
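For reference, adding such a member from the mongo shell would look roughly like this (a sketch; the DR hostname is a placeholder):

```javascript
// Run against the primary. The hostname below is hypothetical.
rs.add({
  host: "dr-node.example.net:27017",
  priority: 0, // never eligible to become primary
  votes: 0     // non-voting: does not count in elections
})
```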
However, I fear that the initial sync of all this data would severely impact the performance of the current cluster.
What would be some recommendations for approaching this?
Some of the recommendations I saw mention:
- restoring from a backup first and having the new node copy only the remaining data. But I guess there are some requirements on the oplog history, right? (See the oplog-window check sketched after this list.)
- having it sync from a secondary (see the replSetSyncFrom sketch after this list)
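On the oplog question in the first option: yes, the seed backup must be more recent than the oldest entry in the source's oplog, otherwise the new member cannot replay the writes it missed. A quick way to check the current window from the shell (a sketch, run on an existing member):

```javascript
// Human-readable summary: oplog size, first/last event times, window length.
rs.printReplicationInfo()

// Programmatic variant: timeDiffHours is the oplog window in hours. The
// restore-and-catch-up approach only works if the backup's age (plus the
// time needed to restore it) fits inside this window.
var info = db.getReplicationInfo()
print("oplog window (hours): " + info.timeDiffHours)
```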
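For the second option, you can point the new member at a specific secondary with the replSetSyncFrom command, so the initial sync never reads from the primary (a sketch; the hostname is a placeholder):

```javascript
// Run on the NEW member. replSetSyncFrom is transient: it only influences
// the current sync-source selection and is forgotten on restart.
db.adminCommand({ replSetSyncFrom: "secondary-2.example.net:27017" })

// Confirm which member each node is pulling from. On 4.2 the field is
// named syncingTo (newer releases call it syncSourceHost).
rs.status().members.forEach(function (m) {
  if (m.syncingTo) print(m.name + " is syncing from " + m.syncingTo)
})
```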
Is there any native way to throttle the sync? I would rather the initial sync take longer than hammer the performance of the current cluster.
The two methods above are both good choices, but a more flexible approach for backup and recovery is to copy the data directory (dbPath) from a secondary. Even at ~2 TB, the copy should complete reasonably quickly, and reading from a secondary keeps the load off the primary.
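If you go the file-copy route on 4.2/WiredTiger, one way to get a consistent copy without shutting the secondary down is to flush and lock writes for the duration of the copy (a sketch; the copy itself happens outside the shell):

```javascript
// On the source secondary: flush all pending writes and block new ones so
// the files under dbPath are consistent while they are being copied.
db.fsyncLock()

// ...copy the dbPath contents to the DR node (rsync, snapshot, etc.)...
// Note: the secondary stops applying oplog entries while locked, so the
// copy (plus catch-up afterwards) must still fit inside the oplog window.

db.fsyncUnlock()
```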
Thanks for laying out the challenge so clearly: a 1.1 TB dataset and concerns about performance impact during sync. There is another topic in the MongoDB forums that digs deeper into the initial sync architecture, especially how the data-copy and oplog-pull phases work together; it may help you understand how oplog size and timing affect sync behavior: More info on MongoDB initial sync architecture