Time taken for initial data sync in a sharded environment

Hi Team,

I have a test cluster with 1 shard (a 3-node replica set), 3 config servers and 1 router. I created ~7 GB of data in total (including indexes) on the primary. I then added a new node to the replica set and observed that the initial data sync took 25 minutes to complete. I have tested this multiple times, with and without compression enabled, and the time taken is similar. There is plenty of RAM configured.

I feel that 25 minutes is too long for 7 GB of data. If the time scales linearly, 50 GB/100 GB of data would take roughly 3 hours/6 hours respectively to sync. Please suggest how to reduce this.
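The 3-hour/6-hour figures come from simple linear scaling of the observed rate (25 minutes for 7 GB). A minimal sketch of that arithmetic, assuming constant throughput; a real initial sync does not guarantee this, since index builds, disk speed and oplog volume all affect it. The function name is hypothetical.

```python
def extrapolate_sync_minutes(observed_gb, observed_minutes, target_gb):
    """Linearly scale an observed initial sync time to a larger data size.

    Assumes throughput (minutes per GB) stays constant, which is only a
    rough estimate for a real initial sync.
    """
    minutes_per_gb = observed_minutes / observed_gb
    return minutes_per_gb * target_gb


for size_gb in (50, 100):
    minutes = extrapolate_sync_minutes(7, 25, size_gb)
    print(f"{size_gb} GB -> {minutes:.0f} min (~{minutes / 60:.1f} h)")
# 50 GB -> 179 min (~3.0 h)
# 100 GB -> 357 min (~6.0 h)
```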

Thanks,
Saran

Hello @Sharan_Kumar, welcome to the MongoDB Community forum.

You can try copying an existing member's data to the new member before adding it to the replica set. This can reduce the time the new member needs to sync. This is explained in Add Members to a Replica Set - Prepare the Data Directory.

See the second bullet point:

Before adding a new member to an existing replica set, prepare the new member’s data directory using one of the following strategies:

  • Make sure the new member’s data directory does not contain data. The new member will copy the data from an existing member. …
  • Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set. Copying the data over may shorten the amount of time for the new member to become current.
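The "manually copy the data directory" option can be sketched as a plain file copy. This is a minimal illustration, not a MongoDB tool: it assumes the source mongod has been shut down cleanly (or the files come from a consistent snapshot) so nothing changes mid-copy, and that the new mongod is not running yet. The function name and the directory and file names in the demo are hypothetical placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def seed_data_directory(source_dbpath, target_dbpath):
    """Copy an existing member's data files into a new member's dbPath.

    Assumes the source files are not being written to during the copy
    (clean shutdown or consistent snapshot) and the target mongod is stopped.
    """
    source = Path(source_dbpath)
    target = Path(target_dbpath)
    if not source.is_dir():
        raise FileNotFoundError(f"source dbPath not found: {source}")
    # Refuse to mix copied files with anything already in the target.
    if target.exists() and any(target.iterdir()):
        raise FileExistsError(f"target dbPath is not empty: {target}")
    # dirs_exist_ok (Python 3.8+) allows copying into an empty, pre-created dir.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    # Demo with throwaway directories and placeholder file names.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "member1-data")
        (src / "journal").mkdir(parents=True)
        (src / "collection-0.wt").write_bytes(b"demo")
        dst = seed_data_directory(src, Path(tmp, "member4-data"))
        print(sorted(p.name for p in dst.iterdir()))  # ['collection-0.wt', 'journal']
```

After the copy, the new mongod is started with the same replica set name and added from the primary (for example with `rs.add()` in the shell). Per the quoted docs, it then only has to catch up on recent changes instead of copying everything.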