Replica Set Data Synchronization
To maintain up-to-date copies of the shared data set, secondary members of a replica set sync, or replicate, data from other members. MongoDB uses two forms of data synchronization: initial sync, which populates new members with the full data set, and replication, which applies ongoing changes to the entire data set.
Initial Sync
Initial sync copies all the data from one member of the replica set to another member. See Initial Sync Source Selection for more information on initial sync source selection criteria.
Starting in MongoDB 4.2.7, you can specify the preferred initial sync source using the initialSyncSourceReadPreference parameter. You can only set this parameter when starting the mongod.
Process
When you perform an initial sync, MongoDB:
- Clones all databases except the local database. To clone, the mongod scans every collection in each source database and inserts all data into its own copies of these collections.
  Changed in version 3.4: Initial sync builds all collection indexes as the documents are copied for each collection. In earlier versions of MongoDB, only the _id indexes were built during this stage.
  Changed in version 3.4: Initial sync pulls newly added oplog records during the data copy. Ensure that the target member has enough disk space in the local database to temporarily store these oplog records for the duration of this data copy stage.
- Applies all changes to the data set. Using the oplog from the source, the mongod updates its data set to reflect the current state of the replica set.
When the initial sync finishes, the member transitions from STARTUP2 to SECONDARY.
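The two phases above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not MongoDB's implementation: the nested-dict data layout, the hypothetical `OplogEntry` type, and its `"i"`/`"u"`/`"d"` op codes are invented for illustration. It clones every database except `local`, then replays the oplog entries recorded during the copy. Applying each entry as an upsert or a tolerant delete keeps the replay safe even if the clone already reflects some of those changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: {db_name: {collection_name: {_id: document}}}
Data = Dict[str, Dict[str, Dict[object, dict]]]


@dataclass
class OplogEntry:
    """Simplified, hypothetical oplog entry (not the real oplog format)."""
    ns: str                      # "db.collection"
    op: str                      # "i" insert, "u" replace, "d" delete
    doc_id: object
    doc: Optional[dict] = None


def initial_sync(source: Data, oplog_during_copy: List[OplogEntry]) -> Tuple[Data, str]:
    # The member stays in STARTUP2 while both phases run.
    target: Data = {}

    # Phase 1: clone every database except "local", copying each document.
    for db_name, collections in source.items():
        if db_name == "local":
            continue
        for coll_name, docs in collections.items():
            target.setdefault(db_name, {})[coll_name] = {
                _id: dict(doc) for _id, doc in docs.items()
            }

    # Phase 2: apply the oplog entries buffered while the copy ran.
    for entry in oplog_during_copy:
        db_name, coll_name = entry.ns.split(".", 1)
        coll = target.setdefault(db_name, {}).setdefault(coll_name, {})
        if entry.op == "d":
            coll.pop(entry.doc_id, None)           # tolerant delete
        else:
            coll[entry.doc_id] = dict(entry.doc)   # insert/replace applied as an upsert

    # Initial sync finished: the member becomes a secondary.
    return target, "SECONDARY"
```

Because the clone copies each document, later changes to the returned data never leak back into the source.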
To perform an initial sync, see Resync a Member of a Replica Set.
Fault Tolerance
To recover from transient network or operation failures, initial sync has built-in retry logic.
Changed in version 3.4: MongoDB 3.4 makes the initial sync retry logic more resilient to intermittent network failures.
Initial Sync Source Selection
Initial sync source selection depends on the value of the mongod startup parameter initialSyncSourceReadPreference (new in 4.2.7):
- For initialSyncSourceReadPreference set to primary (default if chaining is disabled), the member selects the primary as the sync source. If the primary is unavailable or unreachable, the member logs an error and periodically checks for primary availability.
- For initialSyncSourceReadPreference set to primaryPreferred, the member attempts to select the primary as the sync source. If the primary is unavailable or unreachable, the member performs sync source selection from the remaining replica set members.
- For initialSyncSourceReadPreference set to nearest (default if chaining is enabled), the member performs sync source selection from the replica set members.
- For all remaining supported read preference modes, the member performs sync source selection from the replica set members.
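To make the mode-to-behavior mapping concrete, here is a small Python sketch. The function name `initial_sync_strategy` and its return labels are hypothetical; only the parameter name initialSyncSourceReadPreference, the read preference modes, and the chaining-dependent defaults come from the list above.

```python
from typing import Optional

SUPPORTED_MODES = {"primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"}


def initial_sync_strategy(read_pref: Optional[str], chaining_allowed: bool) -> str:
    """Return a hypothetical label for how the initial sync source is chosen."""
    if read_pref is None:
        # Unset parameter: the default depends on whether chaining is enabled.
        read_pref = "nearest" if chaining_allowed else "primary"
    if read_pref not in SUPPORTED_MODES:
        raise ValueError(f"unsupported read preference mode: {read_pref!r}")
    if read_pref == "primary":
        return "primary-only"            # keep checking until the primary is available
    if read_pref == "primaryPreferred":
        return "primary-then-members"    # fall back to the remaining members
    return "members"                     # nearest and all other supported modes
```

An explicitly set mode wins over the chaining-derived default, which is why `read_pref` is checked before `chaining_allowed` is consulted.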
Members performing initial sync source selection make two passes through the list of all replica set members:
- Sync Source Selection (First Pass)
- Sync Source Selection (Second Pass)
The member applies the following criteria to each replica set member when making the first pass for selecting an initial sync source:
- The sync source must be in the PRIMARY or SECONDARY replication state.
- The sync source must be online and reachable.
- If initialSyncSourceReadPreference is secondary or secondaryPreferred, the sync source must be a secondary.
- The sync source must be visible.
- The sync source must be within 30 seconds of the newest oplog entry on the primary.
- If the member builds indexes, the sync source must build indexes.
- If the member votes in replica set elections, the sync source must also vote.
- If the member is not a delayed member, the sync source must not be delayed.
- If the member is a delayed member, the sync source must have a shorter configured delay.
- The sync source must be faster (i.e. lower latency) than the current best sync source.
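The first-pass criteria read naturally as a filter followed by a lowest-latency pick. The Python sketch below is illustrative only: the `Member` fields (including `visible`, `last_oplog_ts` in seconds, and `ping_ms`) are hypothetical simplifications of replica set state, and "faster than the current best" is modeled as keeping the candidate with the lowest ping time.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_LAG_SECS = 30.0  # "within 30 seconds of the newest oplog entry on the primary"


@dataclass
class Member:
    """Hypothetical, simplified view of a replica set member."""
    name: str
    state: str                   # e.g. "PRIMARY", "SECONDARY", "STARTUP2"
    last_oplog_ts: float = 0.0   # newest oplog entry, simplified to seconds
    reachable: bool = True
    visible: bool = True
    build_indexes: bool = True
    votes: int = 1
    delay_secs: float = 0.0      # 0 means "not a delayed member"
    ping_ms: float = 0.0


def first_pass_ok(c: Member, me: Member, primary_ts: float, read_pref: str) -> bool:
    if c.state not in ("PRIMARY", "SECONDARY") or not c.reachable:
        return False
    if read_pref in ("secondary", "secondaryPreferred") and c.state != "SECONDARY":
        return False
    if not c.visible:
        return False
    if primary_ts - c.last_oplog_ts > MAX_LAG_SECS:
        return False
    if me.build_indexes and not c.build_indexes:
        return False
    if me.votes > 0 and c.votes == 0:
        return False
    if me.delay_secs == 0 and c.delay_secs > 0:
        return False
    if me.delay_secs > 0 and c.delay_secs >= me.delay_secs:
        return False
    return True


def first_pass(members: List[Member], me: Member, primary_ts: float, read_pref: str) -> Optional[Member]:
    best = None
    for c in members:
        if c.name == me.name or not first_pass_ok(c, me, primary_ts, read_pref):
            continue
        if best is None or c.ping_ms < best.ping_ms:  # lower latency than the current best
            best = c
    return best
```

A `None` result corresponds to "no candidate sync sources remain after the first pass".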
If no candidate sync sources remain after the first pass, the member performs a second pass with relaxed criteria. See Sync Source Selection (Second Pass).
The member applies the following criteria to each replica set member when making the second pass for selecting an initial sync source:
- The sync source must be in the PRIMARY or SECONDARY replication state.
- The sync source must be online and reachable.
- If initialSyncSourceReadPreference is secondary, the sync source must be a secondary.
- If the member builds indexes, the sync source must build indexes.
- The sync source must be faster (i.e. lower latency) than the current best sync source.
If the member cannot select an initial sync source after two passes, it logs an error and waits 1 second before restarting the selection process. The secondary mongod can restart the initial sync source selection process up to 10 times before exiting with an error.
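The relaxed second pass and the restart loop can be sketched together in Python. Everything named here is hypothetical: the `Member` fields, `second_pass`, and `select_initial_sync_source`, which takes the first pass as a caller-supplied function. The loop reads "restart up to 10 times" as one initial attempt plus up to 10 restarts, with a 1-second wait before each restart; that interpretation is an assumption. `sleep` is injectable so the loop can be exercised without real delays.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

log = logging.getLogger("initial_sync_sketch")


@dataclass
class Member:
    """Hypothetical, simplified view of a replica set member."""
    name: str
    state: str                 # replication state, e.g. "PRIMARY", "SECONDARY"
    reachable: bool = True
    build_indexes: bool = True
    ping_ms: float = 0.0


def second_pass_ok(c: Member, me: Member, read_pref: str) -> bool:
    if c.state not in ("PRIMARY", "SECONDARY") or not c.reachable:
        return False
    if read_pref == "secondary" and c.state != "SECONDARY":
        return False  # secondaryPreferred is relaxed in this pass; secondary is not
    if me.build_indexes and not c.build_indexes:
        return False
    return True


def second_pass(members: List[Member], me: Member, read_pref: str) -> Optional[Member]:
    best = None
    for c in members:
        if c.name != me.name and second_pass_ok(c, me, read_pref):
            if best is None or c.ping_ms < best.ping_ms:
                best = c
    return best


def select_initial_sync_source(
    get_members: Callable[[], List[Member]],
    me: Member,
    read_pref: str,
    first_pass: Callable[[List[Member]], Optional[Member]],
    max_restarts: int = 10,
    wait_secs: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Member:
    # One initial attempt plus up to max_restarts restarts (assumed reading).
    for attempt in range(max_restarts + 1):
        members = get_members()
        source = first_pass(members)
        if source is None:
            source = second_pass(members, me, read_pref)
        if source is not None:
            return source
        log.error("no initial sync source after two passes (attempt %d)", attempt + 1)
        if attempt < max_restarts:
            sleep(wait_secs)
    raise RuntimeError("initial sync source selection failed")
```

The raised exception stands in for the mongod "exiting with an error".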
Replication
Secondary members replicate data continuously after the initial sync. Secondary members copy the oplog from their sync source and apply these operations in an asynchronous process. [1]
Secondaries may automatically change their sync source as needed, based on changes in ping time and in the replication state of other members. See Replication Sync Source Selection for more information on sync source selection criteria.
[1] Starting in version 4.2, secondary members of a replica set log oplog entries that take longer than the slow operation threshold to apply. The profiler does not capture slow oplog entries.
Multithreaded Replication
MongoDB applies write operations in batches using multiple threads to improve concurrency. MongoDB groups each batch by document ID (WiredTiger) and applies each group of operations simultaneously on a different thread. MongoDB always applies write operations to a given document in their original write order.
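The grouping idea can be illustrated with Python's `concurrent.futures`. This is a simplified sketch, not MongoDB's internal applier: the tuple op format (`("insert", doc)`, `("inc", field, n)`, `("delete",)`) and the function names are hypothetical. Operations are grouped by document ID in batch order, each group is applied sequentially on a worker thread, and different documents proceed in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def apply_op(doc: Optional[dict], op: tuple) -> Optional[dict]:
    """Apply one hypothetical op; None means the document does not exist."""
    kind = op[0]
    if kind == "insert":
        return dict(op[1])
    if kind == "inc":
        new = dict(doc or {})
        new[op[1]] = new.get(op[1], 0) + op[2]
        return new
    if kind == "delete":
        return None
    raise ValueError(f"unknown op kind: {kind!r}")


def apply_batch(batch: List[Tuple[object, tuple]], store: Dict[object, dict], workers: int = 4) -> None:
    # Group by document ID; appending in batch order preserves each document's write order.
    groups: Dict[object, List[tuple]] = defaultdict(list)
    for doc_id, op in batch:
        groups[doc_id].append(op)

    def apply_group(item):
        doc_id, ops = item
        doc = store.get(doc_id)
        for op in ops:  # sequential within one document
            doc = apply_op(doc, op)
        return doc_id, doc

    # Different documents are independent, so their groups run on separate threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(apply_group, groups.items()))

    # Write results back once all groups have finished.
    for doc_id, doc in results:
        if doc is None:
            store.pop(doc_id, None)
        else:
            store[doc_id] = doc
```

Because every operation on a given document lands in the same group, an `inc` can never run before the `insert` that created its document, no matter how the threads are scheduled.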
Read operations that target secondaries and are configured with a read concern level of "local" or "majority" read from a WiredTiger snapshot of the data if the read takes place on a secondary where replication batches are being applied.
Reading from a snapshot guarantees a consistent view of the data, and allows the read to occur simultaneously with the ongoing replication without the need for a lock. As a result, secondary reads requiring these read concern levels no longer need to wait for replication batches to be applied, and can be handled as they are received.
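WiredTiger's snapshot mechanism is internal to the storage engine. As a loose analogy only, the Python sketch below swaps in a copy-on-write technique: a single writer builds each batch on a private copy and then publishes it as a read-only view, so readers never take a lock and never observe a half-applied batch. The class `SnapshotStore` and its methods are hypothetical and are not how WiredTiger is implemented.

```python
from types import MappingProxyType
from typing import Iterable, Mapping, Optional, Tuple


class SnapshotStore:
    """Copy-on-write analogy for snapshot reads (single writer assumed)."""

    def __init__(self) -> None:
        self._published: Mapping[str, object] = MappingProxyType({})

    def snapshot(self) -> Mapping[str, object]:
        # Readers take the latest published version without any lock: published
        # versions are read-only views that the writer never changes again.
        return self._published

    def apply_batch(self, batch: Iterable[Tuple[str, Optional[object]]]) -> None:
        working = dict(self._published)  # private copy for the writer
        for key, value in batch:
            if value is None:
                working.pop(key, None)   # None marks a delete
            else:
                working[key] = value
        # Publishing swaps one reference, so readers see all or none of the batch.
        self._published = MappingProxyType(working)
```

A reader holding an older snapshot keeps a consistent view even while later batches are published.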
Flow Control
Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes, with the goal of keeping the majority committed lag under a configurable maximum set by flowControlTargetLagSeconds.
Flow control is enabled by default.
Note
For flow control to engage, the replica set or sharded cluster must have a featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.
For more information, see Flow Control.
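The engagement condition in the note is a simple conjunction. The Python sketch below encodes it exactly as stated for 4.2; the function name `flow_control_engages` is hypothetical, and the sketch does not attempt to model how the server actually throttles writes.

```python
def flow_control_engages(enabled: bool, fcv: str, read_concern_majority: bool) -> bool:
    """Hypothetical check: can enabled flow control take effect?"""
    # Flow control is enabled by default, but it only takes effect when the
    # featureCompatibilityVersion is 4.2 and read concern "majority" is enabled.
    return enabled and fcv == "4.2" and read_concern_majority
```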
Replication Sync Source Selection
Replication sync source selection depends on the replica set chaining setting:
- With chaining enabled (default), the member performs sync source selection from the replica set members.
- With chaining disabled, the member selects the primary as the sync source. If the primary is unavailable or unreachable, the member logs an error and periodically checks for primary availability.
Members performing replication sync source selection make two passes through the list of all replica set members:
- Sync Source Selection (First Pass)
- Sync Source Selection (Second Pass)
The member applies the following criteria to each replica set member when making the first pass for selecting a replication sync source:
- The sync source must be in the PRIMARY or SECONDARY replication state.
- The sync source must be online and reachable.
- The sync source must have newer oplog entries than the member (i.e. the sync source is ahead of the member).
- The sync source must be visible.
- The sync source must be within 30 seconds of the newest oplog entry on the primary.
- If the member builds indexes, the sync source must build indexes.
- If the member votes in replica set elections, the sync source must also vote.
- If the member is not a delayed member, the sync source must not be delayed.
- If the member is a delayed member, the sync source must have a shorter configured delay.
- The sync source must be faster (i.e. lower latency) than the current best sync source.
If no candidate sync sources remain after the first pass, the member performs a second pass with relaxed criteria. See Sync Source Selection (Second Pass).
The member applies the following criteria to each replica set member when making the second pass for selecting a replication sync source:
- The sync source must be in the PRIMARY or SECONDARY replication state.
- The sync source must be online and reachable.
- If the member builds indexes, the sync source must build indexes.
- The sync source must be faster (i.e. lower latency) than the current best sync source.
If the member cannot select a sync source after two passes, it logs an error and waits 1 second before restarting the selection process.
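Both replication passes can be sketched as two filters over the member list, each followed by a lowest-latency pick. This Python sketch is illustrative, not MongoDB's code: the `Member` fields (including `last_oplog_ts` in seconds and `ping_ms`) and the function names are hypothetical. Compared with initial sync, the first pass adds the "must be ahead of this member" check and has no read preference criterion.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

MAX_LAG_SECS = 30.0  # "within 30 seconds of the newest oplog entry on the primary"


@dataclass
class Member:
    """Hypothetical, simplified view of a replica set member."""
    name: str
    state: str                  # replication state, e.g. "PRIMARY", "SECONDARY"
    last_oplog_ts: float        # newest oplog entry, simplified to seconds
    reachable: bool = True
    visible: bool = True
    build_indexes: bool = True
    votes: int = 1
    delay_secs: float = 0.0     # 0 means "not a delayed member"
    ping_ms: float = 0.0


def _healthy(c: Member) -> bool:
    return c.state in ("PRIMARY", "SECONDARY") and c.reachable


def first_pass_ok(c: Member, me: Member, primary_ts: float) -> bool:
    if not _healthy(c) or not c.visible:
        return False
    if c.last_oplog_ts <= me.last_oplog_ts:
        return False  # the source must be ahead of this member
    if primary_ts - c.last_oplog_ts > MAX_LAG_SECS:
        return False
    if me.build_indexes and not c.build_indexes:
        return False
    if me.votes > 0 and c.votes == 0:
        return False
    if me.delay_secs == 0 and c.delay_secs > 0:
        return False
    if me.delay_secs > 0 and c.delay_secs >= me.delay_secs:
        return False
    return True


def second_pass_ok(c: Member, me: Member) -> bool:
    # Relaxed criteria: healthy, and compatible with this member's index building.
    return _healthy(c) and (c.build_indexes or not me.build_indexes)


def _lowest_latency(candidates: Iterable[Member]) -> Optional[Member]:
    best = None
    for c in candidates:
        if best is None or c.ping_ms < best.ping_ms:
            best = c
    return best


def select_replication_source(members: List[Member], me: Member, primary_ts: float) -> Optional[Member]:
    others = [m for m in members if m.name != me.name]
    best = _lowest_latency(c for c in others if first_pass_ok(c, me, primary_ts))
    if best is None:
        best = _lowest_latency(c for c in others if second_pass_ok(c, me))
    return best
```

A caller that gets `None` back would log an error, wait 1 second, and call the function again, matching the behavior described above.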
Note
Starting in MongoDB 4.2.7, the startup parameter initialSyncSourceReadPreference takes precedence over the replica set's settings.chainingAllowed setting when selecting an initial sync source. After a replica set member successfully performs initial sync, it defers to the value of chainingAllowed when selecting a replication sync source.
See Initial Sync Source Selection for more information on initial sync source selection.
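The precedence rule in the note can be summarized as a small decision function. This Python sketch is hypothetical: `governing_setting` and its return values are illustrative labels, while the setting names and the chaining-dependent defaults come from this page. Before initial sync completes, initialSyncSourceReadPreference decides; afterwards, chainingAllowed does.

```python
from typing import Optional, Tuple


def governing_setting(
    initial_sync_done: bool,
    chaining_allowed: bool,
    initial_sync_read_pref: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (setting that decides, resulting policy) for sync source selection."""
    if not initial_sync_done:
        # The startup parameter wins over chainingAllowed; when unset, its
        # default still follows chaining ("nearest" if enabled, else "primary").
        mode = initial_sync_read_pref
        if mode is None:
            mode = "nearest" if chaining_allowed else "primary"
        return "initialSyncSourceReadPreference", mode
    # After a successful initial sync, chainingAllowed decides.
    return "chainingAllowed", "members" if chaining_allowed else "primary"
```

For example, a member started with the parameter set to nearest in a set with chaining disabled may sync from a secondary during initial sync, but afterwards it syncs only from the primary.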