Troubleshoot Replication Lag

Replication lag makes "lagged" members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.

This page contains several tips that can reduce replication lag. However, many cases may require escalation. If you cannot determine the cause of your replication lag or need additional support, contact Technical Support.

Prerequisite Checks

To check the current length of replication lag on your deployment:

In a mongosh session that is connected to the primary, call the rs.printSecondaryReplicationInfo() method to display the current lag on each secondary relative to the primary host.

The method returns the syncedTo value for each member, which shows the time when the last oplog entry was written to the secondary, as shown in the following example:

source: m1.example.net:27017
    syncedTo: Thu Apr 10 2014 08:27:17 GMT-0400 (EDT)
    7230 secs (2 hrs) behind the primary
source: m2.example.net:27017
    syncedTo: Thu Apr 10 2014 10:27:47 GMT-0400 (EDT)
    0 secs (0 hrs) behind the primary

The number of seconds behind the primary tells you how far the secondary lags behind the primary.

A delayed member may show as 0 seconds behind the primary when the inactivity period on the primary is greater than the members[n].secondaryDelaySecs value.
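As a rough sketch of how the lag figure can be derived, the following mongosh-compatible JavaScript computes each secondary's lag from the optimeDate values in a replSetGetStatus-style document. The helper name secondaryLagSeconds, the m0 host, and the sample document are hypothetical. Outside mongosh, the sample stands in for rs.status().

```javascript
// Sketch: derive per-secondary lag from a replSetGetStatus-style document.
// secondaryLagSeconds and sampleStatus are illustrative names, not MongoDB APIs.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no primary found in status document");
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting two Date objects yields a difference in milliseconds.
      lagSecs: Math.round((primary.optimeDate - m.optimeDate) / 1000),
    }));
}

// Hypothetical sample with the same general shape as rs.status() output.
const sampleStatus = {
  members: [
    { name: "m0.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2014-04-10T14:27:47Z") },
    { name: "m1.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-04-10T12:27:17Z") },
    { name: "m2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-04-10T14:27:47Z") },
  ],
};

// In mongosh, rs is defined and the live status is used; elsewhere, the sample.
const currentStatus = typeof rs !== "undefined" ? rs.status() : sampleStatus;
for (const { name, lagSecs } of secondaryLagSeconds(currentStatus)) {
  console.log(`${name}: ${lagSecs} secs behind the primary`);
}
```

A helper like this can be convenient when you want to log or alert on lag values rather than read the printed report.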

Monitor the rate of replication by checking for non-zero or increasing oplog time values in the Replication Lag graph available in Cloud Manager and in Ops Manager.
Additionally, you can monitor replication lag in Atlas by checking the Replication Lag, Oplog GB/Hour, and Replication Oplog Window in the metrics tab of your cluster. For more information, see Review Available Metrics.

Common Issues and Resolutions

There is no error code for replication lag, and no immediate way to determine the cause. However, before you escalate to support, check if the following issues might be causing your lag:

Network Latency

Replication lag can build up when cluster nodes cannot reliably communicate with each other.

Check the network routes between the members of your replica set to ensure that there is no packet loss or network routing issues.

Use tools such as ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.

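Alternatively, run replSetGetStatus and examine the pingMs field of each member. This field reports the round-trip network latency, in milliseconds, between the member where you run the command and each remote member. Run the command on the primary to see the latency to each secondary.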
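The following sketch shows one way to pick out members with high pingMs values from a replSetGetStatus-style document. The helper name membersWithHighPing, the 50 ms threshold, and the sample document are assumptions for illustration. Outside mongosh, the sample stands in for rs.status().

```javascript
// Sketch: list members whose round-trip latency exceeds a threshold.
// membersWithHighPing is an illustrative helper, not a MongoDB API.
function membersWithHighPing(status, thresholdMs) {
  return status.members
    // pingMs is absent for the member that produced the status document.
    .filter((m) => m.pingMs !== undefined && Number(m.pingMs) > thresholdMs)
    .map((m) => ({ name: m.name, pingMs: Number(m.pingMs) }));
}

// Hypothetical sample; the first member is the one reporting the status.
const samplePingStatus = {
  members: [
    { name: "m0.example.net:27017", stateStr: "PRIMARY", self: true },
    { name: "m1.example.net:27017", stateStr: "SECONDARY", pingMs: 120 },
    { name: "m2.example.net:27017", stateStr: "SECONDARY", pingMs: 2 },
  ],
};

// In mongosh, rs is defined and the live status is used; elsewhere, the sample.
const pingStatus = typeof rs !== "undefined" ? rs.status() : samplePingStatus;
console.log(membersWithHighPing(pingStatus, 50)); // 50 ms is an assumed threshold
```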

Secondary Resource Exhaustion

Secondary nodes can experience resource contention when they cannot efficiently apply the write operations replicated from the primary. This can cause memory issues such as cache contention. When the cache reaches critical thresholds, the server uses application threads to evict pages, which reduces the number of threads available to manage replication.

To see whether the server redirects application threads to eviction tasks, run the following command in your mongosh shell:

db.serverStatus().wiredTiger.cache['pages evicted by application threads']

If Result = 0: No application threads are paused for eviction.
If Result > 0 (and increasing): The database is under cache pressure. Incoming queries must evict cached data before they execute, which can increase replication lag.
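Because the counter is cumulative, deciding whether it is increasing takes two samples. The sketch below compares two serverStatus snapshots. The helper name classifyEviction, the 10-second sampling interval, the "stable" label for a non-zero but unchanged counter, and the sample snapshots are assumptions for illustration.

```javascript
// Sketch: compare two serverStatus snapshots to see whether application-thread
// eviction is increasing. classifyEviction is an illustrative helper.
const EVICTION_STAT = "pages evicted by application threads";

function classifyEviction(before, after) {
  const earlier = Number(before.wiredTiger.cache[EVICTION_STAT]);
  const later = Number(after.wiredTiger.cache[EVICTION_STAT]);
  if (later === 0) return "none";
  // A non-zero counter that is not growing is reported as "stable" (assumed label).
  return later > earlier ? "increasing" : "stable";
}

if (typeof db !== "undefined") {
  // mongosh: take two live snapshots a short time apart.
  const first = db.serverStatus();
  sleep(10000); // assumed 10-second sampling interval
  const second = db.serverStatus();
  console.log(classifyEviction(first, second));
} else {
  // Outside mongosh: hypothetical snapshots with made-up counter values.
  const first = { wiredTiger: { cache: { [EVICTION_STAT]: 1500 } } };
  const second = { wiredTiger: { cache: { [EVICTION_STAT]: 1820 } } };
  console.log(classifyEviction(first, second));
}
```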

Atlas users can also monitor the WiredTiger Cache Activity and Page Faults metrics to investigate cache-related issues.

For potential strategies to fix this issue, see Scale your resources.

Disk-Related Issues

If a secondary node cannot flush dirty data to disk quickly enough, it falls behind the primary. This phenomenon occurs when the volume of writes coming from the primary exceeds the write speed of the secondary's disk.

Disk-related issues are prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network.

To assess disk status, use system-level tools, such as iostat or vmstat.

Atlas users can access Atlas metrics such as Disk IOPS and Disk Space Used to investigate disk issues.

Some common causes of disk issues include:

Underprovisioning: The secondary has slower disks, or lower IOPS, than the primary.
Virtualization overhead: In shared environments, other virtual machines may saturate the physical disk controller.

Some possible solutions to disk issues include:

Increasing provisioned IOPS
Upgrading to NVMe storage
Upgrading to a higher Atlas cluster tier. For more information, see Atlas Cluster Sizing and Tier Selection.

Long-Running Operations

In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries. This prevents write operations from returning if replication cannot keep up with the write load.
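As a minimal sketch of such a write concern, the following builds write options that wait for acknowledgment from a majority of data-bearing members. The helper name majorityWriteOptions, the 5000 ms timeout, and the sample document are assumptions. collName is a placeholder collection name.

```javascript
// Sketch: build write options that wait for majority acknowledgment.
// majorityWriteOptions is an illustrative helper, not a MongoDB API.
function majorityWriteOptions(wtimeoutMs) {
  return { writeConcern: { w: "majority", wtimeout: wtimeoutMs } };
}

const writeOptions = majorityWriteOptions(5000); // 5000 ms is an assumed timeout

if (typeof db !== "undefined") {
  // mongosh: the insert returns only after a majority acknowledges the write,
  // or reports a write concern error once the timeout elapses.
  db.collName.insertOne({ createdDate: new Date() }, writeOptions);
} else {
  // Outside mongosh: show the options document only.
  console.log(JSON.stringify(writeOptions));
}
```

If the timeout elapses before a majority acknowledges, the operation returns a write concern error, and the write itself is not undone.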

You can also use the database profiler to identify slow queries or long-running operations that correlate with the observed lag.

Excessive Write Load

Bulk write operations can exceed replica sets' ability to replicate in a timely manner, causing replication lag.

The following sub-sections offer possible solutions to this issue:

Use Smaller Batches

Control the load by batching and filtering CRUD commands.

Run each batch against a date or time range such as a month, week, or day. Ensure the query filters use an index to avoid collection scans. Collection scans can evict data and index pages from the working set and increase replication lag.

Start by deleting small date ranges. If those operations complete in seconds, increase the batch size. Monitor replication lag with rs.printSecondaryReplicationInfo(). Increase the batch size until you reach a balance between throughput and replication lag. Continue to monitor system load, the impact on other users and applications, and how far the secondaries lag.

For example:

db.collName.deleteMany({createdDate: {$gte: new Date("2018-12-01"), $lt: new Date("2019-01-01")}});
db.collName.deleteMany({createdDate: {$gte: new Date("2018-11-01"), $lt: new Date("2018-12-01")}});
db.collName.deleteMany({createdDate: {$gte: new Date("2018-10-01"), $lt: new Date("2018-11-01")}});
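The same month-by-month pattern can be generated instead of written out by hand. The following sketch builds one filter per month, newest first, and checks lag between batches. The helper name monthlyDeleteFilters is hypothetical, and the delete loop runs only inside mongosh.

```javascript
// Sketch: build one deleteMany filter per calendar month (UTC), newest first.
// monthlyDeleteFilters is an illustrative helper, not a MongoDB API.
function monthlyDeleteFilters(upperBound, months) {
  const filters = [];
  let upper = new Date(upperBound);
  for (let i = 0; i < months; i++) {
    // First day of the month preceding the current upper bound.
    const lower = new Date(Date.UTC(upper.getUTCFullYear(), upper.getUTCMonth() - 1, 1));
    filters.push({ createdDate: { $gte: lower, $lt: upper } });
    upper = lower;
  }
  return filters;
}

const deleteFilters = monthlyDeleteFilters(new Date("2019-01-01"), 3);

if (typeof db !== "undefined") {
  // mongosh: delete one month at a time and review lag between batches.
  for (const filter of deleteFilters) {
    db.collName.deleteMany(filter);
    rs.printSecondaryReplicationInfo();
  }
} else {
  // Outside mongosh: print the date ranges only.
  for (const f of deleteFilters) {
    console.log(f.createdDate.$gte.toISOString(), f.createdDate.$lt.toISOString());
  }
}
```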

Configure Server-Side Settings and Parameters

MongoDB provides the following server-side settings and parameters that can control resource usage during write-intensive operations:

storageEngineConcurrentWriteTransactions: Lowering this value can reduce contention caused by bulk deletes when other write operations occur simultaneously.
Note
Take caution when modifying storageEngineConcurrentWriteTransactions, as changing the setting can lead to performance issues or errors. We recommend you consult with MongoDB Support before changing the parameter.
maxTimeMS: If the bulk write operation is complex, you can limit its execution time to prevent long-running operations that affect server performance. Some examples of complex operations include matching many documents or querying by non-indexed fields.
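As one possible illustration of maxTimeMS, the sketch below builds a delete command document with a time limit and runs it through db.runCommand() when executed in mongosh. The helper name timedDeleteCommand, the 30000 ms limit, and the status filter are assumptions. collName is a placeholder collection name.

```javascript
// Sketch: build a delete command with a time limit.
// timedDeleteCommand is an illustrative helper, not a MongoDB API.
function timedDeleteCommand(collection, filter, maxTimeMS) {
  return {
    delete: collection,
    // limit: 0 removes every document that matches the filter.
    deletes: [{ q: filter, limit: 0 }],
    maxTimeMS: maxTimeMS,
  };
}

// 30000 ms is an assumed limit; tune it to your workload.
const deleteCommand = timedDeleteCommand("collName", { status: "inactive" }, 30000);

if (typeof db !== "undefined") {
  // mongosh: the server interrupts the operation if it exceeds the limit.
  console.log(db.runCommand(deleteCommand));
} else {
  // Outside mongosh: show the command document only.
  console.log(JSON.stringify(deleteCommand));
}
```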

Delete Documents in Indexed Order

If the field you run your bulk operations on is not indexed, the bulk operation can cause collection or table scans, increasing resource usage. Ensure an index exists on the field used in the query filter for faster deletions, reducing locking contention and improving performance.

Create an index before running the operation:

db.collection.createIndex({ status: 1 });

Then delete based on the indexed field:

db.collection.deleteMany({ status: "inactive" });
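Before running a large delete, it can help to confirm that the filter actually uses the index. The sketch below walks an explain plan and reports whether any stage is a collection scan. The helper name usesCollectionScan and the two sample plans are hypothetical. In mongosh, the plan comes from explain("queryPlanner").

```javascript
// Sketch: detect a COLLSCAN stage anywhere in an explain plan tree.
// usesCollectionScan is an illustrative helper, not a MongoDB API.
function usesCollectionScan(plan) {
  if (plan === null || typeof plan !== "object") return false;
  if (plan.stage === "COLLSCAN") return true;
  // Recurse into nested stages (inputStage, inputStages, and similar fields).
  return Object.values(plan).some((child) => usesCollectionScan(child));
}

// Hypothetical plans shaped like queryPlanner.winningPlan output.
const indexedPlan = {
  stage: "FETCH",
  inputStage: { stage: "IXSCAN", keyPattern: { status: 1 }, indexName: "status_1" },
};
const unindexedPlan = { stage: "COLLSCAN", filter: { status: { $eq: "inactive" } } };

if (typeof db !== "undefined") {
  // mongosh: inspect the winning plan for the same filter the delete will use.
  const explained = db.collection.find({ status: "inactive" }).explain("queryPlanner");
  console.log(usesCollectionScan(explained.queryPlanner.winningPlan));
} else {
  // Outside mongosh: evaluate the sample plans.
  console.log(usesCollectionScan(indexedPlan), usesCollectionScan(unindexedPlan));
}
```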

Oplog Window Size

If your oplog window is too small for the amount of data you are syncing, you might experience replication lag. A larger oplog can give a replica set a greater tolerance for lag.

To check the size of the oplog and the date ranges of its operations for a given replica set member, connect to the member in mongosh and run the rs.printReplicationInfo() method. The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. [1] At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week's worth of operations.
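To compare the current oplog window against a target such as 24 hours, a sketch like the following can read the timeDiffHours field that db.getReplicationInfo() returns. The helper name oplogWindowCheck and the sample values are assumptions. Outside mongosh, the sample stands in for the live result.

```javascript
// Sketch: compare the oplog window with a minimum number of hours.
// oplogWindowCheck is an illustrative helper, not a MongoDB API.
function oplogWindowCheck(info, minHours) {
  return {
    windowHours: info.timeDiffHours,
    meetsMinimum: info.timeDiffHours >= minHours,
  };
}

// Hypothetical sample using field names from db.getReplicationInfo().
const sampleReplInfo = { logSizeMB: 990, usedMB: 512, timeDiff: 43200, timeDiffHours: 12 };

// In mongosh, db is defined and the live values are used; elsewhere, the sample.
const replInfo = typeof db !== "undefined" ? db.getReplicationInfo() : sampleReplInfo;
console.log(oplogWindowCheck(replInfo, 24));
```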

Note

You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.

To change oplog size, see the Change the Oplog Size of Self-Managed Replica Set Members tutorial.

[1] The oplog can grow past its configured size limit to avoid deleting the majority commit point.

Verify Resolution

To confirm that the issue is resolved, call the rs.printSecondaryReplicationInfo() method and check that there are no longer any lagging members.

Diagnostics to Collect for More Support

If none of the above solutions reduce your lag, contact support. Support may ask for diagnostics to further diagnose your problem.

Some helpful diagnostics for Atlas users to collect for support include:

Your rs.printSecondaryReplicationInfo() output
The timeline of when the lag started
Any recent changes to your deployment, such as changes to your schema, indexes, application, tier, or hardware
