Extreme write performance degradation while a newly added node replicates


We run a fairly large MongoDB installation, and we’ve recently come across something I’ve not seen before: the write performance of a cluster tanking completely when a new node is added to the replica set.
We use backup disks to initialise the data directory on new nodes, i.e. a new node only has to replicate the last few days’ worth of data.

Usually, we add the node as a hidden member with priority 0 until it catches up, and then reconfigure it as a normal secondary. In one of our clusters, however, all writes essentially freeze when we follow this procedure, and the resource metrics show the primary maxed out by COLLSCAN queries against oplog.rs.
Removing the new node from the set immediately resolves the issue.
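For reference, the add-then-promote procedure described above might look roughly like this with PyMongo. This is a minimal sketch under stated assumptions, not anyone's production tooling: the helper names (`add_hidden_member`, `promote_member`, `apply_config_change`) and the connection string are hypothetical, and setting votes: 0 at add time mirrors the second attempt listed below. It uses the replSetGetConfig and replSetReconfig admin commands and bumps the config version, since the raw replSetReconfig command expects a higher version than the current one.

```python
import copy


def add_hidden_member(config, host):
    """Return a copy of `config` with `host` added as a hidden, non-voting,
    priority-0 member (the passed-in config is not modified)."""
    new_cfg = copy.deepcopy(config)
    next_id = max((m["_id"] for m in new_cfg["members"]), default=-1) + 1
    new_cfg["members"].append({
        "_id": next_id,
        "host": host,
        "hidden": True,  # not advertised to clients, so it receives no client reads
        "priority": 0,   # can never become primary (required for hidden members)
        "votes": 0,      # takes no part in elections (non-voting members need priority 0)
    })
    new_cfg["version"] += 1  # replSetReconfig expects a higher config version
    return new_cfg


def promote_member(config, host, priority=1):
    """Return a copy of `config` in which `host` is a normal voting secondary."""
    new_cfg = copy.deepcopy(config)
    for member in new_cfg["members"]:
        if member["host"] == host:
            member.update(hidden=False, priority=priority, votes=1)
            break
    else:
        raise KeyError(f"{host} is not in the replica set config")
    new_cfg["version"] += 1
    return new_cfg


def apply_config_change(uri, change, *args):
    """Fetch the current config from the primary, apply `change` to it and
    reconfigure the set. `uri` is a hypothetical replica-set connection string."""
    from pymongo import MongoClient  # lazy import: the helpers above need no driver

    with MongoClient(uri) as client:
        current = client.admin.command("replSetGetConfig")["config"]
        client.admin.command("replSetReconfig", change(current, *args))
```

Usage would be `apply_config_change(uri, add_hidden_member, "new-node:27017")` and, once the node has caught up, the same call with `promote_member`. Building the new config in pure functions keeps the config edits easy to check before anything is sent to the cluster.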

What we have tried without success so far:

  1. Adding CPU and RAM to both the new node and the primary
  2. Adding the new node with votes: 0, to rule out write concern effects (the new node having to acknowledge writes), even though we exclusively use a write concern of w: 1 in our system
  3. Significantly reducing the size of the oplog in the cluster

The cluster is currently running 5 nodes, and there are no issues with any of the other nodes or with performance in general; the cluster is deliberately oversized, so steady-state CPU load is around 10%.

Does anyone have any ideas as to what could be happening?

I don’t think this is a good idea (though of course it depends on how much data is involved).

If a lot of data was written in the past few days, catching up will require a large scan over oplog entries, which can cause high disk I/O.
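As a rough way to gauge how much oplog the new node has to read from its sync source, you can compare the oldest and newest entries in local.oplog.rs with the last oplog timestamp contained in the backup. The sketch below assumes you know that backup timestamp (`resume_ts`, in seconds since the epoch); the function names are hypothetical. It only measures time spans, not bytes, so how heavy the scan actually is still depends on your write volume over that period.

```python
SECONDS_PER_HOUR = 3600


def replay_backlog(first_ts, last_ts, resume_ts):
    """Return (oplog_window_hours, backlog_hours).

    first_ts / last_ts: seconds of the oldest and newest oplog entries on the
    sync source; resume_ts: seconds of the last oplog entry contained in the
    backup the new node was seeded from (assumed to be known).
    """
    if resume_ts < first_ts:
        # The oplog no longer reaches back that far: the node cannot catch up.
        raise ValueError("backup is older than the oplog window; a full resync is needed")
    window = (last_ts - first_ts) / SECONDS_PER_HOUR
    backlog = max(last_ts - resume_ts, 0) / SECONDS_PER_HOUR
    return window, backlog


def oplog_bounds(uri):
    """Read the timestamps of the oldest and newest oplog entries on one member.
    `uri` is a hypothetical connection string for the sync source."""
    from pymongo import MongoClient  # lazy import: replay_backlog needs no driver

    with MongoClient(uri, directConnection=True) as client:
        oplog = client.local["oplog.rs"]
        # $natural order on the capped oplog is insertion order: fetch one entry per end.
        first = next(oplog.find({}, {"ts": 1}).sort("$natural", 1).limit(1))["ts"]
        last = next(oplog.find({}, {"ts": 1}).sort("$natural", -1).limit(1))["ts"]
    # bson.timestamp.Timestamp.time is the seconds part of the timestamp
    return first.time, last.time
```

For example, a 72-hour oplog window with a backup taken 24 hours after the oldest entry leaves a 48-hour backlog for the new node to replay.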

So here is what you can try:

  1. use faster disks :slight_smile:, or
  2. don’t use the primary as the replication source; use a secondary instead.

In general, the primary and the secondaries carry the same write traffic, but since you use a write concern of w: 1, only the primary needs to acknowledge each write. Syncing from a secondary will therefore at worst slow down replication on that secondary, without affecting write performance on the primary.
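One way to point the new node at a secondary is the replSetSyncFrom admin command, which has to run on the member whose sync source you want to change (here, the new node). Below is a minimal PyMongo sketch; choosing the most up-to-date healthy secondary from replSetGetStatus output is an illustrative assumption, as are the helper names, URIs and host names. Keep in mind that replSetSyncFrom is only a temporary override: the node reverts to automatic sync source selection if it restarts or loses its connection to the chosen source, for example.

```python
def pick_sync_source(status, new_host):
    """Pick the healthy SECONDARY with the most recent optime from
    replSetGetStatus output, skipping the new node itself."""
    candidates = [
        m for m in status["members"]
        if m.get("stateStr") == "SECONDARY"
        and m.get("health") == 1
        and m["name"] != new_host
    ]
    if not candidates:
        raise RuntimeError("no healthy secondary available as a sync source")
    # Prefer the secondary that is furthest ahead, so the new node can keep up.
    return max(candidates, key=lambda m: m["optimeDate"])["name"]


def sync_new_node_from_secondary(replset_uri, new_node_uri, new_host):
    """Point the new node at a secondary via replSetSyncFrom.
    The URIs and host name are hypothetical placeholders."""
    from pymongo import MongoClient  # lazy import: pick_sync_source needs no driver

    with MongoClient(replset_uri) as client:
        status = client.admin.command("replSetGetStatus")
    source = pick_sync_source(status, new_host)
    # replSetSyncFrom must run on the member whose sync source is being changed,
    # so connect directly to the new node rather than to the replica set.
    with MongoClient(new_node_uri, directConnection=True) as node:
        node.admin.command("replSetSyncFrom", source)
    return source
```

Because the override is temporary, it's worth checking the new node's syncSourceHost in replSetGetStatus output while it catches up.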