We run a fairly large MongoDB installation and have recently come across something I’ve not seen before: the write performance of a cluster tanks completely when we add a new node to a replica set.
We use backup disks to initialise the data directory on new nodes, i.e. a new node only has to replicate the last few days' worth of data.
Usually, we add the node as a hidden member with priority 0 until it catches up, and then we reconfigure it as a normal secondary. However, for whatever reason, when we attempt this procedure in one of our clusters, all writes basically freeze up. Looking at resource utilisation, the primary is maxed out by COLLSCAN queries against oplog.rs.
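To make the procedure concrete, here is a rough Python sketch of the two reconfiguration steps, written as pure functions on a replica set config document. It is illustrative only, not the exact commands used: the set name, host names, version number and function names are all made-up placeholders.

```python
import copy


def add_hidden_member(config, host, votes=1):
    """Return a copy of `config` with `host` added as a hidden, priority-0 member."""
    new_config = copy.deepcopy(config)
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append(
        {"_id": next_id, "host": host, "priority": 0, "hidden": True, "votes": votes}
    )
    new_config["version"] += 1  # reconfigs carry an incremented version number
    return new_config


def promote_to_secondary(config, host, priority=1):
    """Return a copy of `config` with `host` turned into a normal, visible secondary."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member.update({"priority": priority, "hidden": False, "votes": 1})
    new_config["version"] += 1
    return new_config


# Hypothetical starting config; in practice this would come from rs.conf().
current = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "db0.example.net:27017"},
        {"_id": 1, "host": "db1.example.net:27017"},
    ],
}

step1 = add_hidden_member(current, "db-new.example.net:27017")
step2 = promote_to_secondary(step1, "db-new.example.net:27017")
print(step1["members"][-1])
print(step2["members"][-1])
```

Each resulting document would then be applied on the primary, for example with `rs.reconfig()` in the shell. Passing `votes=0` to `add_hidden_member` gives the non-voting variant.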
Removing the new node from the set immediately resolves the issue.
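For anyone wanting to look for the same symptom, the sketch below shows one way the oplog scans could be picked out of `currentOp` output on the primary. It assumes the `inprog` list has already been fetched (e.g. via `db.currentOp()`); the sample entries are invented for illustration and are not real output from our cluster.

```python
def oplog_collscans(inprog):
    """Pick operations on local.oplog.rs whose plan summary is a collection scan."""
    return [
        op
        for op in inprog
        if op.get("ns") == "local.oplog.rs"
        and "COLLSCAN" in op.get("planSummary", "")
    ]


# Invented sample entries, shaped like items of the currentOp "inprog" array.
sample_inprog = [
    {"opid": 101, "op": "getmore", "ns": "local.oplog.rs",
     "planSummary": "COLLSCAN", "secs_running": 12, "client": "10.0.0.9:51234"},
    {"opid": 102, "op": "update", "ns": "app.orders",
     "planSummary": "IXSCAN { _id: 1 }", "secs_running": 0, "client": "10.0.0.4:40112"},
    {"opid": 103, "op": "query", "ns": "local.oplog.rs",
     "planSummary": "COLLSCAN", "secs_running": 48, "client": "10.0.0.9:51240"},
]

for op in oplog_collscans(sample_inprog):
    print(op["opid"], op["op"], op["secs_running"], op["client"])
```

The `client` field is the interesting part here, since it shows which host is issuing the scans.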
What we have tried without success so far:
- Increasing hardware (CPU/RAM) for the new node and the primary node
- Adding the new node with votes:0 to avoid potential issues caused by write concern (new node having to ack writes, though we exclusively use a writeConcern of 1 in our system)
- Significantly reducing the size of the oplog in the cluster
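
As a sketch of the last item, an oplog resize can be expressed as a `replSetResizeOplog` admin command, which takes the new size in megabytes and has to be run against each member separately. Whether this matches how the resize was actually done is an assumption, and the size shown is a placeholder rather than our real value.

```python
MIN_OPLOG_MB = 990  # smallest size replSetResizeOplog accepts, per the MongoDB docs


def resize_oplog_command(size_mb):
    """Build the admin command document for resizing a member's oplog."""
    if size_mb < MIN_OPLOG_MB:
        raise ValueError(f"oplog size must be at least {MIN_OPLOG_MB} MB")
    # The command expects the size as a double, in megabytes.
    return {"replSetResizeOplog": 1, "size": float(size_mb)}


# Placeholder size (16000 MB); run the resulting command on each member in turn.
command = resize_oplog_command(16000)
print(command)
```
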
The cluster is currently running 5 nodes, and there are no issues with any of the other nodes or with performance in general. The cluster is deliberately quite oversized, so steady-state CPU load is around 10%.
Does anyone have any ideas as to what could be happening?