I have a MongoDB replica set running on identical hardware across 3 servers. It is currently configured as Primary (Master), Secondary, Secondary, and the secondaries do not have any write rules. I do not send reads to the secondaries either; they are currently set up only for a DR-type scenario. The primary and one secondary are in the same datacenter, and the other secondary is in a different datacenter.
Every day, generally around 6:30 AM, the secondaries fall behind and replication lag starts to increase. The only way I have found to correct this is to restart the mongod process, after which it quickly catches back up. I've gone through many of the production best practices. Most came from this guide: MongoDB Best Practices 2020 Edition - Percona Database Performance Blog
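For reference, by replication lag I mean the gap between the primary's last applied operation time and each secondary's. Below is a minimal sketch of that calculation, assuming a document shaped like the output of `replSetGetStatus` (a `members` list with `name`, `stateStr`, and `optimeDate` fields). The host names and timestamps are made up for illustration and are not from my cluster.

```python
from datetime import datetime


def replication_lag(status):
    """Return {member name: lag in seconds} for every SECONDARY member.

    `status` is assumed to look like the result of replSetGetStatus:
    a dict with a "members" list whose entries carry "name",
    "stateStr" and "optimeDate" (a datetime).
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample data; with pymongo the real document could be
# fetched with client.admin.command("replSetGetStatus").
sample_status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2021, 1, 1, 6, 45, 0)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2021, 1, 1, 6, 44, 55)},
        {"name": "db3:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2021, 1, 1, 6, 30, 0)},
    ]
}

lags = replication_lag(sample_status)
for name, seconds in lags.items():
    print(f"{name}: {seconds:.0f}s behind the primary")
```

Logging this every minute or so around 6:30 AM would at least show whether both secondaries start lagging at the same moment or whether the remote one goes first.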
Each server has 2 CPUs, each with 28 processors, and 256 GB of RAM (200 GB is allocated to the WiredTiger cache). Storage is 26 enterprise SSDs set up in RAID 5 (I've since learned that RAID 5 is a red flag). Can someone help me understand how to figure out whether this is happening due to writes?
Any help or tips are really appreciated.