Hello everyone!
Recently, we’ve been experiencing issues with one of our MongoDB instances that do not occur on the others, where we run similar tasks.
In all cases, we have a PSA (Primary - Secondary - Arbiter) architecture, running MongoDB 6.0.
We run common administrative tasks, such as backups that use fsyncLock, and other operations during which the read node (the secondary) is temporarily shut down (for about 5-10 minutes).
During this period, flow control kicks in and starts delaying write requests: operations that usually take 50 ms begin taking at least 700 ms and, in some cases, exceed 210,000 ms (3.5 minutes).
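For anyone wanting to confirm the same behavior, here is a minimal sketch of how the throttling could be checked from the `flowControl` section of `serverStatus` on the primary. The helper name `summarize_flow_control` and the sample values are made up for illustration; only the `flowControl` field names come from the server output.

```python
def summarize_flow_control(server_status):
    """Pick the flowControl fields most relevant to write throttling."""
    fc = server_status.get("flowControl", {})
    return {
        "enabled": fc.get("enabled"),                  # is flow control switched on
        "is_lagged": fc.get("isLagged"),               # True while flow control is engaged
        "target_rate_limit": fc.get("targetRateLimit"),          # tickets per second
        "time_acquiring_micros": fc.get("timeAcquiringMicros"),  # total wait for tickets
    }


# Illustrative values only, NOT real measurements from our cluster.
sample_status = {
    "flowControl": {
        "enabled": True,
        "isLagged": True,
        "targetRateLimit": 100,
        "timeAcquiringMicros": 5000000,
    }
}
summary = summarize_flow_control(sample_status)

# With PyMongo, the real document would come from something like:
#   status = MongoClient(uri).admin.command("serverStatus")
```

If `is_lagged` is true and `time_acquiring_micros` keeps growing while the secondary is down, the extra write latency is most likely time spent waiting for flow control tickets.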
We understand how flow control works, but in our case the throttling doesn’t seem to make sense: the network is fine, and the node is simply down.
In our tests, the impact of flow control is quite evident. Disabling flow control for routine tasks seems like a viable option for us, but we could face other situations, such as a cloud failure or other incidents, where a read node becomes unavailable. Losing that read node entirely is not a problem for us, but the resulting degradation of writes on the primary is a concern.
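To make the "disable it only for routine tasks" option concrete, here is a rough sketch, assuming the `enableFlowControl` server parameter can be changed at runtime with `setParameter` on the primary. The helper names `flow_control_command` and `flow_control_disabled` are hypothetical, and `client` is assumed to be a PyMongo `MongoClient` connected with sufficient privileges.

```python
from contextlib import contextmanager


def flow_control_command(enabled):
    """Build the admin command that switches flow control on or off."""
    return {"setParameter": 1, "enableFlowControl": enabled}


@contextmanager
def flow_control_disabled(client):
    """Turn flow control off for a maintenance window, then turn it back on."""
    client.admin.command(flow_control_command(False))
    try:
        yield
    finally:
        # Re-enable even if the maintenance task raises.
        client.admin.command(flow_control_command(True))


# Hypothetical usage around a planned shutdown of the secondary:
#   with flow_control_disabled(client):
#       run_backup()  # placeholder for the actual maintenance task
```

This only covers planned maintenance. It does nothing for an unplanned outage of the read node, which is the case that worries us most.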
We would like to hear more opinions: does it make sense to disable flow control, or is there a safer option we are unaware of? Adding more read replicas does not solve the flow control issue.
Flow control is a protection mechanism, so disabling a safeguard seems risky to us.
We appreciate your support.