Hi,
We recently had a problem where, after we resized an EBS volume, the volume became unresponsive and MongoDB queries stopped completing for several minutes. We recovered by forcing a restart of the MongoDB primary host, which triggered a failover to a secondary. We first tried to execute a stepdown on the primary, but it had no effect, so we had to fall back to restarting the server.
There was no automatic failover (i.e. the primary did not step down on its own) because, even though the data volume was not responding, the mongod process was still up and running and responding to health checks from the secondaries.
So, to summarise: the volume was not responding, no query was completing successfully, the CPU on the host was showing more than 50% in io-wait, and the manual stepdown had no effect. Only the host restart triggered a failover.
While this is of course a failure in the underlying hardware, is there a way to configure MongoDB to fail over when the data volume shows this kind of behaviour?
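To illustrate the kind of check we have in mind, here is a minimal sketch of an external probe, not something MongoDB provides. It writes and fsyncs a small file on the data volume and treats the volume as unresponsive if that does not finish within a timeout. The names `probe_write` and `volume_responds`, the 5-second timeout and the example path are all our own assumptions.

```python
import os
import tempfile
import threading


def probe_write(directory):
    """Write and fsync a small temporary file in `directory` (hypothetical helper)."""
    fd, path = tempfile.mkstemp(prefix=".io-probe-", dir=directory)
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the write down to the device
    finally:
        os.close(fd)
        os.unlink(path)


def volume_responds(directory, timeout_s=5.0, probe=probe_write):
    """Return True if `probe(directory)` succeeds within `timeout_s` seconds."""
    done = threading.Event()
    result = {"ok": False}

    def run():
        try:
            probe(directory)
            result["ok"] = True
        except OSError:
            pass  # an I/O error also counts as "not responding"
        finally:
            done.set()

    # Daemon thread: a probe stuck in blocked I/O must not prevent exit.
    threading.Thread(target=run, daemon=True).start()
    return done.wait(timeout_s) and result["ok"]
```

A monitoring job could call, for example, `volume_responds("/data/db")` (the path is an assumption) every few seconds and, after several consecutive failures, restart the host as we did by hand. We would prefer a built-in option over maintaining something like this ourselves.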
Thanks