How does the maximum number of voting members (7 in a replica set) limit your planned deployment?
An arbiter is only needed if you have an insufficient number of voting members to ensure a quorum to sustain or elect a primary.
Our prior setup was as follows. We have 3 main availability zones, which are basically allocated hardware in different data centers, and we host all our services in them. We’re trying to avoid using other availability zones (for reasons unrelated to this discussion).
We had a setup with 3 voting members in AZ1, 2 in AZ2, and 2 in AZ3, for 7 in total.
This setup worked fine for several years, until an outage hit AZ1 while one node in AZ2 was under maintenance. That left only 3 of the 7 voting members reachable, which is below the majority of 4, so the election failed and all updates failed because no primary could be elected.
As of now, the only fix we see is as follows: remove the vote from one of the members in AZ1 and allocate an arbiter in AZ4.
But from a user's perspective, this limit of 7 voting members seems somewhat arbitrary, and we would really like to avoid using AZ4.
To give an example, we’d rather have 9 nodes split evenly across the 3 AZs, which would allow us to perform maintenance on one host even during an AZ outage.
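
To make the arithmetic concrete, here is a rough sketch (plain Python, not tied to any MongoDB API) that checks whether a majority of votes survives when one whole AZ is down and one more voting member, assumed to be in a different AZ, is under maintenance. The function names and layout labels are made up for illustration, and the 3/3/3 layout is hypothetical since it would exceed the current limit of 7 voting members.

```python
def majority(total_votes: int) -> int:
    """Smallest number of votes that is strictly more than half."""
    return total_votes // 2 + 1


def can_elect_primary(votes_per_az: dict[str, int], az_down: str, extra_down: int = 1) -> bool:
    """True if a majority of votes remains after losing one AZ plus
    `extra_down` voting members (assumed to be in other AZs)."""
    total = sum(votes_per_az.values())
    surviving = total - votes_per_az[az_down] - extra_down
    return surviving >= majority(total)


# Votes per availability zone for each layout discussed above.
layouts = {
    "original 3/2/2": {"AZ1": 3, "AZ2": 2, "AZ3": 2},
    "workaround 2/2/2 + arbiter in AZ4": {"AZ1": 2, "AZ2": 2, "AZ3": 2, "AZ4": 1},
    "desired 3/3/3 (hypothetical, 9 votes)": {"AZ1": 3, "AZ2": 3, "AZ3": 3},
}

# Scenario: AZ1 is out and one more voting member is under maintenance.
results = {name: can_elect_primary(votes, "AZ1") for name, votes in layouts.items()}

for name, ok in results.items():
    print(f"{name}: {'primary can be elected' if ok else 'no quorum'}")
```

Under these assumptions the original layout loses quorum (3 surviving votes against a required 4), while both the arbiter workaround (4 of 7) and the 9-member layout (5 of 9) keep it.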