Best way to perform system updates on replica set hosts

laser · October 6, 2022, 6:45pm

I’m trying to determine the best way to perform maintenance on the underlying system that my replica set members are running on. I’m defining “best” as minimal downtime while also minimizing unnecessary complications.

My current setup is four mongods total, three of which are active replica set members and one of which is a hidden secondary (votes 0, priority 0). All are running on EC2 instances. They are all managed by Cloud Manager.

In order to perform system upgrades to the operating system (Ubuntu 18.04), it will be necessary to reboot each instance. This is the procedure I have used in the past without issue:

1. Set the hidden secondary’s votes to 1.
2. Choose a secondary to upgrade first and set its votes and priority to 0.
3. Perform the necessary system updates, then reboot.
4. When the secondary is back online, set its votes and priority back to 1.
5. Do steps 2-4 for the other secondary.
6. Do steps 2-4 for the primary. The primary will step down when you set its votes and priority to 0.
7. Set the hidden secondary’s votes back to 0.
8. Perform the necessary system updates to the hidden secondary, then reboot it.

I do things this way to keep the number of voting nodes odd whenever a server is offline. Each replica set node normally has votes: 1, priority: 1, except the hidden secondary, which normally has votes: 0, priority: 0.
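
To make the vote bookkeeping in the procedure above concrete, here is a minimal Python sketch. It only manipulates a plain dictionary shaped like a replica set configuration document (a `members` list with `host`, `votes`, and `priority`, plus a `version`). The host names, the replica set name, and the helper names `set_member_votes` and `voting_members` are hypothetical and chosen for illustration; nothing here talks to a real deployment.

```python
import copy


def set_member_votes(config, host, votes, priority):
    """Return a copy of the config with one member's votes and priority changed."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["votes"] = votes
            member["priority"] = priority
            break
    else:
        raise ValueError(f"no member with host {host!r}")
    new_config["version"] += 1  # each reconfiguration carries a higher version
    return new_config


def voting_members(config):
    """Count configured votes; a member without an explicit value has 1 vote."""
    return sum(m.get("votes", 1) for m in config["members"])


# Hypothetical four-member set: three regular members and one hidden secondary.
config = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "db1.example.net:27017", "votes": 1, "priority": 1},
        {"_id": 1, "host": "db2.example.net:27017", "votes": 1, "priority": 1},
        {"_id": 2, "host": "db3.example.net:27017", "votes": 1, "priority": 1},
        {"_id": 3, "host": "db4.example.net:27017", "votes": 0, "priority": 0,
         "hidden": True},
    ],
}

# Step 1: give the hidden secondary a vote (its priority stays 0).
step1 = set_member_votes(config, "db4.example.net:27017", votes=1, priority=0)
# Step 2: take votes and priority away from the secondary about to be rebooted.
step2 = set_member_votes(step1, "db2.example.net:27017", votes=0, priority=0)

print(voting_members(config), voting_members(step1), voting_members(step2))  # 3 4 3
```

Note that the set briefly has four voting members between steps 1 and 2 before returning to three.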

My coworker feels that this procedure is overly complicated, and that we can just do the following:

1. Shut down a secondary in Cloud Manager.
2. Perform the necessary system updates, then reboot.
3. Start the secondary up again in Cloud Manager.
4. Perform steps 1-3 for each other secondary.
5. When all secondary hosts are updated, perform steps 1-3 on the primary.
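
The ordering in this simpler procedure (secondaries first, primary last) can be sketched in Python as follows. The sketch assumes a list of member documents with `name` and `stateStr` fields, as found in the `members` array of `replSetGetStatus` output; the host names and the helper names `rolling_order` and `all_members_healthy` are hypothetical.

```python
def rolling_order(members):
    """Return member names with all non-primary members first and the primary last."""
    secondaries = [m["name"] for m in members if m["stateStr"] != "PRIMARY"]
    primary = [m["name"] for m in members if m["stateStr"] == "PRIMARY"]
    return secondaries + primary


def all_members_healthy(members):
    """True when every member reports PRIMARY or SECONDARY."""
    return all(m["stateStr"] in ("PRIMARY", "SECONDARY") for m in members)


# Hypothetical status snapshot taken before starting maintenance.
status_members = [
    {"name": "db1.example.net:27017", "stateStr": "PRIMARY"},
    {"name": "db2.example.net:27017", "stateStr": "SECONDARY"},
    {"name": "db3.example.net:27017", "stateStr": "SECONDARY"},
]

print(rolling_order(status_members))
print(all_members_healthy(status_members))
```

Running a check like `all_members_healthy` between hosts is one way to confirm the previous member has rejoined before the next one is shut down.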

I’m concerned with issues that may arise from the number of votes being an even number during the time when a replica set member is offline, but maybe I’m just being overly cautious.

What’s the best procedure here? If it isn’t one of the above options, how do you handle system updates with your replica set?

Stennie_X · October 7, 2022, 2:04am

Hi @laser,

Your coworker’s procedure is the recommended approach. Reconfiguring for maintenance is unnecessary as long as you ensure you always have a majority of voting members available. This general approach is described in https://www.mongodb.com/blog/post/your-ultimate-guide-to-rolling-upgrades.

If you have a MongoDB 4.4+ deployment, you may also want to look into adjusting Mirrored Reads to help reduce the performance impact of restarting the primary during planned maintenance. Mirrored reads pre-warm the caches of electable secondary replica set members by sending a configurable sample of supported query operations from the primary.
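
As a rough illustration, the sampling rate for mirrored reads is controlled by the `mirrorReads` server parameter, whose `samplingRate` field takes a value between 0.0 and 1.0. The Python sketch below only builds the `setParameter` command document; the helper name `mirror_reads_command` and the 0.10 rate are assumptions for the example, not recommended values.

```python
def mirror_reads_command(sampling_rate):
    """Build a setParameter command document that adjusts mirrored reads sampling."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    # The command name must be the first key; Python dicts keep insertion order.
    return {"setParameter": 1, "mirrorReads": {"samplingRate": sampling_rate}}


cmd = mirror_reads_command(0.10)
print(cmd)
```

With PyMongo, a document like this could be passed to `client.admin.command(cmd)`, where `client` is a connected `MongoClient`. For a deployment managed by Cloud Manager, it is worth checking whether the parameter should be set through the automation configuration instead.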

When a replica set member is offline, the number of configured voting members does not change. The motivation for having an odd number of voting members is to increase fault tolerance during periods when one (or possibly more) replica set members are unavailable due to maintenance, connectivity, or other scenarios.

Your reconfiguration from 3 voting members to 2 voting members for maintenance would be unnecessary, as the majority of votes required to elect or sustain a primary would be 2 in both cases.
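
The arithmetic behind this can be sketched in a few lines of Python. The function names are illustrative only: a majority is more than half of the configured voting members, and fault tolerance is how many voting members can be unavailable while a majority remains.

```python
def majority(voting_members):
    """Votes needed to elect or sustain a primary."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Voting members that can be offline while a majority is still reachable."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

With 3 voting members, one can be offline for maintenance and the remaining 2 still form a majority. Going from 3 to 4 voting members raises the required majority without adding any fault tolerance, which is why odd counts are preferred.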

Regards,
Stennie
