Prepare for Cluster Maintenance
Cloud Manager performs a rolling restart when you perform maintenance on nodes in a cluster. To maintain cluster availability during the maintenance period, Automation updates the nodes one at a time until all nodes are updated.
Before you perform maintenance on your clusters, review the following considerations and take action, if necessary, to maintain cluster availability.
Note
To learn about how Automation performs maintenance on your clusters, see How does Cloud Manager perform maintenance on cluster nodes?.
oplog Size
Each node in a cluster is restarted in standalone mode before maintenance starts. The node replays writes in the oplog to catch up to the other nodes when it is added back to the cluster after maintenance completes.
Make sure that the cluster's oplog is large enough to store all writes that your application might make during the maintenance period. Use the replication.oplogSizeMB advanced deployment option to adjust the oplog size.
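As a rough aid for choosing a value, the sizing check can be sketched as simple arithmetic: the oplog growth rate multiplied by the length of the maintenance period, plus some headroom. The function name, the safety factor, and the sample numbers below are illustrative assumptions, not values that Cloud Manager provides. Measure your own oplog growth rate before you rely on the result.

```python
import math


def required_oplog_mb(oplog_mb_per_hour, maintenance_hours, safety_factor=1.5):
    """Estimate the oplog size in MB needed to cover a maintenance period.

    oplog_mb_per_hour: observed oplog growth rate (assumed to be measured by you).
    maintenance_hours: expected length of the maintenance period.
    safety_factor: hypothetical headroom multiplier for write bursts.
    """
    if oplog_mb_per_hour < 0 or maintenance_hours < 0 or safety_factor <= 0:
        raise ValueError("rates and durations must be non-negative")
    return math.ceil(oplog_mb_per_hour * maintenance_hours * safety_factor)


# Illustrative numbers only: 600 MB of oplog per hour, 4-hour maintenance period.
current_oplog_mb = 2048
needed_mb = required_oplog_mb(oplog_mb_per_hour=600, maintenance_hours=4)
if needed_mb > current_oplog_mb:
    print(f"Consider raising replication.oplogSizeMB to at least {needed_mb}")
```

If the estimate exceeds the current oplog size, increase replication.oplogSizeMB before maintenance starts.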
Priority
All client connections to a primary node are dropped when maintenance starts on that node. Connections are re-established to the newly elected primary node.
You may prefer a node in a specific data center to become the new primary node. Edit the cluster's configuration and adjust the priority of each node to indicate your preferred primary node.
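The priority adjustment can be sketched as a small transformation over a list of replica set members. This is a minimal illustration, not the Cloud Manager interface: it assumes each member is a dictionary with a priority field and a hypothetical "dc" tag naming its data center, and the host names are made up. Members with priority 0 are left unchanged because they can't become primary.

```python
import copy


def prefer_data_center(members, preferred_dc, preferred_priority=2, default_priority=1):
    """Return a copy of members with higher priority for nodes in preferred_dc.

    Assumes each member carries a hypothetical tags["dc"] value.
    Members with priority 0 are not modified.
    """
    updated = copy.deepcopy(members)
    for member in updated:
        if member.get("priority", 1) == 0:
            continue  # priority 0 members never become primary; keep them as-is
        in_preferred_dc = member.get("tags", {}).get("dc") == preferred_dc
        member["priority"] = preferred_priority if in_preferred_dc else default_priority
    return updated


# Hypothetical three-member replica set spread across two data centers.
members = [
    {"_id": 0, "host": "node0.example.net:27017", "priority": 1, "tags": {"dc": "east"}},
    {"_id": 1, "host": "node1.example.net:27017", "priority": 1, "tags": {"dc": "west"}},
    {"_id": 2, "host": "node2.example.net:27017", "priority": 0, "tags": {"dc": "west"}},
]
adjusted = prefer_data_center(members, preferred_dc="west")
```

In this sketch, the node with the highest priority in the preferred data center is favored in elections. Apply the resulting priorities through the cluster's configuration in Cloud Manager.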
Fault Tolerance
Nodes undergoing maintenance don't provide failover support to the cluster. For three-member replica sets, if an additional node becomes unavailable while one node is undergoing maintenance, the cluster loses its majority of nodes. The primary node then steps down and becomes a secondary node. A new primary can't be elected until a majority of the cluster's nodes are available.
For mission-critical applications with high uptime needs, consider converting a three-member replica set to a five-member replica set before performing maintenance to maintain cluster majority in case an additional cluster node becomes unavailable during a maintenance period.
Note
Five-member replica sets or larger are more resilient and less likely to experience loss of majority during maintenance periods.
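The majority arithmetic behind this guidance can be sketched as follows. The function names are illustrative, and the sketch assumes that every member is a data-bearing voting member and that exactly one node is under maintenance at a time.

```python
def majority(voting_members):
    """Number of voting members needed to elect or keep a primary."""
    return voting_members // 2 + 1


def extra_failures_tolerated(voting_members, under_maintenance=1):
    """How many additional nodes can fail during maintenance without losing majority.

    A negative result means the majority is already lost.
    """
    available = voting_members - under_maintenance
    return available - majority(voting_members)


for size in (3, 5):
    print(size, majority(size), extra_failures_tolerated(size))
```

With one node under maintenance, a three-member replica set can't tolerate any additional failure, while a five-member replica set can tolerate one.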
A simpler but less resilient way to increase tolerance of multiple failures is to add a temporary arbiter to a three-member replica set before you perform maintenance.
Unique Index Builds
Automation builds indexes on cluster nodes one at a time using identical but independent commands. To ensure that writes respect the uniqueness of the indexed fields in a unique index, all writes to the collection on the cluster must stop before you build the index.
You can't use Data Explorer or the Automation Config Resource in Cloud Manager to create unique indexes in a rolling fashion because these methods don't stop writes to the cluster.
If your use case requires you to build new unique indexes:
Stop all writes to the affected collection. For more information, see db.fsyncLock() in the MongoDB Manual.
See Build Indexes on Replica Sets in the MongoDB Manual to build the unique index in a rolling fashion.