Disaster Recovery¶
The Kubernetes Operator can orchestrate the recovery of MongoDB replica set members to a healthy Kubernetes cluster when the Kubernetes Operator identifies that the original Kubernetes cluster is down.
Disaster Recovery Modes¶
The Kubernetes Operator can orchestrate either an automatic or manual remediation
of the MongoDBMulti
resources in a disaster recovery scenario, using
one of the following modes:
- Auto Failover Mode allows the Kubernetes Operator to shift the affected MongoDB replica set members from an unhealthy Kubernetes cluster to healthy Kubernetes clusters. When the Kubernetes Operator performs this automatic remediation, it distributes the replica set members evenly across the healthy Kubernetes clusters.
  To enable this mode, use `--set multiCluster.performFailover=true` in the MongoDB Helm Charts for Kubernetes. In the `values.yaml` file in the MongoDB Helm Charts for Kubernetes directory, this setting defaults to `true`. Alternatively, you can set the multi-Kubernetes-cluster deployment environment variable `PERFORM_FAILOVER` to `true`.
- Manual (CLI-based) Failover Mode allows you to use the multi-cluster CLI to reconfigure the Kubernetes Operator to use new healthy Kubernetes clusters. In this mode, you distribute the replica set members across the new healthy clusters yourself by updating the MongoDBMulti custom resource.
  To enable this mode, use `--set multiCluster.performFailover=false` in the MongoDB Helm Charts for Kubernetes, or set the multi-Kubernetes-cluster deployment environment variable `PERFORM_FAILOVER` to `false`.
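The `PERFORM_FAILOVER` environment variable is set on the Kubernetes Operator's Deployment. The following abbreviated sketch assumes a container name taken from the multi-cluster Helm chart; the names are illustrative and may differ in your installation:

```yaml
# Abbreviated sketch of the Kubernetes Operator Deployment spec.
# Set value to "true" for Auto Failover Mode, or "false" for
# Manual (CLI-based) Failover Mode. Container name is illustrative.
spec:
  template:
    spec:
      containers:
        - name: mongodb-enterprise-operator-multi-cluster
          env:
            - name: PERFORM_FAILOVER
              value: "true"
```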
Note
You can’t rely on the auto or manual failover modes when a Kubernetes cluster hosting one or more Kubernetes Operator instances goes down, or when the replica set member resides on the same failed Kubernetes cluster as the Kubernetes Operator instance that manages it.
In such cases, to restore replica set members from lost Kubernetes clusters to the remaining healthy Kubernetes clusters, you must first restore the Kubernetes Operator instance that manages your multi-Kubernetes-cluster deployments, or redeploy the Kubernetes Operator to one of the remaining Kubernetes clusters, and rerun the multi-cluster CLI. To learn more, see Manually Recover from a Failure Using the Multi-Cluster CLI.
Manually Recover from a Failure Using the Multi-Cluster CLI¶
When a Kubernetes cluster hosting one or more Kubernetes Operator instances goes down, or when the replica set member resides on the same failed Kubernetes cluster as the Kubernetes Operator instance that manages it, you can’t rely on the auto or manual failover modes and must use the following procedure to manually recover from a failed Kubernetes cluster.
The following procedure uses the multi-cluster CLI to:
- Configure new healthy Kubernetes clusters.
- Re-balance the replica set members that host
MongoDBMulti
resources onto nodes in the healthy Kubernetes clusters.
Before you start the following procedure, ensure that you:
- Deployed one central cluster and three member clusters by following the Quick Start procedure. In this case, the Kubernetes Operator is installed with automated failover disabled (`--set multiCluster.performFailover=false`).
- Deployed a MongoDBMulti resource.
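A representative MongoDBMulti resource for this three-member-cluster setup might look like the following sketch. The resource name, project ConfigMap, credentials secret, MongoDB version, and cluster names are placeholders, and the exact schema can vary between Kubernetes Operator versions:

```yaml
apiVersion: mongodb.com/v1
kind: MongoDBMulti
metadata:
  name: multi-replica-set        # placeholder resource name
  namespace: mongodb             # placeholder namespace
spec:
  version: "6.0.5-ent"           # illustrative MongoDB version
  type: ReplicaSet
  credentials: my-credentials    # placeholder Ops Manager API key secret
  opsManager:
    configMapRef:
      name: my-project           # placeholder project ConfigMap
  clusterSpecList:
    clusterSpecs:
      - clusterName: ${CLUSTER_1}  # placeholder member cluster names
        members: 2
      - clusterName: ${CLUSTER_2}
        members: 1
      - clusterName: ${CLUSTER_3}
        members: 2
```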
The Kubernetes Operator periodically checks for connectivity to the clusters
in the multi-Kubernetes-cluster deployment by pinging the /healthz
endpoints of the
corresponding servers. To learn more about /healthz
, see Kubernetes API health endpoints.
If CLUSTER_3
in this example becomes unavailable, the
Kubernetes Operator detects the failed connections to the cluster and marks
the MongoDBMulti
resources with the failedClusters
annotation for
subsequent reconciliations.
The resources with data nodes deployed on this cluster fail reconciliation until you complete the manual recovery steps in the following procedure.
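To confirm which clusters the Kubernetes Operator has marked as failed, you can inspect the annotations on the resource from the central cluster. The context, namespace, and resource names below are placeholders from this example setup; the exact annotation key may differ by Kubernetes Operator version, so the command prints all annotations:

```shell
# Placeholder context, namespace, and resource names; adjust to your
# deployment. Look for the failedClusters annotation in the output.
kubectl --context "${MDB_CENTRAL_CLUSTER}" -n mongodb \
  get mongodbmulti multi-replica-set \
  -o jsonpath='{.metadata.annotations}'
```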
To re-balance the MongoDB data nodes so that all the workloads run on
CLUSTER_1
and CLUSTER_2
:
Recover the multi-Kubernetes-cluster deployment using the multi-cluster CLI.
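Based on the multi-cluster CLI's `recover` subcommand, the invocation might look like the following sketch. The cluster names, namespace, and operator name are placeholders from this example setup; check `kubectl mongodb multicluster recover --help` in your CLI version for the exact flags:

```shell
# Sketch of a manual recovery invocation; all names are placeholders.
# --member-clusters lists only the remaining healthy clusters.
# --source-cluster names the cluster whose configuration is replicated.
kubectl mongodb multicluster recover \
  --central-cluster="${MDB_CENTRAL_CLUSTER}" \
  --member-clusters="${CLUSTER_1},${CLUSTER_2}" \
  --member-cluster-namespace="mongodb" \
  --central-cluster-namespace="mongodb" \
  --operator-name=mongodb-enterprise-operator-multi-cluster \
  --source-cluster="${CLUSTER_1}"
```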
This command:
- Reconfigures the Kubernetes Operator to manage workloads on the two healthy clusters. (This list could also include new clusters).
- Marks
CLUSTER_1
as the source of the member node configuration for the healthy Kubernetes clusters. This means that the Role and Service Account configuration is replicated to match the configuration in CLUSTER_1
.
Re-configure the MongoDBMulti resource to re-balance the data nodes on the healthy Kubernetes clusters by editing the MongoDBMulti resources affected by the change.
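For example, to run all data nodes on CLUSTER_1 and CLUSTER_2, you might edit the resource so that the failed cluster no longer appears in the cluster list. The field names and member counts below are illustrative and may differ by Kubernetes Operator version:

```yaml
# Abbreviated sketch: only the healthy clusters remain in the list,
# with members re-balanced across them.
spec:
  clusterSpecList:
    clusterSpecs:
      - clusterName: ${CLUSTER_1}  # healthy cluster
        members: 3
      - clusterName: ${CLUSTER_2}  # healthy cluster
        members: 2
```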
Manually Recover from a Failure Using GitOps Workflows¶
For an example of using the multi-cluster CLI in a GitOps workflow with Argo CD, see the multi-cluster CLI example for GitOps.