If you host an Ops Manager resource in the same Kubernetes cluster as the Kubernetes Operator and have the Application Database (AppDB) deployed on selected member clusters in your multi-Kubernetes-cluster deployment, you can manually recover the Kubernetes Operator and Ops Manager if that cluster fails.
To learn more about deploying Ops Manager on a central cluster and the Application Database across member clusters, see Using Ops Manager with Multi-Kubernetes-Cluster Deployments.
Before you can recover the Kubernetes Operator and Ops Manager, ensure that you meet the following requirements:
Configure backups for your Ops Manager and Application Database resources, including any ConfigMaps and secrets that the Kubernetes Operator creates, so that you can restore the previous running state of Ops Manager. To learn more, see Backup.
The Application Database must have at least three healthy nodes remaining after failure of the Kubernetes Operator's cluster.
The healthy clusters in your multi-Kubernetes-cluster deployment must contain a sufficient number of members to elect a primary node. To learn more, see Application Database Architecture.
Because the Kubernetes Operator doesn't support forcing a replica set reconfiguration, the healthy Kubernetes clusters must contain a sufficient number of Application Database members to elect a primary node for this manual recovery process. A majority of the Application Database members must be available to elect a primary. To learn more, see Replica Set Deployment Architectures.
If possible, use an odd number of member Kubernetes clusters. Proper distribution of your Application Database members can help to maximize the likelihood that the remaining replica set members can form a majority during an outage. To learn more, see Replica Sets Distributed Across Two or More Data Centers.
Consider the following examples: if you distribute five Application Database members across three member clusters as 2-2-1, the loss of any single cluster leaves at least three of the five members available, which is enough to elect a primary. If you distribute the same five members as 3-1-1 and the cluster with three members fails, only two of the five members remain and the replica set cannot elect a primary.
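For illustration only, the following sketch shows how a 2-2-1 distribution might be expressed in the applicationDatabase section of a MongoDBOpsManager resource. The resource name, namespace, cluster names, and version are assumptions; adapt them to your deployment.

apiVersion: mongodb.com/v1
kind: MongoDBOpsManager
metadata:
  name: ops-manager            # hypothetical resource name
  namespace: mongodb
spec:
  # ... other Ops Manager settings ...
  applicationDatabase:
    topology: MultiCluster
    version: "6.0.5-ent"       # hypothetical Application Database version
    clusterSpecList:
      # hypothetical member cluster names; 2-2-1 distribution
      - clusterName: member-cluster-1
        members: 2
      - clusterName: member-cluster-2
        members: 2
      - clusterName: member-cluster-3
        members: 1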
To recover the Kubernetes Operator and Ops Manager, restore the Ops Manager resource on a new Kubernetes cluster:
Follow the instructions to install the Kubernetes Operator in a new Kubernetes cluster.
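As a minimal sketch, assuming you install the Kubernetes Operator from its Helm chart, the installation on the new central cluster might look like the following. The release name, namespace, and context are placeholders; follow the linked installation instructions for the options that apply to your environment.

# Add the MongoDB Helm repository and install the Kubernetes Operator
# on the new central cluster (names and context are illustrative)
helm repo add mongodb https://mongodb.github.io/helm-charts
helm repo update
helm install enterprise-operator mongodb/enterprise-operator \
  --kube-context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace mongodb \
  --create-namespace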
If you plan to re-use a member cluster, ensure that the appropriate service account and role exist on it. The service account and role names can overlap between the central cluster and the member clusters, but they grant different permissions in each.
To see the appropriate role required for the Kubernetes Operator, refer to the sample in the public repository.
Copy the object specification for the failed Ops Manager resource and retrieve the following resources, replacing the placeholder text with your specific Ops Manager resource name and namespace.
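As a sketch, you might save the failed resource's specification and list the related ConfigMaps and secrets with commands like the following. The placeholders <ops-manager-resource-name> and <namespace>, and the output file name, are illustrative.

# Save the specification of the failed Ops Manager resource
kubectl get om <ops-manager-resource-name> \
  --namespace <namespace> \
  -o yaml > ops-manager-backup.yaml

# List the ConfigMaps and secrets that the Kubernetes Operator created
# in the same namespace so that you can export them as well
kubectl get configmaps --namespace <namespace>
kubectl get secrets --namespace <namespace>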
Then, paste the specification that you copied into a new file and configure the new resource by using the preceding values. To learn more, see Deploy an Ops Manager Resource.
Use the following command to apply the updated resource:
kubectl apply \
  --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace "mongodb" \
  -f https://raw.githubusercontent.com/mongodb/mongodb-enterprise-kubernetes/master/samples/ops-manager/ops-manager-external.yaml
To check the status of your Ops Manager resource, use the following command:
kubectl get om -o yaml -w
Once the central cluster reaches a Running state, you can re-scale the Application Database to your desired distribution of member clusters.
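As an illustrative sketch (the cluster names, member counts, and file name are assumptions), re-scaling might amount to redistributing the members in the applicationDatabase clusterSpecList of your Ops Manager specification and re-applying the file:

# Example redistribution across the remaining healthy member clusters:
#
#   applicationDatabase:
#     topology: MultiCluster
#     clusterSpecList:
#       - clusterName: member-cluster-2
#         members: 3
#       - clusterName: member-cluster-3
#         members: 2
#
# Re-apply the updated specification file:
kubectl apply \
  --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace "mongodb" \
  -f ops-manager-backup.yaml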
To host your MongoDB resource or MongoDBMultiCluster resource on the new Kubernetes Operator instance, apply the following resources to the new Kubernetes cluster, as shown in the sketch after this list:
The ConfigMap used to create the initial project.
The secrets used in the previous Kubernetes Operator instance.
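As a sketch, assuming you exported these objects as part of your backups (the file names below are placeholders), you could re-create them on the new cluster as follows:

# Re-create the project ConfigMap and the Kubernetes Operator secrets
# from your backups on the new central cluster
kubectl apply \
  --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace "mongodb" \
  -f project-configmap-backup.yaml
kubectl apply \
  --context "$MDB_CENTRAL_CLUSTER_FULL_NAME" \
  --namespace "mongodb" \
  -f operator-secrets-backup.yaml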
If you deployed a MongoDB resource and not a MongoDBMultiCluster resource, and you wish to migrate the failed Kubernetes cluster's data to the new cluster, you must complete the following additional steps:
Create a new MongoDB resource on the new cluster.
Migrate the data to the new resource by backing up and restoring the data in Ops Manager.
If you deployed a MongoDBMultiCluster resource and the failed cluster contained any Application Database nodes, you must re-scale the resource that you applied across the new healthy clusters.
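For example, a re-scaled MongoDBMultiCluster specification might redistribute its members across the remaining healthy clusters as shown below. The resource name, project ConfigMap, credentials secret, cluster names, member counts, and version are assumptions for illustration.

apiVersion: mongodb.com/v1
kind: MongoDBMultiCluster
metadata:
  name: multi-replica-set       # hypothetical resource name
  namespace: mongodb
spec:
  version: "6.0.5-ent"          # hypothetical MongoDB version
  type: ReplicaSet
  credentials: my-credentials   # hypothetical Ops Manager API credentials secret
  opsManager:
    configMapRef:
      name: multi-project       # hypothetical project ConfigMap
  # Redistribute members across the remaining healthy member clusters
  clusterSpecList:
    - clusterName: member-cluster-2
      members: 2
    - clusterName: member-cluster-3
      members: 1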