
Back Up and Restore Ops Manager Using a Secondary Instance

You can deploy a second Ops Manager instance, called a secondary Ops Manager, to back up the backing databases of a primary Ops Manager. The secondary Ops Manager also provides your recovery path if you lose the primary Ops Manager.

This pattern protects the operational data that Ops Manager stores in its application database and metadata stores. Use this guide to design, configure, and operate disaster recovery for Ops Manager itself.

This guide is for Ops Manager administrators who manage backup and disaster recovery and for teams who design high availability and disaster recovery topologies for Ops Manager.

How Secondary Ops Manager Backup Works

In this pattern, two Ops Manager instances have distinct responsibilities:

The primary Ops Manager manages your MongoDB deployments and their backups, as usual.
The secondary Ops Manager manages and backs up only the primary Ops Manager's backing databases. The secondary Ops Manager doesn't manage your application clusters.

The MongoDB Agent runs on each host of the primary Ops Manager's application database and registers with the secondary Ops Manager. The secondary Ops Manager takes continuous and point-in-time backups of those backing databases.

If you lose the primary Ops Manager, you restore its backing databases from the secondary Ops Manager and then start a new primary Ops Manager. The primary Ops Manager reconnects to the restored backing databases and resumes management of your MongoDB deployments.

When the MongoDB Agents reconnect to the restored primary Ops Manager, they report a configuration version that is newer than the version stored in the restored application database. The primary Ops Manager detects this mismatch, automatically enters Restoration Mode for each affected project, converges all agents on the restored configuration, and blocks deployment changes until reconciliation completes.
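The following Python sketch models the Restoration Mode behavior described above as a small state machine. The `Project` class and the `on_agent_report`, `on_agent_converged`, and `can_change_deployment` functions are hypothetical names chosen for illustration. This isn't Ops Manager's implementation or API, and it assumes that configuration versions are comparable integers.

```python
from dataclasses import dataclass, field
from typing import Set


# Hypothetical model for illustration only; not Ops Manager's implementation.
@dataclass
class Project:
    name: str
    restored_config_version: int  # version stored in the restored application database
    agents: Set[str] = field(default_factory=set)  # agents registered to the project
    externally_managed: bool = False  # for example, managed by a Kubernetes Operator
    restoration_mode: bool = False
    pending_agents: Set[str] = field(default_factory=set)  # not yet converged


def on_agent_report(project: Project, agent_id: str, reported_version: int) -> None:
    """Enter Restoration Mode when an agent reports a newer configuration version."""
    if project.externally_managed:
        # These agents take the restored configuration on their next poll instead.
        return
    if reported_version > project.restored_config_version and not project.restoration_mode:
        project.restoration_mode = True
        # Every agent in the project must converge on the restored configuration.
        project.pending_agents = set(project.agents)


def on_agent_converged(project: Project, agent_id: str) -> None:
    """Record that an agent now runs the restored configuration."""
    project.pending_agents.discard(agent_id)
    if project.restoration_mode and not project.pending_agents:
        project.restoration_mode = False  # reconciliation is complete


def can_change_deployment(project: Project) -> bool:
    """Deployment changes stay blocked until reconciliation completes."""
    return not project.restoration_mode
```

In this model, an externally managed project never enters Restoration Mode, which matches the Kubernetes Operator limitation listed under Limitations.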

Architecture

The following table describes the components in this pattern and their responsibilities:

Component	Responsibility
Primary Ops Manager	Manages your MongoDB deployments and their backups. Stores its own operational data in its application database, snapshot metadata store, and oplog metadata store.
Secondary Ops Manager	Runs a Backup Daemon that writes application database snapshots and oplog slices to an S3-compatible storage blockstore. Continuously backs up the primary Ops Manager's backing databases. Doesn't manage your application clusters.
Application database	Stores the primary Ops Manager's operational data, including project configuration, automation state, and backup metadata. You must back up the application database.
Snapshot and oplog metadata stores	Store the block and oplog indexes for the deployments that the primary Ops Manager backs up. Back up these stores as well.
MongoDB Agent	Runs on each backing database host and registers with the secondary Ops Manager to perform backups and restores.

The secondary Ops Manager stores the backups of the primary Ops Manager's backing databases in its own S3-compatible storage blockstore, separate from the primary Ops Manager's backup storage.

Deployment Variants

Deploy the secondary Ops Manager in a separate failure domain from the primary Ops Manager to prevent a single failure from affecting both instances. Common variants include:

Different Regions

Deploy the secondary Ops Manager in a different cloud region than the primary Ops Manager. This variant protects against the loss of a region.

Different Data Centers

Deploy the secondary Ops Manager in a different data center than the primary Ops Manager. This variant protects against the loss of a data center.

Separate Backup Network

Place the secondary Ops Manager on a separate network that is dedicated to backup traffic. This variant isolates backup traffic from your application network.

Important

Deploy the secondary Ops Manager in a separate failure domain, such as a different rack, availability zone, region, or network segment, from the primary Ops Manager. If both instances share a failure domain, a single failure can disrupt both the primary Ops Manager and its recovery path.
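To audit a planned placement, you can compare the failure domains of the two instances. The following Python sketch is a hypothetical helper, not an Ops Manager feature. It assumes that you record each instance's rack, availability zone, region, and network segment as globally unique identifiers, and it returns every level that both instances share. Each returned level is a failure that could disrupt both the primary Ops Manager and its recovery path.

```python
from typing import Dict, List

# Failure-domain levels that this guide lists. Hypothetical helper, not an
# Ops Manager feature; identifiers must be globally unique (e.g. "dc1/rack-07").
DOMAIN_LEVELS = ("rack", "availability_zone", "region", "network_segment")


def shared_failure_domains(primary: Dict[str, str], secondary: Dict[str, str]) -> List[str]:
    """Return each level at which both instances sit in the same failure domain."""
    return [
        level
        for level in DOMAIN_LEVELS
        if level in primary and primary[level] == secondary.get(level)
    ]


# Example placement: different racks, zones, and networks, but the same region.
primary = {"rack": "dc1/r1", "availability_zone": "us-east-1a",
           "region": "us-east-1", "network_segment": "app-net"}
secondary = {"rack": "dc1/r9", "availability_zone": "us-east-1b",
             "region": "us-east-1", "network_segment": "backup-net"}
print(shared_failure_domains(primary, secondary))  # ['region']
```

In this example, a regional outage could take down both instances. Whether that is acceptable depends on which deployment variant you chose.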

Supported Versions and Limitations

Before you use this pattern, review the following requirements and limitations.

Supported Versions

Both the primary and secondary Ops Manager instances must run Ops Manager 8.0.24 or later.

The secondary Ops Manager must run the same version as the primary Ops Manager or a later version. Don't run a secondary Ops Manager at an earlier version than the primary Ops Manager.

Warning

Restore the application database to a primary Ops Manager that runs the same version as, or a later version than, the Ops Manager that was running when the snapshot was taken. If the replacement Ops Manager binary is older than the version recorded in the application database, Ops Manager refuses to start and reports a "Downgrades are not permitted" error.
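The version rules in this section can be expressed as two pre-flight checks. The following Python sketch is hypothetical: `check_pair` and `check_restore_target` aren't Ops Manager tools, and `parse_version` assumes plain dotted numeric versions such as `8.0.24` without pre-release suffixes. The restore check only models the documented startup refusal; it doesn't call Ops Manager.

```python
from typing import Tuple

# Hypothetical checks for illustration; not an Ops Manager tool.
MIN_VERSION = (8, 0, 24)  # minimum version that this pattern supports


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as "8.0.24" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def check_pair(primary: str, secondary: str) -> None:
    """Raise ValueError if the two versions break this pattern's support rules."""
    p, s = parse_version(primary), parse_version(secondary)
    if p < MIN_VERSION or s < MIN_VERSION:
        raise ValueError("both instances must run Ops Manager 8.0.24 or later")
    if s < p:
        raise ValueError("the secondary Ops Manager is earlier than the primary")


def check_restore_target(binary_version: str, recorded_version: str) -> None:
    """Model the startup refusal when the binary is older than the restored database."""
    if parse_version(binary_version) < parse_version(recorded_version):
        raise RuntimeError("Downgrades are not permitted")
```

Comparing tuples rather than strings avoids ordering mistakes such as treating `8.0.9` as later than `8.0.24`.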

Limitations

This pattern backs up the primary Ops Manager's backing databases. It doesn't back up arbitrary MongoDB clusters. The primary Ops Manager continues to manage backups for your MongoDB deployments.
Backing up and reconciling the snapshot metadata store and the oplog metadata store is a manual procedure. Ops Manager doesn't automatically select a restore point for these stores. As a result, backup metadata can be inconsistent after a restore, and some backups might become unrestorable. Ops Manager validates a snapshot before it restores it, and fails with an error rather than performing an unsafe restore.
Restoration Mode doesn't apply to externally managed deployments, such as deployments that a Kubernetes Operator manages. After you restore the application database, agents in these projects receive the restored configuration directly on their next poll and converge without entering Restoration Mode. No action is required for these projects.
A snapshot can become unrestorable if its data blocks are no longer in the snapshot store. Before a restore, the primary Ops Manager verifies that the snapshot's blocks exist. If blocks are missing, the restore fails with an error and leaves the replica set unmodified, instead of wiping it and failing partway through.
An untested restore is an operational risk. Validate the backup and restore path regularly. See the validation runbook in Restore Ops Manager from a Secondary Ops Manager.
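The pre-restore block check described in the limitations above follows a fail-fast ordering: validate everything, then perform the destructive step. The following Python sketch illustrates that ordering under stated assumptions. The `restore_snapshot` and `missing_blocks` functions, the `SnapshotNotRestorable` exception, and the representation of blocks as string IDs are hypothetical, and `wipe_and_restore` stands in for the destructive restore step. It isn't Ops Manager's implementation.

```python
from typing import Callable, Iterable, List, Set


# Hypothetical names for illustration; not Ops Manager's implementation.
class SnapshotNotRestorable(Exception):
    """Raised when a snapshot references blocks that the store no longer holds."""


def missing_blocks(snapshot_blocks: Iterable[str], store_blocks: Set[str]) -> List[str]:
    """Return the snapshot's block IDs that are absent from the snapshot store."""
    return sorted(set(snapshot_blocks) - store_blocks)


def restore_snapshot(
    snapshot_blocks: Iterable[str],
    store_blocks: Set[str],
    wipe_and_restore: Callable[[], None],
) -> None:
    """Verify every block first; touch the target replica set only if all exist."""
    missing = missing_blocks(snapshot_blocks, store_blocks)
    if missing:
        # Fail before the destructive step so the replica set stays unmodified.
        raise SnapshotNotRestorable(f"{len(missing)} block(s) missing: {missing[:5]}")
    wipe_and_restore()  # stands in for wiping and restoring the replica set
```

Because the check runs before `wipe_and_restore`, a missing block leaves the target untouched instead of failing partway through.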

Next Steps

To set up and operate this pattern, see the following pages:


Configure a Secondary Ops Manager