Backup Alerts

If a problem with the Ops Manager Backup system occurs, Ops Manager sends an alert to system administrators. This page describes possible alerts and provides steps to resolve them.

Backup Agent Down

This alert is triggered if a Backup Agent for a group with at least one active replica set or cluster is down for more than 1 hour.
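The trigger condition can be expressed as a small check. This is only an illustrative sketch of the rule described above, not Ops Manager's implementation; the function name and its inputs (`last_seen`, `active_backups`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DOWN_THRESHOLD = timedelta(hours=1)  # threshold stated in the alert description


def backup_agent_alert_due(last_seen, active_backups, now=None):
    """Return True when the alert condition described above is met.

    last_seen: timezone-aware datetime of the agent's last contact
               (hypothetical input; Ops Manager tracks this internally).
    active_backups: number of active replica sets or clusters in the group.
    """
    now = now or datetime.now(timezone.utc)
    return active_backups >= 1 and (now - last_seen) > DOWN_THRESHOLD
```

A group with no active replica set or cluster never satisfies the condition, regardless of how long the agent has been silent.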

To resolve this alert:

1. Open the group in Ops Manager by typing the group’s name in the GROUP box.
2. Select the Backup tab and then the Backup Agents page to see which server hosts the Backup Agent.
3. Check the Backup Agent log file on that server.
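Checking the log file can be partly scripted. The following is a minimal Python sketch, not an Ops Manager tool: it returns the most recent lines of a log file that contain a keyword. The log path is a hypothetical placeholder, and matching on the word "error" is an assumption about how problems appear in the log.

```python
from collections import deque
from pathlib import Path


def recent_matching_lines(log_path, keyword="error", limit=20):
    """Return up to `limit` of the most recent lines containing `keyword`.

    The match is case-insensitive. The keyword is an assumption; adjust it
    to whatever your Backup Agent log actually records.
    """
    needle = keyword.lower()
    matches = deque(maxlen=limit)  # keeps only the newest `limit` matches
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as log:
        for line in log:
            if needle in line.lower():
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    # Hypothetical path, used for illustration only.
    example_path = "/path/to/backup-agent.log"
    if Path(example_path).exists():
        for entry in recent_matching_lines(example_path):
            print(entry)
    else:
        print(f"Log file not found: {example_path}")
```

Reading the file line by line keeps memory use flat even for large logs, because the deque discards older matches as newer ones arrive.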

Backups Broken

If Ops Manager Backup detects an inconsistency, the Backup state for the replica set is marked as “broken.”

To debug the inconsistency:

1. Check the corresponding Backup Agent log. If you see a “Failed Common Points” test, one of the following may have happened:
   - A significant rollback event occurred on the Backup replica set.
   - The oplog for the Backup replica set was resized or deleted.
   - High oplog churn caused the agent to lose the tail of the oplog.

   In such cases, you must resync the Backup replica set, as described in the procedure Resync Backup.
2. Check the corresponding job log for an error message explaining the problem. In Ops Manager, click Admin, then Backup, and then Jobs. Then click the name of the job and then Logs. Contact MongoDB Support if you need help interpreting the error message.
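To locate a “Failed Common Points” entry in a long Backup Agent log, a short search script can help. This is a sketch under assumptions: the function name is hypothetical, and it assumes the phrase appears literally in the log line, as it is quoted above.

```python
from pathlib import Path

MARKER = "Failed Common Points"  # phrase quoted in the text above


def find_common_points_failures(log_path):
    """Return (line_number, text) pairs for log lines containing MARKER."""
    hits = []
    marker = MARKER.lower()
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if marker in line.lower():  # case-insensitive match
                hits.append((number, line.strip()))
    return hits
```

An empty result suggests the agent log does not explain the broken state, in which case the job log is the next place to look.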

Cluster Snapshot Failed

This alert is generated if Ops Manager Backup cannot successfully take a snapshot for a sharded cluster backup. The alert text should contain the reason for the problem. Common problems include the following:

- There was no reachable mongos. To resolve this issue, ensure that there is at least one mongos showing on the Ops Manager Deployment page.
- The balancer could not be stopped. To resolve this issue, check the log files for the first config server to determine why the balancer will not stop.
- Could not insert a token in one or more shards. To resolve this issue, ensure connectivity between the Backup Agent and all shards.
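A first check of connectivity between the Backup Agent host and the shards is a plain TCP connect, run from the agent's server. The sketch below uses only Python's standard `socket` module; the function name is hypothetical, and the hosts and ports you pass in are your own.

```python
import socket


def unreachable_hosts(targets, timeout=3.0):
    """Return the (host, port) pairs that refuse or time out on a TCP connect.

    A successful TCP connect only shows the port is open; it does not prove
    that authentication or the MongoDB protocol handshake will succeed.
    """
    failed = []
    for host, port in targets:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection opened, so the host is reachable
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            failed.append((host, port))
    return failed
```

For example, `unreachable_hosts([("shard0.example.net", 27017)])` (a hypothetical host and port) returns an empty list when the connect succeeds and the same pair when it does not.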

Bind Failure

This alert is generated if a new replica set cannot be bound to a Backup Daemon. The alert text should contain a reason for the problem. Common problems include:

- No primary is found. At the time the binding occurred, no primary could be detected by the Monitoring Agent. Ensure that the replica set is healthy.
- Not enough space is available on any Backup Daemon.

In both cases, resolve the issue and then re-initiate the initial sync. Alternatively, the job can be manually bound through the Ops Manager Admin interface. In Ops Manager, click Admin, then Backup, and then Job Timeline.

For information on initial sync, see Replica Set Data Synchronization.
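For the disk-space case, free space on a Backup Daemon host can be checked with Python's standard `shutil.disk_usage`. This is a minimal sketch: the function name is hypothetical, and both the directory to inspect and the amount of space a replica set needs are values you must supply.

```python
import shutil


def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least
    `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # "." stands in for the daemon's data directory (an assumption);
    # 50 GiB is an arbitrary example requirement.
    required = 50 * 1024**3
    free_gib = shutil.disk_usage(".").free / 1024**3
    print(f"Free: {free_gib:.1f} GiB, enough: {has_free_space('.', required)}")
```

Run the check on each Backup Daemon host, since the alert fires only when no daemon has enough space.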

Snapshot Behind Snitch

This alert is triggered if the latest snapshot for a replica set is significantly behind schedule. Check the job log in the Ops Manager Admin interface for any obvious errors. In Ops Manager, click Admin, then Backup, and then Jobs. Then click the name of the job and then Logs.
