Backup Preparations¶

On this page

Backup Configuration Options
Backup Sizing Recommendation
Snapshot Frequency and Retention Policy
Backup Considerations

Before backing up your cluster or replica set, decide how to back up the data and what data to back up. This page describes items you must consider before starting a backup.

Backup Configuration Options¶

The backup and recovery requirements of a given system vary to meet the cost, performance and data protection standards the system’s owner sets.

Ops Manager Enterprise Backup and Recovery supports five backup architectures, each with its own strengths and trade-offs. Consider which architecture meets the data protection requirements for your deployment before configuring and deploying your backup architecture.

Example

Consider a system whose requirements include low operational costs. The system’s owners may have strict limits on what they can spend on storage for their backup and recovery configuration. They may accept a longer recovery time as a result.

Conversely, consider a system whose requirements include a low Recovery Time Objective. The system’s owners tolerate greater storage costs if it results in a backup and recovery configuration that fulfills the recovery requirements.

Ops Manager Enterprise Backup and Recovery supports the following backup architectures:

A File System on a Sophisticated SAN
A File System on one or more NAS devices
An AWS S3 Blockstore
MongoDB Blockstore in a Highly Available configuration
MongoDB Blockstore in a Standalone configuration

Important

The backup architecture features and concerns are provided as guidance for developing your own data protection requirements. They do not cover every scenario nor are they representative of every deployment.

Backup Method Features¶

Backup System Feature	File System on SAN	File System on NAS	AWS S3 Blockstore	MongoDB HA Blockstore	MongoDB Blockstore
Snapshot Types	Complete [1]	Complete [1]	Many partial	Many partial	Many partial
Backup Data Deduplication	If SAN supports	No	Yes	Yes	Yes
Backup Data Compression	Yes	Depends	Yes	Yes	Yes
Backup Data Replication	If SAN supports	No	No	Yes	No
Backup Storage Cost	Higher	Medium	Lower	Higher	Lower
Staff Time to Manage Backups	Medium	Medium	Lower	Higher	Medium
Backup RTO	Lower	Medium	Lower	Lower	Medium

[1]	Every snapshot includes the full database path. The MongoDB Agent transfers the blocks for each incremental change to the file system store and writes the bytes to the correct range of blocks in the new snapshot. Ops Manager then copies the blocks of the previous full snapshot into those offsets. The result is a complete backup, and because only the incremental changes are transferred, the process saves network bandwidth.

Example

Last full snapshot	New incremental snapshot
****** **** ******	********

To create a new full snapshot, Ops Manager:

1	Determines that the bytes of the new incremental snapshot fit into the old snapshot in these locations:	***... ****** .*** **....
2	Writes the incremental snapshot to free spaces in the filesystem to fit those offsets:	....**. ........ ....... ....****
3	Copies the old snapshot into the offsets of the new incremental snapshot, resulting in this byte map:	****** **** **** ******
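The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the Ops Manager implementation: the function name `build_full_snapshot`, the list-of-blocks model, and the offset-keyed dictionary are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def build_full_snapshot(
    previous_full: List[bytes], incremental: Dict[int, bytes]
) -> List[bytes]:
    """Combine the previous full snapshot with the changed blocks.

    previous_full: blocks of the last full snapshot, indexed by offset.
    incremental:   changed blocks keyed by offset (hypothetical model).
    """
    size = max(len(previous_full), max(incremental, default=-1) + 1)
    new_snapshot: List[Optional[bytes]] = [None] * size

    # Write each incremental block at its offset in the new snapshot.
    for offset, block in incremental.items():
        new_snapshot[offset] = block

    # Fill every remaining offset from the previous full snapshot.
    for offset, block in enumerate(previous_full):
        if new_snapshot[offset] is None:
            new_snapshot[offset] = block

    if any(block is None for block in new_snapshot):
        raise ValueError("incremental blocks leave a gap in the new snapshot")
    return [block for block in new_snapshot if block is not None]


if __name__ == "__main__":
    old = [b"a", b"b", b"c", b"d"]
    changed = {1: b"B", 3: b"D"}
    print(build_full_snapshot(old, changed))
```

Only the changed blocks cross the network in this model; the unchanged blocks are copied from data the store already holds.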

Note

When Do You Use a Particular Backup Method?

If you do not want to maintain separate backup systems, or have your staff maintain them, consider backing up to a MongoDB or S3 snapshot store.
If you need to restore data without relying on a MongoDB database, consider backing up to a file system on a SAN or NAS device or to an S3 snapshot store.
If you are backing up large amounts of data or frequently need to restore data, consider a file system on a SAN, an S3 snapshot store, or a MongoDB blockstore configured as a replica set or sharded cluster.
If you want to minimize internal storage and maintenance costs, consider backing up to one of the following options:
- An S3 snapshot store, or
- A MongoDB standalone blockstore.
A MongoDB standalone blockstore offers limited resilience. If the disk fills, this blockstore may go offline. You can recover snapshots only after adding storage.
If you have a SAN with advanced features such as high availability, compression, or deduplication, consider using that SAN for file system backups.

Backup Sizing Recommendation¶

The backup recommendation depends on the Feature Compatibility Version of the database that you want to back up.

FCV of 4.0 or earlier
FCV of 4.2 or later

For databases with an FCV of 4.0 or earlier: when sizing the backup of your data, keep the replica set size to 2 TB or less of uncompressed data. If your database increases beyond 2 TB of uncompressed data:

Shard the database
Keep each shard to 2 TB or less of uncompressed data

These size recommendations are a best practice. They are not a limitation of the MongoDB database or Ops Manager.

Backup and restore can use large amounts of CPU, memory, storage, and network bandwidth.

Example

Your stated network throughput, such as 10 Gbps or 100 Gbps, is a theoretical maximum. That value doesn’t account for sharing or throttling of network traffic.

Consider the following scenario:

You want to back up a 2 TB database.
Your hosts support a 10 Gbps TCP connection from Ops Manager to its backup storage.
The network connection has very low packet loss and a low round trip delay time.

A full backup of your data would take more than 30 hours to complete. [*]

This doesn’t account for disk read and write speeds, which can be, at most, 3 Gbps reads and 1 Gbps writes for a single or mirrored NVMe storage device.

The time required to complete each successive incremental backup depends on write load.

If you shard this database into 4 shards, each shard runs its backup separately. This results in a backup that takes less than 8 hours to complete.

[*]	These throughput figures were calculated using the Network Throughput Calculator and assume no additional network compression.
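The arithmetic behind this scenario can be checked with a short sketch. The 30-hour and 8-hour figures come from the text above; the helper names, the decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second), and the even split of data across shards are assumptions made for illustration.

```python
def transfer_hours(size_tb: float, throughput_gbps: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained throughput."""
    bits = size_tb * 1e12 * 8
    return bits / (throughput_gbps * 1e9) / 3600


def effective_gbps(size_tb: float, hours: float) -> float:
    """Sustained throughput implied by moving size_tb terabytes in hours."""
    bits = size_tb * 1e12 * 8
    return bits / (hours * 3600) / 1e9


if __name__ == "__main__":
    # Theoretical lower bound for 2 TB at the stated 10 Gbps line rate.
    print(f"line rate: {transfer_hours(2, 10):.2f} hours")

    # Effective rate implied by the 30-hour estimate in the text.
    rate = effective_gbps(2, 30)
    print(f"implied effective rate: {rate:.3f} Gbps")

    # Four shards of 0.5 TB each, backed up separately at that same rate.
    print(f"per shard: {transfer_hours(0.5, rate):.1f} hours")
```

Under these assumptions the stated line rate would move 2 TB in well under an hour, so the 30-hour estimate corresponds to an effective rate far below the theoretical maximum. At that effective rate, each 0.5 TB shard takes about a quarter of the time, which matches the "less than 8 hours" figure.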

For databases with an FCV of 4.2 or later: when sizing the backup of your data, keep the replica set size to 2 TB or less of compressed data. If your database increases beyond 2 TB of compressed data:

Shard the database
Keep each shard to 2 TB or less of compressed data

These size recommendations are a best practice. They are not a limitation of the MongoDB database or Ops Manager.

Backup and restore can use large amounts of CPU, memory, storage, and network bandwidth.

Example

Your stated network throughput, such as 10 Gbps or 100 Gbps, is a theoretical maximum. That value doesn’t account for sharing or throttling of network traffic.

Consider the following scenario:

You want to back up a 2 TB database.
Your hosts support a 10 Gbps TCP connection from Ops Manager to its backup storage.
The network connection has very low packet loss and a low round trip delay time.

A full backup of your data would take more than 30 hours to complete. [†]

This doesn’t account for disk read and write speeds, which can be, at most, 3 Gbps reads and 1 Gbps writes for a single or mirrored NVMe storage device.

The time required to complete each successive incremental backup depends on write load.

If you shard this database into 4 shards, each shard runs its backup separately. This results in a backup that takes less than 8 hours to complete.

[†]	These throughput figures were calculated using the Network Throughput Calculator and assume no additional network compression.

Snapshot Frequency and Retention Policy¶

By default, Ops Manager takes a base snapshot of your data every 24 hours.

If desired, administrators can change the frequency of base snapshots to 6, 8, 12, or 24 hours. Ops Manager creates snapshots automatically on a schedule; you cannot take snapshots on demand.

Ops Manager retains snapshots for the time periods listed in the following table.

If you terminate a deployment’s backup, Ops Manager immediately deletes the snapshots that are within the dates of the current retention policy.

If you stop a deployment’s backup, Ops Manager stops taking new snapshots but retains existing snapshots until their listed expiration date.

Snapshot	Default Retention Policy	Maximum Retention Policy
Base snapshot	2 days	5 days (30 days if frequency is 24 hours)
Daily snapshot	0 days	1 year
Weekly snapshot	2 weeks	1 year
Monthly snapshot	1 month	7 years
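A proposed schedule can be checked against the limits in this table before you submit it. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes that 1 year equals 365 days or 52 weeks and that 7 years equals 84 months, which the table does not state.

```python
from typing import List

ALLOWED_FREQUENCIES_HOURS = (6, 8, 12, 24)


def validate_schedule(
    frequency_hours: int,
    base_days: int,
    daily_days: int,
    weekly_weeks: int,
    monthly_months: int,
) -> List[str]:
    """Return a list of problems; an empty list means the schedule fits."""
    problems = []
    if frequency_hours not in ALLOWED_FREQUENCIES_HOURS:
        problems.append("base snapshot frequency must be 6, 8, 12, or 24 hours")

    # Base snapshots: 5 days at most, or 30 days with a 24-hour frequency.
    base_max = 30 if frequency_hours == 24 else 5
    if base_days > base_max:
        problems.append(f"base snapshot retention exceeds {base_max} days")

    if daily_days > 365:  # assumed: 1 year = 365 days
        problems.append("daily snapshot retention exceeds 1 year")
    if weekly_weeks > 52:  # assumed: 1 year = 52 weeks
        problems.append("weekly snapshot retention exceeds 1 year")
    if monthly_months > 84:  # assumed: 7 years = 84 months
        problems.append("monthly snapshot retention exceeds 7 years")
    return problems


if __name__ == "__main__":
    # The default policy from the table: 2 days, 0 days, 2 weeks, 1 month.
    print(validate_schedule(24, 2, 0, 2, 1))
    # A 6-hour frequency cannot keep base snapshots for 30 days.
    print(validate_schedule(6, 30, 0, 2, 1))
```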

You can change a backed-up deployment’s schedule through its Edit Snapshot Schedule menu option, available through the Backup page. Administrators can change snapshot frequency and retention through the snapshotSchedule resource in the API.

Changing the reference time changes the time of the next scheduled snapshot:

If the new reference time is before the current reference time, the next snapshot occurs at the new reference time tomorrow. See the first two rows of the table below for examples.
If the new reference time is after the current reference time, and you make the change before the current reference time, the next snapshot occurs at the new reference time today. See the third row of the table below for an example.
If the new reference time is after the current reference time, but you make the change after the current reference time, the next snapshot occurs at the new reference time tomorrow. See the fourth row of the table below for an example.

Time of Change	Current Reference Time	New Reference Time	Time of Next Snapshot
08:00 UTC	12:00 UTC	10:00 UTC	10:00 UTC tomorrow
13:00 UTC	12:00 UTC	10:00 UTC	10:00 UTC tomorrow
08:00 UTC	12:00 UTC	14:00 UTC	14:00 UTC today
13:00 UTC	12:00 UTC	14:00 UTC	14:00 UTC tomorrow
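The three rules reduce to a single condition, sketched below. The function name is hypothetical, all times are treated as UTC, and the behavior when two of the times are exactly equal is an assumption because the rules above do not cover that case.

```python
from datetime import datetime, time, timedelta


def next_snapshot(change_at: datetime, current_ref: time, new_ref: time) -> datetime:
    """Return when the next snapshot occurs after a reference time change.

    The snapshot occurs today only if the new reference time is later than
    the current one AND the change is made before the current reference
    time. In every other case it occurs at the new reference time tomorrow.
    Equal times fall through to "tomorrow" (an assumption).
    """
    today_at_new_ref = datetime.combine(change_at.date(), new_ref)
    if new_ref > current_ref and change_at.time() < current_ref:
        return today_at_new_ref
    return today_at_new_ref + timedelta(days=1)


if __name__ == "__main__":
    rows = [
        (datetime(2024, 1, 1, 8), time(12), time(10)),
        (datetime(2024, 1, 1, 13), time(12), time(10)),
        (datetime(2024, 1, 1, 8), time(12), time(14)),
        (datetime(2024, 1, 1, 13), time(12), time(14)),
    ]
    for change_at, current_ref, new_ref in rows:
        print(change_at, "->", next_snapshot(change_at, current_ref, new_ref))
```

The four inputs in the usage section mirror the four rows of the table, with 2024-01-01 standing in for "today".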

If you change the schedule to save fewer snapshots, Ops Manager does not delete existing snapshots to conform to the new schedule. To delete unneeded snapshots, see Delete a Snapshot.

Limits¶

Ops Manager does not back up deployments where the total number of collections on the deployment meets or exceeds 100,000.
Ops Manager does not replicate index collection options.

Encryption¶

Ops Manager can encrypt any backup job stored in a head database running MongoDB Enterprise between FCV 3.4 and 4.0 with the WiredTiger storage engine.

Backup Considerations¶

Databases Running FCV 4.2¶

Backup support for MongoDB 4.2 with "featureCompatibilityVersion" : 4.2 is currently limited. Support will be extended in future releases of Ops Manager.

Backup Features Supported at Present¶

Feature	MongoDB 4.2 with FCV 4.2	MongoDB 4.2 with FCV 4.0	MongoDB 4.0 or earlier
Backs up Data using WiredTiger Snapshots	Yes	No	No
Backs up Data using the Backup Daemon	No	Yes	Yes
Backs up Replica Sets	Yes	Yes	Yes
Backs up Sharded Clusters	Yes	Yes	Yes
Can Filter using Namespaces	No	Yes	Yes
Can Specify Sync Source Database	No	Yes	Yes
Can Restore Data to Specific Point in Time	Yes	Yes	Yes
Can Perform Incremental Backups [‡]	Yes	Yes	Yes
Supports Snapshots that use Encryption	Yes [§]	Yes	Yes
Supports Saving to Blockstore Snapshot Storage	Yes	Yes	Yes
Supports Saving to S3 Snapshot Storage	Yes	Yes	Yes
Supports Saving to File System Storage	No	Yes	Yes
Supports Databases running MongoDB Enterprise	Yes	Yes	Yes
Supports Databases running MongoDB Community	No	Yes	Yes
Requires a MongoDB Agent with backup enabled on every `mongod` cluster node	Yes	No	No

[‡]	Ops Manager requires a full backup for your first backup, after a snapshot has been deleted, and if the blockstore block size has been changed. Incremental backups reduce network transfer and storage costs. This feature works with MongoDB 4.2.6 or later.

[§]	Ops Manager supports encrypted snapshots as of version 4.2.16. Querying an encrypted snapshot requires MongoDB Enterprise 4.2.9 or 4.4.0.

Requirements and Limitations¶

To run backups and restores if you are running MongoDB 4.2 with "featureCompatibilityVersion" : 4.2, you:

Must run MongoDB Enterprise.
Cannot use namespace filter lists to define the namespaces included in a backup. Snapshots using FCV 4.2 always include all namespaces.
Cannot specify a sync source database. For FCV 4.2 replica sets, no Initial Sync step is required. When taking a Snapshot, Ops Manager selects the replica set member with the least performance impact and greatest storage-level duplication of Snapshot data.
Cannot save your backup to a file system store. Backup supports MongoDB and S3 Snapshot Storage.
Must deploy a MongoDB Agent with every mongod node in the cluster.
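These requirements can be expressed as a pre-flight check. The sketch below is purely illustrative: `BackupPlan`, its fields, and the store names are hypothetical and do not correspond to any Ops Manager API or configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupPlan:
    """Hypothetical description of a planned backup (not an Ops Manager type)."""

    enterprise: bool
    namespace_filter: bool
    sync_source: Optional[str]
    snapshot_store: str  # assumed values: "blockstore", "s3", "filesystem"
    agent_on_every_node: bool


def fcv42_violations(plan: BackupPlan) -> List[str]:
    """List the FCV 4.2 requirements from this section that the plan breaks."""
    problems = []
    if not plan.enterprise:
        problems.append("must run MongoDB Enterprise")
    if plan.namespace_filter:
        problems.append("cannot use namespace filter lists")
    if plan.sync_source is not None:
        problems.append("cannot specify a sync source database")
    if plan.snapshot_store == "filesystem":
        problems.append("cannot save to a file system store")
    if not plan.agent_on_every_node:
        problems.append("must deploy a MongoDB Agent with every mongod node")
    return problems


if __name__ == "__main__":
    plan = BackupPlan(
        enterprise=True,
        namespace_filter=False,
        sync_source=None,
        snapshot_store="filesystem",
        agent_on_every_node=True,
    )
    print(fcv42_violations(plan))
```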

Note

If Ops Manager doesn’t manage your cluster:

Grant the backup permission to the MongoDB user that runs backups.
Ensure that the operating system user that runs the MongoDB Agent has read permission for all data files (including journal files) of the deployment.

Databases not Running FCV 4.2¶

Important

Only sharded clusters or replica sets can be backed up. To back up a standalone mongod process, you must convert it to a single-member replica set.

The following considerations apply when your databases run any version of MongoDB 4.0 or earlier or when they run MongoDB 4.2 with "featureCompatibilityVersion" : 4.0.

Garbage Collection of Expired Snapshots¶

Ops Manager manages expired snapshots using groom jobs. These groom jobs act differently depending upon which snapshot store contains the snapshots:

Snapshot Store	How Groom Jobs Work
MongoDB Blockstore	Uses additional disk space up to the amount of living blocks for each job.
Filesystem snapshot stores	Deletes expired snapshots.
S3 snapshot stores	For FCV 4.0 or earlier, may use additional disk space if Ops Manager creates a snapshot while the groom job is running. For FCV 4.2 or later, Ops Manager can’t create snapshots while a groom job is running.

Namespaces Filter¶

The namespaces filter lets you specify which databases and collections to back up. You create either a Blacklist of those to exclude or a Whitelist of those to include. You make your selections when starting a backup and can later edit them as needed. If you change the filter in a way that adds data to your backup, a resync is required.

Use the blacklist to prevent backup of collections that contain logging data, caches, or other ephemeral data. Excluding these kinds of databases and collections will allow you to reduce backup time and costs. Using a blacklist is often preferable to using a whitelist as a whitelist requires you to intentionally opt in to every namespace you want backed up.

Note

MongoDB deployments with "featureCompatibilityVersion" : 4.2 do not support namespaces filters.

Storage Engine¶

To back up MongoDB clusters, use the WiredTiger storage engine.

If your current backing databases use MMAPv1, upgrade to WiredTiger.

With WiredTiger, Ops Manager limits backups to deployments with fewer than 100,000 files. Files include collections and indexes.

MongoDB 4.2 removes MMAPv1 storage. To learn more about storage engines, see Storage in the MongoDB manual.

Resyncing Production Deployments¶

For production deployments, resync all backed-up replica sets periodically (annually) as a best practice. When you resync, data is read from a secondary in each replica set. During resync, no new snapshots are generated.

You may also want to resync your backup after:

A reduction in data size, such that the size on disk of Ops Manager’s copy of the data is also reduced. This includes cases where you:
- Have a TTL index in place, which periodically deletes documents.
- Drop a collection (MMAPv1 only).
- Run a sharded cluster, and there have been a lot of chunks moved off a particular shard.
A switch in storage engines, if you want Ops Manager to provide snapshots in the new storage engine format.
A manual build of an index on a replica set in a rolling fashion (as per Build Indexes on Replica Sets in the MongoDB manual).

Checkpoints¶

Important

You may use checkpoints for clusters that run MongoDB with Feature Compatibility Version of 4.0 or earlier. Checkpoints were removed from MongoDB instances with FCV of 4.2 or later.

For sharded clusters, checkpoints provide additional restore points between snapshots. With checkpoints enabled, Ops Manager creates restoration points at configurable intervals of every 15, 30 or 60 minutes between snapshots. To enable checkpoints, see enable checkpoints.

To create a checkpoint, Ops Manager stops the balancer and inserts a token into the oplog of each shard and config server in the cluster. These checkpoint tokens are lightweight and do not have a consequential impact on performance or disk use.

Backup does not require checkpoints, and they are disabled by default.

Restoring from a checkpoint requires Ops Manager to apply the oplog of each shard and config server to the last snapshot captured before the checkpoint. Restoration from a checkpoint takes longer than restoration from a snapshot.
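The extra restore time follows from how the restore is assembled: the base is the last snapshot before the checkpoint, and the oplog between the two must be replayed. The sketch below only selects that base snapshot and the replay window; the function name and the plain list of snapshot timestamps are assumptions, not Ops Manager structures.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import List, Tuple


def checkpoint_restore_base(
    snapshot_times: List[datetime], checkpoint: datetime
) -> Tuple[datetime, timedelta]:
    """Pick the last snapshot captured before the checkpoint.

    Returns the snapshot time and the span of oplog that would have to be
    applied on top of it to reach the checkpoint.
    """
    ordered = sorted(snapshot_times)
    # bisect_left counts the snapshots strictly earlier than the checkpoint.
    index = bisect_left(ordered, checkpoint)
    if index == 0:
        raise ValueError("no snapshot was captured before this checkpoint")
    base = ordered[index - 1]
    return base, checkpoint - base


if __name__ == "__main__":
    snapshots = [datetime(2024, 1, 1), datetime(2024, 1, 2)]
    base, window = checkpoint_restore_base(snapshots, datetime(2024, 1, 1, 6, 30))
    print(base, window)
```

The longer the replay window, the more oplog entries each shard and config server must apply, which is why a checkpoint restore is slower than restoring a snapshot directly.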

Snapshots when Agent Cannot Stop Balancer¶

For sharded clusters, Ops Manager disables the balancer before taking a cluster snapshot. In certain situations, such as a long migration or no running mongos, Ops Manager tries to disable the balancer but cannot. In such cases, Ops Manager will continue to take cluster snapshots but will flag the snapshots with a warning that data may be incomplete and/or inconsistent. Cluster snapshots taken during an active balancing operation run the risk of data loss or orphaned data.

Snapshots when Agent Cannot Contact a `mongod`¶

For sharded clusters, if the agent cannot reach a mongod process, whether a shard or config server, then the agent cannot insert a synchronization oplog token. If this happens, Ops Manager does not create the snapshot and displays a warning message.
