Data is the core currency of today’s digital economy. Just as central banks take extensive measures to safeguard currency, organizations need to take every practical measure to protect their data. Yet, according to industry research, 43% of companies that experience major data loss incidents are unable to resume business operations.1
Data loss incidents take a variety of forms, but when it comes to database technology, they generally fall into two categories: catastrophic failure and human error.
Catastrophic failure includes natural disasters and any other scenario that permanently destroys all the nodes in your production system. If you keep all your servers in the same data center, a fire that destroys them would qualify. This is typically what we imagine when we implement a backup strategy.
While they may be less newsworthy than catastrophic failures, the reality is that human error and flawed processes account for the vast majority of IT outages.2 Humans introduce application bugs, deliberately hack into systems, or accidentally delete data. A bad code release that corrupts some or all of the production data is an unfortunate but common example. Worse, errors introduced by humans propagate automatically to the replicas, often within seconds.
A backup and recovery strategy is necessary to protect your mission critical data against these types of risks. With such a strategy in place, you’ll gain peace of mind knowing that in the event of a failure, you’re ready to restore business operations without data loss. You’ll also satisfy important regulatory and compliance requirements by demonstrating that you’ve taken a proactive approach towards data safety.
Taking regular backups offers other advantages as well. The backups can be used to create new environments for development, staging or QA without impacting production. This practice enables development teams to quickly and easily test new features, accelerating release cycles and ensuring smooth product launches.
Database systems are the most critical components to safeguard, yet often the most complex to back up and restore properly.3 A robust database backup strategy includes a mix of technology and process to prepare for failures that might result in the complete disruption or loss of the business.
Below we’ll outline considerations when preparing a backup strategy, and then specific approaches for backing up MongoDB.
All backup systems capture a snapshot of your system from a past moment in time. A critical property of a backup is that it is immutable: it preserves the state of the system at the moment it was taken. Restoring a backup snapshot rolls back the clock to a point before the unexpected event that caused the loss of data across your system.
In preparing a backup strategy, organizations typically start by evaluating their recovery point objective (RPO) and recovery time objective (RTO). The RPO indicates how much data the business is willing to lose in the event of an incident, while the RTO indicates how long recovery will take. As not all data is created equal, RPO and RTO should be evaluated on an application-by-application basis. You’ll likely have different requirements for your mission critical customer data than you will for your clickstream analytics.
Your requirements for RTO and RPO will drive the economic and performance cost of maintaining backups. For example, if you can tolerate losing up to a year’s data in the event of a fire, you could take a snapshot once a year and store the backups on physical, disconnected tapes in a remote location. This would be a relatively inexpensive solution. However, when disaster strikes, you will need to physically ship the tapes to your datacenter. The backup may be a year old, could take days to arrive and even longer to restore to production. On the other hand, if you never want to lose more than a minute or two of data or incur more than a few minutes of downtime, you will need a continuous backup solution with point-in-time recovery.
There is a tradeoff between achieving a better RTO and reducing the probability that a disaster also destroys your backups. Keeping your backup snapshots far from the primary database servers, both logically and physically, lowers the likelihood that they will be destroyed at the same time as your database, but increases the recovery time.
You’ll also want to consider the ongoing maintenance, cost and performance impact of the backup system. While backups are crucial for the safety of your organization’s mission critical data, they need to be evaluated in the context of the overall resource utilization of your production system.
The table below outlines the key considerations when evaluating backup strategies.
| Consideration | Description |
| --- | --- |
| Recovery Point Objective | The amount of data your business is willing to lose in the event of a disaster. |
| Recovery Time Objective | The amount of time it takes to recover. This includes both the time it takes to retrieve the backup and the time to put it into production. In scenarios where restoring from a backup is necessary, it is likely that you will incur some downtime. |
| Isolation | Backups should be kept separate from the production system to ensure that any disruption to production does not correlate with a disruption to the backup. |
| Performance Impact | Backup techniques have varying impact on the performance of the running database. Some backup solutions degrade database performance enough that you may need to schedule backups during off-peak usage or maintenance windows. You may decide to deploy new secondary servers just to support backups. |
| Restore Process | A backup system is only as good as your ability to restore from it. A critical component of any backup strategy is not only having the data but having the ability to restore it, and doing practice runs of your restores to ensure that they work in the event of a “data emergency.” |
| Sharding | Backing up a distributed database system, such as a MongoDB sharded cluster, presents an additional layer of complexity. To achieve a truly consistent backup, all write activity must be paused across the system. |
| Deployment Complexity | While backup is critical in a disaster scenario, it’s not your core business. Ideally you want a backup strategy that is easy to set up and maintain over time so that you can focus on your business. |
| Flexibility | Partial backup strategies provide the flexibility to filter out data so that you aren’t spending resources backing up non-mission critical components of your system. Similarly, during the restore process, you may want the flexibility to recover only certain components of the data. Finally, an incremental backup strategy backs up only the parts of the data that have changed since the last snapshot, making it a more efficient and flexible strategy than taking complete backups at every snapshot. |
There are three main strategies for backing up MongoDB:

- mongodump
- Filesystem snapshots
- MongoDB Management Service (MMS) Backup
Below we’ll outline these different approaches and the benefits and drawbacks of each.
mongodump is a tool bundled with MongoDB that performs a backup of the data in MongoDB. mongodump may be used to dump an entire database, a collection, or the result of a query. mongodump can produce a consistent snapshot of the data by also dumping the oplog (the --oplog option). You can then use the mongorestore utility to restore the data to a new or existing database; mongorestore imports the BSON dumps produced by mongodump and replays the oplog.
mongodump is a straightforward approach and has the benefit of producing backups that can be filtered based on your specific needs. While mongodump is sufficient for small deployments, it is not appropriate for larger systems. mongodump exerts too much load to be a truly scalable solution. It is not an incremental approach, so it requires a complete dump at each snapshot point, which is resource-intensive. As your system grows, you should evaluate lower impact solutions such as filesystem snapshots or MMS.
In addition, while the complexity of deploying mongodump for small configurations is fairly low, the complexity of deploying mongodump in large sharded systems can be significant.
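As a rough sketch of this workflow (the host, port, and backup paths below are placeholders for your own deployment, not prescribed values), a consistent dump and restore of a replica set might look like:

```shell
# Sketch of a mongodump-based backup; assumes mongodump/mongorestore are on
# the PATH and a replica set member is reachable at localhost:27017.

take_backup() {
  # --oplog also captures oplog entries written while the dump runs, so the
  # snapshot is consistent as of the moment the dump finishes
  mongodump --host localhost:27017 --oplog --out "/backups/$(date +%F)"
}

restore_backup() {
  # mongorestore imports the BSON dumps; --oplogReplay then replays the
  # captured oplog to bring the data to a single consistent point in time
  mongorestore --host localhost:27017 --oplogReplay "$1"
}
```

A nightly cron job invoking `take_backup` and shipping the dump directory off-site is a common minimal setup; the dump must then be copied back before `restore_backup` can run.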
You can back up MongoDB by copying the underlying files that the database process uses to store data. To obtain a consistent snapshot of the database, you must either stop all writes to the database and use standard file system copy tools, or create a snapshot of the entire file system, if your volume manager supports it.
For example, Linux LVM quickly and efficiently creates a consistent snapshot of the file system that can be copied for backup and restore purposes. To ensure that the snapshot is logically consistent, you must have journaling enabled within MongoDB.
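As a hedged sketch (the volume group, logical volume name, snapshot size, and archive path below are assumptions about your storage layout), an LVM-based backup might look like:

```shell
# Sketch of an LVM snapshot backup; assumes MongoDB's data files live on the
# logical volume /dev/vg0/mongodb and that journaling is enabled in MongoDB
# so the snapshot is logically consistent.

snapshot_backup() {
  # Create a copy-on-write snapshot; the 10G of allocated space absorbs
  # writes made to the origin volume while the snapshot exists
  lvcreate --size 10G --snapshot --name mdb-snap /dev/vg0/mongodb

  # Archive the snapshot's contents off the host, then release the snapshot
  dd if=/dev/vg0/mdb-snap | gzip > /backups/mdb-snap.gz
  lvremove -f /dev/vg0/mdb-snap
}
```

Note that the archived image should be moved off the machine (and ideally out of the datacenter) to provide real isolation from the production system.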
Because backups are taken at the storage level, filesystem snapshots can be a more efficient approach than mongodump for taking full backups and restoring them. However, unlike mongodump, it is a coarser approach: you don’t have the flexibility to target specific databases or collections in your backup. This may result in large backup files, which in turn may result in long-running backup operations.
Implementing filesystem snapshots requires ongoing maintenance as your system evolves and becomes more complex. Coordinating backups across multiple replica sets, particularly in a sharded system, requires devops expertise to ensure consistency across the various components.
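This coordination is where the scripting burden shows. As a hypothetical sketch (the host names and topology are placeholders, and a complete procedure must also account for the config servers), a wrapper script might stop the balancer and lock a dedicated backup secondary on each shard before snapshotting:

```shell
# Hypothetical coordination helpers for a consistent sharded-cluster backup.
# Stopping the balancer prevents chunks from migrating mid-backup; fsyncLock
# flushes pending writes and quiesces each backup secondary's data files.

MONGOS="mongos.example.com:27017"
SECONDARIES="shard0-backup.example.com shard1-backup.example.com"

lock_cluster() {
  # Halt chunk migrations for the duration of the backup window
  mongo --host "$MONGOS" --eval "sh.stopBalancer()"
  for host in $SECONDARIES; do
    # Flush writes to disk and block new ones on this member
    mongo --host "$host" --eval "db.fsyncLock()"
  done
}

unlock_cluster() {
  for host in $SECONDARIES; do
    mongo --host "$host" --eval "db.fsyncUnlock()"
  done
  mongo --host "$MONGOS" --eval "sh.startBalancer()"
}

# Usage: lock_cluster, take a filesystem snapshot on each locked secondary,
# then unlock_cluster.
```

Keeping the locked window short matters: while a secondary is fsync-locked it falls behind the primary and must catch up from the oplog afterward.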
MongoDB Management Service provides continuous, online backup for MongoDB as a fully managed service. You install the Backup Agent in your environment, which conducts an initial sync to MongoDB’s secure and redundant datacenters. After the initial sync, the agent streams encrypted and compressed MongoDB oplog data to MMS, giving you a continuous backup.
By default, MMS takes snapshots every 6 hours and oplog data is retained for 24 hours. The snapshot schedule and retention policy can be configured to meet your requirements. You also have the flexibility to exclude non-mission critical databases and collections.
For replica sets, a custom, point-in-time snapshot can be restored for any moment in the last 24 hours. For sharded systems, MMS produces a consistent snapshot of the cluster every 6 hours.
Because MMS reads only the oplog, the ongoing performance impact is minimal, similar to that of adding an additional node to the replica set.
In addition to the cloud-based service, MMS is available as on-prem software as part of a MongoDB Standard or Enterprise Subscription.
The table below outlines the key backup considerations for the three MongoDB backup strategies.
| Consideration | mongodump | Filesystem Snapshots | MMS Backup |
| --- | --- | --- | --- |
| Recovery Point Objective | Limited to snapshot moments. | Limited to snapshot moments, but snapshots are lower overhead and hence often more frequent. | Point-in-time for replica sets. Limited to snapshot moments for an entire sharded cluster. |
| Recovery Time Objective | Requires running mongorestore. Latency of mongorestore depends on the location of the dumps and the granularity of what is being recovered. | Depends on the latency of bringing the snapshot closer to the production servers and expanding it into a running filesystem. | Depends on how long it takes to transfer the backup snapshot over the network. Restore time increases if constructing and restoring a custom, point-in-time snapshot vs. a stored snapshot. |
| Isolation | Depends on how far the backup snapshots are kept from production. | Depends on how far the backup snapshots are kept from production. | Backups stored in MongoDB’s redundant and secure datacenters, outside of AWS. |
| Performance Impact | Significant if run in online mode. | Fairly low, depending on implementation. | May add significant load during the initial sync. After that, impact is similar to that of a secondary node, typically very low. |
| Restore Process | Use mongorestore to unpack the BSON files. | Typically expand the snapshot onto a new filesystem. | Transfer the snapshot from MMS, then expand. MMS restores actual database files, not BSON dumps. |
| Sharding | Requires scripting across the entire cluster and synchronizing snapshots. | Requires scripting across the entire cluster and synchronizing snapshots. | Supports consistent snapshots of sharded clusters. |
| Deployment Complexity | Low for small systems; high for large clusters, requiring scripting, storage management, and monitoring. | Low for small systems; medium for large clusters, requiring scripting, storage management, and monitoring. | Low. |
| Flexibility | Requires custom scripting. | Coarse approach; cannot target specific databases. | Flexibility to exclude non-critical collections and set a custom snapshot schedule and retention policy. |
MongoDB Management Service (MMS) is the application for managing MongoDB, created by the engineers who develop MongoDB. Using a simple yet sophisticated user interface, MMS makes it easy and reliable to run MongoDB at scale, providing the key capabilities you need to ensure a great experience for your customers, including continuous, incremental backup with point-in-time recovery.