Guidance for Atlas Backups
MongoDB Atlas provides fully managed, customizable backups to ensure data retention and recovery:
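Cloud Backups: Atlas takes these backups with the cloud provider's native snapshot functionality, which supports full-copy snapshots and localized snapshot storage. The snapshots are always incremental and rely on the cloud provider's underlying snapshot mechanism, which keeps costs low and restores fast. The backup policy that you choose specifies how many daily, weekly, and monthly snapshots to keep.
Continuous Cloud Backups: An add-on to Cloud Backups that provides point-in-time recovery. By backing up the oplog and capturing the data changes between snapshots, it lets you restore to a specific minute, that is, to the exact moment (point in time) before a failure or event. This supports a Recovery Point Objective (RPO) as low as 1 minute.
We don't recommend enabling backups for development and test environments. For staging and production environments, we recommend that you develop automated deployment templates that incorporate the recommendations on this page.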
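Features for Atlas Backups
Atlas provides fully managed data backups, including point-in-time data recovery and consistent, cluster-wide snapshots for all clusters, including sharded clusters. In Atlas, you can choose from four snapshot frequencies: hourly, daily, weekly, and monthly, each with its own retention period.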
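Feature | Description |
---|---|
Cloud Backups | Uses the native snapshot functionality of the cluster's cloud service provider to offer localized backup storage. Benefits include a strong default backup retention schedule of 12 months, fully flexible customization of snapshot and retention schedules, and the ability to set different snapshot frequencies (for example, hourly snapshots for fast recovery and weekly or monthly snapshots for long-term retention) to meet industry regulations. You have immediate access to backup data, which is useful for Atlas auditing, compliance, or data recovery, and you can run queries directly against backup data, which saves time and resources. |
Continuous Cloud Backups | Provides point-in-time (PIT) recovery so that you can restore to any timestamp. This lets you recover data to the precise point before a failure or event, such as a cyberattack. You can also set a custom restore window that specifies the number of days within which you can restore to a point in time. |
Multi-Region Snapshot Distribution | Increases resilience by automatically distributing backup snapshots and oplogs across geographic regions instead of storing them only in the primary region. This lets you meet compliance requirements that mandate storing backups in different, physically separate locations, so that disaster recovery remains possible during a regional outage. To learn more, see Snapshot Distribution. |
Backup Compliance Policy | Further protects business-critical data by preventing all snapshots and oplogs stored in Atlas from being modified or deleted for a predefined retention period that you specify, which makes backups fully WORM (Write Once Read Many) compliant. Only a designated, authorized user can turn off this protection, and only after completing a verification process with MongoDB Support. This adds a mandatory manual delay and cool-down period so that an attacker cannot change the backup policy and export the data. To learn more, see Configure a Backup Compliance Policy. |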
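Recommendations for Atlas Backups
Recommendations for Backup Strategy
You must align your backup strategy with specific Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) to meet business continuity requirements, especially for critical applications where a near-instant RTO and fast recovery are essential. The RPO defines the maximum acceptable amount of data loss during an incident, and the RTO defines how quickly the application must recover. Because data varies in importance, evaluate the RPO and RTO for each application individually. For example, mission-critical data likely has different requirements than clickstream analytics. Your requirements for RTO, RPO, and backup retention affect the cost and performance considerations of maintaining backups. In development and test environments, we recommend that you disable backups to save costs. In staging and production environments, ensure that backups are enabled in your deployment templates and that you have successfully tested your backup and restore procedures and processes.
Restoring large replica sets (and shards) from backup takes longer. In staging and production environments, we recommend that you determine, through testing, the replica set or shard size limits that keep restores within your RTO requirements. Ensure that the snapshot schedule and retention policy satisfy any RPO requirements.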
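As a rough illustration of the sizing test described above, the following shell sketch derives the largest replica set or shard that could be restored within an RTO from a restore throughput that you measured yourself. The function name and the sample throughput are hypothetical and are not Atlas figures, and the linear model is a simplification.

```shell
#!/bin/sh
# max_restorable_gb RTO_MINUTES THROUGHPUT_GB_PER_MIN
# Prints the largest data size (in GB) that fits inside the RTO,
# assuming restore time grows linearly with data size (a simplification).
max_restorable_gb() {
  rto_minutes="$1"
  gb_per_minute="$2"
  echo $(( rto_minutes * gb_per_minute ))
}

# Hypothetical measurement: a test restore moved 5 GB per minute.
# With an RTO of 30 minutes, this prints 150.
max_restorable_gb 30 5
```

Treat the result as an upper bound only. Oplog replay and disk warming add to the real restore time, so a measured end-to-end test restore remains the reference.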
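In production environments, in addition to Atlas Cloud Backups, we recommend that you enable Continuous Cloud Backup by default with a restore window of seven days. Extend this window for more critical workloads. This lets you replay the oplog to restore a cluster to a specific point in time and meet your RTO.
Recommendations for Backup Policy
Atlas provides predefined backup snapshot schedules, which include the frequency and retention period of snapshots. Retaining backup snapshots for long periods can be expensive. We recommend that you build automated deployment templates that fit your needs, based on the size and criticality of your data and environment (development, test, staging, production). For snapshot frequency and retention, we recommend the following: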
Tier | RTO | RPO | Recommended Frequency and Retention | Total Atlas Backup Snapshots |
---|---|---|---|---|
Tier 1 | 30 minutes | Near zero (within 7 days) | Hourly: every 12 hours, retain for 7 days = 14 snapshots; Daily: once a day, retain for 7 days = 7 snapshots; Weekly: Saturday, retain for 4 weeks = 4 snapshots; Monthly: last day of month, retain for 3 months = 3 snapshots | 28 |
Tier 2 | 12 hours | Near zero (within 7 days) | Daily: once a day, retain for 7 days = 7 snapshots; Weekly: Saturday, retain for 4 weeks = 4 snapshots; Monthly: last day of month, retain for 3 months = 3 snapshots | 14 |
Tier 3 | 3 days | Near zero (within 2 days) | Daily: once a day, retain for 7 days = 7 snapshots; Weekly: Saturday, retain for 4 weeks = 4 snapshots; Monthly: last day of month, retain for 3 months = 3 snapshots | 14 |
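The snapshot counts in the table follow from dividing each retention period by its snapshot interval. The sketch below works through that arithmetic for the Tier 2 schedule; the helper name `retained_snapshots` is hypothetical, and the model assumes a steady state in which every scheduled snapshot succeeds.

```shell
#!/bin/sh
# retained_snapshots RETENTION INTERVAL
# Both arguments must use the same unit (hours, days, weeks, or months).
retained_snapshots() {
  echo $(( $1 / $2 ))
}

# Tier 2 schedule from the table above.
daily=$(retained_snapshots 7 1)     # one per day, kept 7 days
weekly=$(retained_snapshots 4 1)    # one per week, kept 4 weeks
monthly=$(retained_snapshots 3 1)   # one per month, kept 3 months
echo "tier 2 total: $(( daily + weekly + monthly ))"

# An hourly policy that runs every 12 hours and is kept 7 days (168 hours).
echo "12-hourly for 7 days: $(retained_snapshots 168 12)"
```

Both lines report 14, which matches the Tier 2 total and the hourly item listed for Tier 1.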
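Recommendations for Backup Distribution
Atlas lets you choose where backups are stored. To further strengthen resilience, we recommend distributing backups to both the local region and an external disaster recovery region, so that data stays recoverable even during a regional outage. For an Atlas cluster that spans three regions, multi-region snapshot distribution copies backups to the two secondary regions, so a backup copy is available for restore. You can also copy critical backups and their point-in-time data to any secondary region that the cloud provider offers in Atlas.
When you configure snapshot frequency, retention, and distribution, we recommend that you balance availability against cost. Your critical workloads, however, might require multiple snapshot copies in different locations.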
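Recommendations for Backup Compliance Policy
We recommend that you enforce the Atlas Backup Compliance Policy to prevent unauthorized modification or deletion of backups, which preserves data integrity and supports robust disaster recovery.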
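Recommendations for PIT Recovery
Continuous Cloud Backups enable precise point-in-time (PIT) recovery, which minimizes data loss during a failure. Atlas can quickly restore to the exact timestamp before a failure event, even during a primary region outage, and with optimized restores can deliver an RPO as low as 1 minute and an RTO of less than 15 minutes. To do this, Atlas restores the most recent snapshot taken before the requested point in time and then replays oplog changes up to that point. Recovery time varies with cloud provider disk warming and with the amount of oplog that must be replayed. Until the cloud provider finishes warming the disks, the restored cluster might perform more slowly. If you have flexibility in your recovery requirements, we recommend designing templates that strike the best compromise between reasonable recovery options and cost.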
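Recommendations for Backup Costs
To optimize Atlas backup costs, align backup frequency and retention policies with the criticality of your data to avoid unnecessary storage charges. For example, disable backups in lower environments, and in upper environments with high availability requirements, distribute backups to every region where the Atlas cluster is deployed. You can also take advantage of incremental backups, in which snapshots capture only incremental changes and use built-in compression, to minimize the amount of data stored. Choose backup regions strategically to avoid cross-region data transfer fees, and size cluster disks appropriately for the workload to prevent overspending. Together, these strategies help you manage costs while keeping backups secure and reliable.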
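Automation Examples: Atlas Backups
See Terraform examples that implement the Staging/Prod recommendations across all pillars in one place on GitHub.
The following examples use Atlas tools for automation to enable backup and restore operations.
These examples apply only to staging and production environments, where clusters have backups enabled.
Run the following command to take a backup snapshot of the cluster named myDemo and retain it for 7 days: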
atlas backups snapshots create myDemo --desc "my backup snapshot" --retention 7
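Enable a Backup Compliance Policy for your project. Only the designated authorized user (governance@example.org) can turn off this protection, and only after completing a verification process with MongoDB Support.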
atlas backups compliancePolicy enable \
  --projectId 67212db237c5766221eb6ad9 \
  --authorizedEmail governance@example.org \
  --authorizedUserFirstName john \
  --authorizedUserLastName doe
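Run the following command to create a compliance policy for scheduled backup snapshots. The policy enforces how often a snapshot must be taken (every 6 hours) and how long snapshots are retained (1 month).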
atlas backups compliancePolicy policies scheduled create \
  --projectId 67212db237c5766221eb6ad9 \
  --frequencyInterval 6 \
  --frequencyType hourly \
  --retentionValue 1 \
  --retentionUnit months
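The following examples demonstrate how to configure backups during deployment. Before you create resources with Terraform, you must:
Create your paying organization and create an API key for it. Store the API key as environment variables by running the following commands in your terminal: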
export MONGODB_ATLAS_PUBLIC_KEY="<insert your public key here>"
export MONGODB_ATLAS_PRIVATE_KEY="<insert your private key here>"
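Common Files
You must create the following files for each example. Place the files for each example in their own directory. Change the IDs and names to use your values. Then run the commands to initialize Terraform, view the Terraform plan, and apply the changes.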
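The initialize, plan, and apply sequence mentioned above could be wrapped as follows. This is a minimal sketch: the function name `run_example` and the `TF_BIN` override are hypothetical conveniences, while `terraform init`, `terraform plan -out`, and `terraform apply` are the standard Terraform commands.

```shell
#!/bin/sh
# run_example DIR
# Initializes Terraform in DIR, saves a plan to a file, then applies that plan.
# TF_BIN is a hypothetical override (defaults to "terraform") so that the
# sequence can be previewed without touching any infrastructure.
run_example() {
  tf="${TF_BIN:-terraform}"
  (
    cd "$1" || exit 1
    "$tf" init &&
    "$tf" plan -out=tfplan &&
    "$tf" apply tfplan
  )
}
```

Calling `TF_BIN=echo run_example .` only prints the three subcommands, which is a cheap way to check the order before a real run. Variables without defaults, such as `org_id`, can be supplied through `TF_VAR_org_id` style environment variables or a `terraform.tfvars` file.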
variables.tf
variable "org_id" {
  description = "Atlas organization ID"
  type        = string
}

variable "project_name" {
  description = "Atlas project name"
  type        = string
}

variable "cluster_name" {
  description = "Atlas Cluster Name"
  type        = string
}

variable "point_in_time_utc_seconds" {
  description = "PIT in UTC"
  default     = 0
  type        = number
}
Configure a Backup Schedule for the Cluster
Use the following to configure a Tier 1 backup schedule for the cluster.
main.tf
locals {
  atlas_clusters = {
    "cluster_1" = { name = "m10-aws-1e", region = "US_EAST_1" },
    "cluster_2" = { name = "m10-aws-2e", region = "US_EAST_2" },
  }
}

resource "mongodbatlas_project" "atlas-project" {
  org_id = var.org_id
  name   = var.project_name
}

resource "mongodbatlas_advanced_cluster" "automated_backup_test_cluster" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  name         = each.value.name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      region_name   = each.value.region
      priority      = 7
    }
  }

  backup_enabled = true # enable cloud backup snapshots
  pit_enabled    = true
}

resource "mongodbatlas_cloud_backup_schedule" "test" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  cluster_name = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].name

  reference_hour_of_day    = 3  # backup start hour in UTC
  reference_minute_of_hour = 45 # backup start minute in UTC
  restore_window_days      = 7  # restore window for near-zero RPO

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY", "ON_DEMAND"]
    region_name        = "US_WEST_1"
    zone_id            = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].replication_specs.*.zone_id[0]
    should_copy_oplogs = true
  }

  policy_item_hourly {
    frequency_interval = 12 # back up every 12 hours; accepted values: 1, 2, 4, 6, 8, 12 (every n hours)
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }

  policy_item_daily {
    frequency_interval = 1 # back up every day; accepted value: 1
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }

  policy_item_weekly {
    frequency_interval = 7 # every Sunday; accepted values: 1 (Monday) through 7 (Sunday); use 6 for Saturday as in the tier table
    retention_unit     = "weeks"
    retention_value    = 4 # retain for 4 weeks
  }

  policy_item_monthly {
    frequency_interval = 28 # 28th day of the month; accepted values: 1 to 28 (nth day of the month)
    retention_unit     = "months"
    retention_value    = 3 # retain for 3 months
  }

  depends_on = [
    mongodbatlas_advanced_cluster.automated_backup_test_cluster
  ]
}
Use the following to configure a Tier 2 backup schedule for the cluster.
main.tf
locals {
  atlas_clusters = {
    "cluster_1" = { name = "m10-aws-1e", region = "US_EAST_1" },
    "cluster_2" = { name = "m10-aws-2e", region = "US_EAST_2" },
  }
}

resource "mongodbatlas_project" "atlas-project" {
  org_id = var.org_id
  name   = var.project_name
}

resource "mongodbatlas_advanced_cluster" "automated_backup_test_cluster" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  name         = each.value.name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      region_name   = each.value.region
      priority      = 7
    }
  }

  backup_enabled = true # enable cloud backup snapshots
  pit_enabled    = true
}

resource "mongodbatlas_cloud_backup_schedule" "test" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  cluster_name = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].name

  reference_hour_of_day    = 3  # backup start hour in UTC
  reference_minute_of_hour = 45 # backup start minute in UTC
  restore_window_days      = 7  # restore window for near-zero RPO

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY", "ON_DEMAND"]
    region_name        = "US_WEST_1"
    zone_id            = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].replication_specs.*.zone_id[0]
    should_copy_oplogs = true
  }

  policy_item_daily {
    frequency_interval = 1 # back up every day; accepted value: 1
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }

  policy_item_weekly {
    frequency_interval = 7 # every Sunday; accepted values: 1 (Monday) through 7 (Sunday); use 6 for Saturday as in the tier table
    retention_unit     = "weeks"
    retention_value    = 4 # retain for 4 weeks
  }

  policy_item_monthly {
    frequency_interval = 28 # 28th day of the month; accepted values: 1 to 28 (nth day), or 40 (last day of the month)
    retention_unit     = "months"
    retention_value    = 3 # retain for 3 months
  }

  depends_on = [
    mongodbatlas_advanced_cluster.automated_backup_test_cluster
  ]
}
Use the following to configure a Tier 3 backup schedule for the cluster.
main.tf
locals {
  atlas_clusters = {
    "cluster_1" = { name = "m10-aws-1e", region = "US_EAST_1" },
    "cluster_2" = { name = "m10-aws-2e", region = "US_EAST_2" },
  }
}

resource "mongodbatlas_project" "atlas-project" {
  org_id = var.org_id
  name   = var.project_name
}

resource "mongodbatlas_advanced_cluster" "automated_backup_test_cluster" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  name         = each.value.name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      region_name   = each.value.region
      priority      = 7
    }
  }

  backup_enabled = true # enable cloud backup snapshots
  pit_enabled    = true
}

resource "mongodbatlas_cloud_backup_schedule" "test" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  cluster_name = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].name

  reference_hour_of_day    = 3  # backup start hour in UTC
  reference_minute_of_hour = 45 # backup start minute in UTC
  restore_window_days      = 7  # restore window for near-zero RPO

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY", "ON_DEMAND"]
    region_name        = "US_WEST_1"
    zone_id            = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].replication_specs.*.zone_id[0]
    should_copy_oplogs = true
  }

  policy_item_daily {
    frequency_interval = 1 # back up every day; accepted value: 1
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }

  policy_item_weekly {
    frequency_interval = 7 # every Sunday; accepted values: 1 (Monday) through 7 (Sunday); use 6 for Saturday as in the tier table
    retention_unit     = "weeks"
    retention_value    = 4 # retain for 4 weeks
  }

  policy_item_monthly {
    frequency_interval = 28 # 28th day of the month; accepted values: 1 to 28 (nth day), or 40 (last day of the month)
    retention_unit     = "months"
    retention_value    = 3 # retain for 3 months
  }

  depends_on = [
    mongodbatlas_advanced_cluster.automated_backup_test_cluster
  ]
}
Configure Backup and PIT Restore for the Cluster
Use the following to configure a cloud backup snapshot and a PIT restore job.
main.tf
# Create a project
resource "mongodbatlas_project" "project_test" {
  name   = var.project_name
  org_id = var.org_id
}

# Create a cluster with 3 nodes
resource "mongodbatlas_advanced_cluster" "cluster_test" {
  project_id             = mongodbatlas_project.project_test.id
  name                   = var.cluster_name
  cluster_type           = "REPLICASET"
  backup_enabled         = true # enable cloud provider snapshots
  pit_enabled            = true
  retain_backups_enabled = true # keep the backup snapshots once the cluster is deleted

  replication_specs {
    region_configs {
      priority      = 7
      provider_name = "AWS"
      region_name   = "US_EAST_1"
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
    }
  }
}

# Specify number of days to retain backup snapshots
resource "mongodbatlas_cloud_backup_snapshot" "test" {
  project_id        = mongodbatlas_advanced_cluster.cluster_test.project_id
  cluster_name      = mongodbatlas_advanced_cluster.cluster_test.name
  description       = "My description"
  retention_in_days = "1"
}

# Specify the snapshot ID to use to restore
# The restore job is created only when point_in_time_utc_seconds is nonzero.
resource "mongodbatlas_cloud_backup_snapshot_restore_job" "test" {
  count        = (var.point_in_time_utc_seconds == 0 ? 0 : 1)
  project_id   = mongodbatlas_cloud_backup_snapshot.test.project_id
  cluster_name = mongodbatlas_cloud_backup_snapshot.test.cluster_name
  snapshot_id  = mongodbatlas_cloud_backup_snapshot.test.id

  delivery_type_config {
    point_in_time             = true
    target_cluster_name       = mongodbatlas_advanced_cluster.cluster_test.name
    target_project_id         = mongodbatlas_advanced_cluster.cluster_test.project_id
    point_in_time_utc_seconds = var.point_in_time_utc_seconds
  }
}
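The `point_in_time_utc_seconds` variable expects a Unix timestamp, so a restore point written as a calendar time has to be converted first. The following sketch shows one way to do that and to check that the target falls inside a restore window, for example the 7-day window recommended on this page. The helper names and the sample timestamps are hypothetical, and `date -u -d` assumes GNU or BusyBox `date` (BSD and macOS `date` use different flags).

```shell
#!/bin/sh
# to_epoch_utc "YYYY-MM-DD HH:MM:SS"
# Interprets the timestamp as UTC and prints seconds since the Unix epoch.
to_epoch_utc() {
  date -u -d "$1" +%s
}

# within_restore_window TARGET_EPOCH NOW_EPOCH WINDOW_DAYS
# Succeeds when TARGET is not in the future and no older than the window.
within_restore_window() {
  age=$(( $2 - $1 ))
  [ "$age" -ge 0 ] && [ "$age" -le $(( $3 * 86400 )) ]
}

target=$(to_epoch_utc "2024-05-01 12:00:00")   # hypothetical restore point
now=$(to_epoch_utc "2024-05-03 12:00:00")      # fixed "now" for illustration
if within_restore_window "$target" "$now" 7; then
  # Terraform reads TF_VAR_<name> environment variables as input variables.
  export TF_VAR_point_in_time_utc_seconds="$target"
  echo "$TF_VAR_point_in_time_utc_seconds"
fi
```

In real use, replace the fixed `now` with `$(date -u +%s)`. Leaving the variable at its default of 0 keeps the restore job out of the plan, because the `count` expression in the configuration above evaluates to 0.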