Over the next few days, AWS will be issuing an emergency patch, and as part of this process they’ll be rebooting a substantial number of EC2 instances. The maintenance starts tomorrow and will continue until the end of the month. To see if you’ll be impacted, AWS recommends you go to the “Events” page on the EC2 console, which will list any pending instance reboots for your AWS account.
For those of you who run MongoDB on EC2, you can easily distribute your replica set nodes across multiple availability zones to help ensure that your deployment withstands outages like these without suffering any application downtime. If you have a node in an AZ that gets rebooted, your replica set will automatically fail over to a node in a different AZ. No big deal.
Here’s a (non-exhaustive) AWS-reboot-preparedness checklist:
- What to Expect. When AWS reboots the instances, you should expect to see failovers occur in your replica sets. A failover typically lasts no more than a few seconds, but while it’s in progress, writes and reads against the primary will fail. Once the failover has completed, normal operation should be restored.
- Backup. We always recommend you take regular backups. This weekend especially, make sure you have a current backup of your data before the EC2 instances get rebooted.
- Availability Zones. As a general practice, we recommend that you deploy replica sets across multiple availability zones. For this maintenance, you may want to proactively move your replica sets' primaries to nodes that will not be impacted by the reboot. And if your nodes aren’t spread across availability zones, we’d suggest that you make this change now so that you still have a valid voting configuration when the instances get rebooted.
- Replica Set Review. Do a once-over of your replica sets to ensure that, if any given availability zone is rebooted, you have enough voting members to continue normal database operations. Some common strategies include:
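Whichever strategy you choose, the underlying check is the same: after losing any one availability zone, the remaining members must still hold a strict majority of the set's votes. Here’s a minimal sketch of that check in Python. The function name, the `(zone, votes)` tuple shape, and the zone names are assumptions made for illustration; this is not a MongoDB API, and you would fill in the list from your own replica set configuration.

```python
from collections import defaultdict


def zones_that_break_majority(members):
    """Return the zones whose loss would leave the set without a voting majority.

    `members` is a list of (zone, votes) tuples. This shape is an assumption
    made for the example, not a MongoDB data structure.
    """
    total_votes = sum(votes for _, votes in members)
    majority = total_votes // 2 + 1  # strict majority of all configured votes

    # Add up the votes held in each availability zone.
    votes_per_zone = defaultdict(int)
    for zone, votes in members:
        votes_per_zone[zone] += votes

    # A zone is risky if the votes left after losing it fall short of a majority.
    return sorted(
        zone
        for zone, zone_votes in votes_per_zone.items()
        if total_votes - zone_votes < majority
    )


# Hypothetical three-member set spread over three zones.
spread = [("us-east-1a", 1), ("us-east-1b", 1), ("us-east-1c", 1)]
# Hypothetical three-member set with two voters in the same zone.
lopsided = [("us-east-1a", 1), ("us-east-1a", 1), ("us-east-1b", 1)]

print(zones_that_break_majority(spread))    # []
print(zones_that_break_majority(lopsided))  # ['us-east-1a']
```

An empty result means the set can still elect a primary if any single zone is rebooted. A non-empty result lists the zones you would want to rebalance before the maintenance window.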
If for some reason normal operation is not restored, Production Support and MongoDB Enterprise customers should reach out to our support team; others should take advantage of our incredibly active community on Google Groups. Our support organization is on call to assist proactively in advance of the maintenance, or to respond in case of any incidents related to the reboot. And we’ve provisioned some extra folks this weekend just to be sure you have the help you need.
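Because writes fail while a failover is in progress, your application may want to retry them for a few seconds rather than give up on the first error. Here’s a minimal sketch of retrying with exponential backoff in Python. `TransientWriteError`, `retry_write`, and `flaky_write` are hypothetical names for this example; substitute whatever error your driver actually raises during an election, and only retry writes that are safe to apply more than once.

```python
import time


class TransientWriteError(Exception):
    """Hypothetical stand-in for the error your driver raises during a failover."""


def retry_write(write, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `write()` until it succeeds or `attempts` tries have been made.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    """
    for attempt in range(attempts):
        try:
            return write()
        except TransientWriteError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated write that fails twice, as if an election were in progress,
# and then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TransientWriteError("no primary available")
    return "ok"


# Skip real sleeping in this demonstration.
result = retry_write(flaky_write, sleep=lambda seconds: None)
print(result, calls["count"])  # ok 3
```

With the defaults, the waits between tries add up to 7.5 seconds (0.5 + 1 + 2 + 4), which comfortably covers a failover of a few seconds. Tune `attempts` and `base_delay` to match how long your application can tolerate a stalled write.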