Simulate Regional Outage

Note

  • This feature is not available for M0 free clusters or for M2 and M5 clusters. To learn more, see Atlas M0 (Free Cluster), M2, and M5 Limits.

  • This feature is not supported on Serverless instances at this time. To learn more, see Serverless Instance Limitations.

You can use the Atlas UI and API to simulate an outage on your Atlas multi-region cluster and observe how your application handles an outage in one or more regions.

Required Access

To start an outage simulation, you must have Organization Owner or Project Owner access to the project.

Simulate Regional Outage Process

When you submit a request to test an outage using the Atlas UI or API, Atlas simulates an outage event. During a simulated outage, Atlas:

If your application takes more than 15 minutes to notice connection loss to some nodes, we recommend that you reduce your TCP retransmission timeout values. To learn more, see modify tcp_retries2 value.
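
To see why the default setting lines up with the 15-minute figure, the following rough sketch estimates how long Linux keeps retransmitting on an established connection for a given tcp_retries2 value. It assumes the usual Linux bounds of a 200 ms minimum and 120 s maximum retransmission timeout with exponential backoff. The real timeout depends on the measured round-trip time, and `estimated_tcp_timeout` is a hypothetical helper name, not part of any tool.

```python
def estimated_tcp_timeout(retries: int, rto_min: float = 0.2, rto_max: float = 120.0) -> float:
    """Rough upper bound, in seconds, before the kernel gives up on a connection.

    Assumes the retransmission timeout starts at rto_min, doubles after every
    retransmission, and is capped at rto_max (assumed Linux defaults).
    """
    return sum(min(rto_min * 2 ** attempt, rto_max) for attempt in range(retries + 1))


# Compare the Linux default (15) with two lower candidate values.
for retries in (15, 8, 5):
    seconds = estimated_tcp_timeout(retries)
    print(f"tcp_retries2={retries}: about {seconds:.1f} s")
# tcp_retries2=15: about 924.6 s
# tcp_retries2=8: about 102.2 s
# tcp_retries2=5: about 12.6 s
```

Under these assumptions the default value of 15 gives roughly 924.6 seconds, a little over 15 minutes, while lower values let the application notice lost nodes much sooner.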

Simulate Regional Outage Using the Atlas UI

To simulate a regional outage in the Atlas UI:

  1. Log in to the Atlas UI.

  2. Click Database.

  3. For the cluster on which you want to perform outage testing, click the ... button.

  4. Click Test Resilience.

  5. Select Regional Outage. Atlas displays a Test Resilience modal with the steps Atlas takes to simulate an outage event. To learn more, see Simulate Regional Outage Process.

  6. Click Select Regions.

  7. Select the tab corresponding to the type of outage you want to simulate:

  8. Select Simulate Regional Outage to begin the test. Atlas notifies you when the outage occurs.

  9. Select a tab corresponding to the type of outage you are performing:

Simulate Regional Outage Using the API

You can use the Test Outage API endpoint to simulate an outage event. To learn more about the outage process, see Simulate Regional Outage Process.
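
As an illustration only, the sketch below builds (but does not send) a request to start an outage simulation. The base URL, the `outageSimulation` path, and the `outageFilters` fields are assumptions recalled from the Atlas Administration API and should be checked against the API reference. `build_outage_request` is a hypothetical helper, and authentication is deliberately omitted.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumed base URL; confirm against the Atlas Administration API reference.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v2"


def build_outage_request(
    group_id: str, cluster_name: str, regions: List[Tuple[str, str]]
) -> urllib.request.Request:
    """Build a POST request that asks Atlas to simulate an outage.

    regions is a list of (cloud_provider, region_name) pairs, for example
    ("AWS", "US_EAST_1"). The payload field names are assumptions.
    """
    payload = {
        "outageFilters": [
            {"cloudProvider": provider, "regionName": region, "type": "REGION"}
            for provider, region in regions
        ]
    }
    url = "{}/groups/{}/clusters/{}/outageSimulation".format(
        BASE_URL,
        urllib.parse.quote(group_id, safe=""),
        urllib.parse.quote(cluster_name, safe=""),
    )
    # The API may also require a versioned Accept header and API-key
    # authentication; both are left out of this sketch.
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_outage_request("<project-id>", "Cluster0", [("AWS", "US_EAST_1")])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to review the regions that would be affected before anything reaches the cluster.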

Verify the Outage

To verify that the simulated outage is in effect, monitor your application and confirm that its read and write operations behave as expected.
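
One way to watch read and write behavior during the simulation is a small probe loop. The sketch below is an assumption-laden illustration rather than an Atlas feature: `probe` and `ProbeResult` are hypothetical names, and the operation you pass in would be whatever read or write your application issues through its MongoDB driver.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProbeResult:
    successes: int = 0
    failures: int = 0
    errors: List[str] = field(default_factory=list)


def probe(
    operation: Callable[[], object], attempts: int = 10, delay_seconds: float = 1.0
) -> ProbeResult:
    """Run operation repeatedly and count how often it succeeds or raises."""
    result = ProbeResult()
    for attempt in range(attempts):
        try:
            operation()
            result.successes += 1
        except Exception as exc:  # a failed probe must not stop the loop
            result.failures += 1
            result.errors.append(f"attempt {attempt}: {exc!r}")
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # pause between probes
    return result
```

With a driver such as PyMongo, the operation could be something like `lambda: collection.find_one({})` for reads or an insert into a scratch collection for writes. A burst of failures followed by recovery suggests the application rode out the election, while sustained failures point to a configuration that depends on the affected regions.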

Troubleshoot Outage

A regional outage or regional outage simulation that affects the highest-priority regions in a sharded cluster could cause the cluster to become inoperable for read operations. To restore the config servers, do the following:

  • Configure a read preference that allows reads from secondary nodes.

  • Reconfigure the cluster to regain electable nodes.
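
For the read preference step, one option is to set the standard `readPreference` connection string option. The sketch below is a minimal illustration: `with_read_preference` is a hypothetical helper and the host name is a placeholder. Drivers also let you set the read preference in code, which may suit your application better.

```python
VALID_MODES = {"primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"}


def with_read_preference(uri: str, mode: str = "secondaryPreferred") -> str:
    """Return uri with its readPreference option set to mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported read preference mode: {mode}")
    base, _, query = uri.partition("?")
    # Keep every existing option except a previous readPreference setting.
    params = [
        p for p in query.split("&")
        if p and not p.lower().startswith("readpreference=")
    ]
    params.append(f"readPreference={mode}")
    # Add the slash that separates the host list from the options if missing.
    if "/" not in base.split("://", 1)[-1]:
        base += "/"
    return base + "?" + "&".join(params)


print(with_read_preference("mongodb+srv://cluster0.example.mongodb.net/?retryWrites=true"))
# mongodb+srv://cluster0.example.mongodb.net/?retryWrites=true&readPreference=secondaryPreferred
```

`secondaryPreferred` reads from a secondary when one is available and falls back to the primary otherwise, which keeps reads working both during the outage and after the cluster recovers.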
