Alert Basics

On this page

  • Useful Metrics and Alert Conditions
  • Configure Alerts
  • Resolve Alerts
  • Alerts Workflow

Atlas provides built-in tools, alerts, charts, integrations, and logs to help you monitor your database deployments. Alerts help you monitor those deployments and improve performance in the following ways:

  1. A variety of conditions can trigger an alert.

  2. You can configure alert settings based on specific conditions for your databases, users, accounts, and more.

  3. When you resolve alerts, you can fix the immediate problem, implement a long-term solution, and monitor your progress.

Atlas issues alerts for the database and server conditions configured in your alert settings. When a condition triggers an alert, Atlas displays a warning symbol on the cluster and sends alert notifications. Your alert settings determine the notification methods. Atlas continues sending notifications at regular intervals until the condition resolves or you delete or disable the alert.
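The repeat-until-resolved behavior described above can be pictured with a small sketch. This is an illustrative model only, not Atlas code: the function name `notification_times` and its parameters are hypothetical, it assumes the first notification goes out when the alert opens, and the actual interval is whatever your alert settings specify.

```python
def notification_times(opened_at, interval_min, resolved_at):
    """Return the minutes at which notifications go out for one alert.

    Hypothetical model: one notification when the alert opens, then one
    every `interval_min` minutes until the condition resolves.
    """
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    times = []
    t = opened_at
    while t < resolved_at:
        times.append(t)
        t += interval_min
    return times


# An alert that opens at minute 0 and resolves at minute 150,
# with a 60-minute notification interval.
schedule = notification_times(0, 60, 150)
```

Deleting or disabling the alert would end the schedule in the same way that resolution does in this model.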

When you configure alerts, you specify alert conditions and thresholds. Review the alert conditions that can trigger alerts for your database deployments.

Note

M0 free clusters and M2/M5 shared clusters only trigger alerts related to the metrics supported by those clusters. See Atlas M0 (Free Cluster), M2, and M5 Limits for complete documentation on M0/M2/M5 alert and metric limitations.

Consistently monitor metrics to help ensure efficient database deployments.

The Tickets Available alert conditions help you monitor the number of concurrent read or write operations that can occur. When all tickets are claimed, additional operations must wait in the queue.

You can view these metrics on the Tickets Available chart, accessed through cluster monitoring.

To learn more, see the Tickets Available alert conditions.
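The relationship between tickets and queued operations can be illustrated with a toy model. This is a minimal sketch, not how the storage engine is implemented: the class `TicketPool` and its methods are hypothetical names chosen for the example.

```python
from collections import deque


class TicketPool:
    """Toy model: each running operation holds one ticket."""

    def __init__(self, total):
        self.available = total   # tickets not currently claimed
        self.queue = deque()     # operations waiting for a ticket

    def start(self, op):
        """Try to start `op`; return True if it got a ticket."""
        if self.available > 0:
            self.available -= 1
            return True
        self.queue.append(op)    # all tickets claimed: the operation waits
        return False

    def finish(self):
        """Release a ticket; hand it to the next waiting operation, if any."""
        if self.queue:
            return self.queue.popleft()  # ticket passes straight to the waiter
        self.available += 1
        return None
```

In this model, a low `available` count is the early warning and a growing `queue` is the symptom, which is why the two are worth watching together.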

The Queues alert conditions measure operations waiting on locks.

You can view these metrics on the Queues chart, accessed through cluster monitoring.

To learn more, see the Queues alert conditions.

AWS EC2 clusters that support Burstable Performance might experience CPU steal when using shared CPU cores. This alert condition measures the percentage by which the CPU usage exceeds the guaranteed baseline CPU credit accumulation rate.

CPU credits are units of CPU utilization that you accumulate. The credits accumulate at a constant rate to provide a guaranteed level of performance. These credits can be used for additional CPU performance. When the credit balance is exhausted, only the guaranteed baseline of CPU performance is provided, and the amount of excess is shown as steal percent.
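The credit mechanism described above can be sketched as a simple simulation. This is a simplified model under stated assumptions, not the cloud provider's actual accounting: the function `simulate_steal`, its parameters, and the choice of one credit per percentage point of CPU per interval are all hypothetical, and any cap on the credit balance is ignored.

```python
def simulate_steal(demand_pct, baseline_pct, initial_credits=0.0):
    """Return the steal percent for each interval.

    demand_pct:      CPU the workload wants in each interval (percent)
    baseline_pct:    guaranteed baseline, also the credit accrual rate
    initial_credits: credit balance before the first interval
    """
    credits = initial_credits
    steal = []
    for demand in demand_pct:
        credits += baseline_pct          # credits accumulate at a constant rate
        used = min(demand, credits)      # spend credits to serve the demand
        credits -= used
        steal.append(demand - used)      # unmet demand shows up as steal
    return steal


# Workload wants 50% CPU on an instance with a 20% baseline and 40 saved credits.
steal_series = simulate_steal([50, 50, 50], baseline_pct=20, initial_credits=40)
```

In this run the saved credits absorb the first interval entirely, the second interval is only partly covered, and from the third interval on the workload is held to the baseline.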

You can view CPU usage on the Normalized System CPU chart, accessed through cluster monitoring.

To learn more, see the System: CPU (Steal) % is alert condition.

The Query Targeting alert conditions help identify inefficient queries. Properly configured indexes can significantly improve query performance, but too many indexes can degrade write performance.

You can view these metrics on the Query Targeting chart, accessed through cluster monitoring.

To learn more, see the Query Targeting alert conditions.

Each Atlas instance has a connection limit. The Connection alert conditions help you proactively address scaling needs or potential issues related to connection availability.

You can view these metrics on the Connections chart, accessed through cluster monitoring.

To learn more, see the Connection alert conditions.

To set which conditions trigger alerts and how users are notified, see Configure Alert Settings. You can configure alerts at the organization or project level. Atlas provides default alerts at the project level. You can clone existing alerts and configure maintenance window alerts.

Experiment with alert condition values based on your specific requirements. Periodically reassess these values for optimal performance.

Configure the alert settings to send an alert if the Tickets Available metrics drop below 30 for at least a few minutes. You want to avoid false positives triggered by relatively harmless short-term drops, but catch issues when these metrics stay low for a while.

To configure these alert conditions, see Configure Alert Settings.

Configure the alert settings to send an alert if the Queues metrics rise above 100 for a minute. You want to avoid false positives triggered by relatively harmless short-term spikes, but catch issues when these metrics stay elevated for a while.

To configure these alert conditions, see Configure Alert Settings.
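The idea of alerting only on sustained breaches, used for both the Tickets Available threshold (below 30) and the Queues threshold (above 100), can be sketched as follows. This is a minimal illustration, not Atlas's evaluation logic: the function `sustained_breach`, the one-sample-per-minute assumption, the sample data, and the choice of three minutes for "a few minutes" are all hypothetical.

```python
import operator


def sustained_breach(samples, threshold, compare, min_consecutive):
    """Return True if `min_consecutive` samples in a row satisfy
    compare(sample, threshold)."""
    run = 0
    for value in samples:
        run = run + 1 if compare(value, threshold) else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical per-minute averages.
tickets_available = [120, 25, 110, 28, 22, 19, 90]
queued_operations = [5, 140, 8, 3]

# Tickets: alert only after 3 consecutive minutes below 30.
tickets_alert = sustained_breach(tickets_available, 30, operator.lt, 3)

# Queues: alert when a one-minute average rises above 100.
queues_alert = sustained_breach(queued_operations, 100, operator.gt, 1)
```

The single dip to 25 in the tickets series does not fire on its own; the later run of 28, 22, 19 does. Raising `min_consecutive` trades slower detection for fewer false positives.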

Configure the alert settings to send an alert if the System: CPU (Steal) % metric rises above 10%.

To configure this alert condition, see Configure Alert Settings.

Configure the alert settings to send an alert if a Query Targeting metric rises above 50 or 100.

To configure these alert conditions, see Configure Alert Settings.
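To make the 50 and 100 thresholds concrete, the sketch below assumes that a Query Targeting metric is the ratio of documents (or index keys) scanned to documents returned, which is how the Query Targeting chart is commonly read. The function name, the sample numbers, and the handling of queries that return nothing are choices made for this example, not Atlas behavior.

```python
def query_targeting_ratio(scanned, returned):
    """Items scanned per document returned; closer to 1 is better."""
    if returned == 0:
        # Nothing returned: report the raw scan count so a wasted scan
        # still shows up instead of raising ZeroDivisionError.
        return float(scanned)
    return scanned / returned


# Hypothetical query: scanned 6,000 documents to return 50.
ratio = query_targeting_ratio(6000, 50)
above_50 = ratio > 50
above_100 = ratio > 100
```

A ratio this high usually means the query is scanning far more than it needs, which is the situation an index on the queried fields is meant to fix.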

Configure the alert settings to send an alert if the Connection % of the configured limit rises above 80% or 90%.

To configure these alert conditions, see Configure Alert Settings.
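The connection thresholds are percentages of the configured limit, so the same connection count means different things on different instance sizes. A minimal sketch of the arithmetic, with hypothetical function names, level labels, and a hypothetical limit of 1,500 connections:

```python
def connection_percent(current, limit):
    """Open connections as a percentage of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * current / limit


def connection_alert_level(current, limit, warn_pct=80.0, critical_pct=90.0):
    """Classify usage against the two thresholds (labels are illustrative)."""
    pct = connection_percent(current, limit)
    if pct > critical_pct:
        return "critical"
    if pct > warn_pct:
        return "warning"
    return "ok"


# Hypothetical limit of 1,500 connections.
levels = [connection_alert_level(n, 1500) for n in (900, 1300, 1400)]
```

Using two thresholds gives an early warning at 80% with room to act before the 90% mark, where new connections are close to being refused.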

When a condition triggers an alert, Atlas displays a warning symbol on the cluster and sends alert notifications. Resolve these alerts and work to prevent alert conditions from occurring in the future. To learn how to fix the immediate problem, implement a long-term solution, and monitor your progress, see Resolve Alerts.

Tickets Available alerts can help you detect queries that took a little longer than expected due to load.

Increasing your instance size, or sometimes disk speed, can help these metrics.

Queues alerts can help you detect queries that took a little longer than expected due to load.

Increasing your instance size, or sometimes disk speed, can help these metrics.

The System: CPU (Steal) % is alert occurs when the CPU usage exceeds the guaranteed baseline CPU credit accumulation rate by the specified threshold.

To learn more, see Fix CPU Usage Issues.

Query Targeting alerts often indicate inefficient queries.

To learn more, see Fix Query Issues.

Connection alerts typically occur when the maximum number of allowable connections to a MongoDB process has been exceeded. Once the limit is exceeded, no new connections can be opened until the number of open connections drops below the limit.

To learn more, see Fix Connection Issues.

When an alert condition is met, the alert lifecycle begins.

To learn more, see the Alerts Workflow.
