Review Alert Conditions

This page describes the conditions that can trigger alerts. You specify conditions and thresholds when you configure alerts. To learn more, see Alerts Workflow.

Note

Free clusters and Flex clusters trigger only alerts related to the metrics that those clusters support. For complete documentation on Free cluster and Flex cluster alert and metric limitations, see Atlas Free Cluster Limits and Atlas Flex Limitations.

Host Alerts

The conditions in this section apply if you select Host as the alert target when configuring the alert. You can apply the condition to all hosts or to a specific type of host, such as primaries or config servers.

Atlas triggers certain host alerts based on cluster monitoring data, so these alerts are subject to variations in data granularity. To learn more, see Monitoring Data Storage Granularity.

Advisor

Host has index suggestions

Raised if Performance Advisor has index suggestions for the host.

If the query targeting ratio for a host is greater than 8000 and Performance Advisor determines that one or more indexes would improve the performance of inefficient queries on the host, this alert triggers and directs you to create the suggested indexes.

This alert is only available for M10+ clusters, and is enabled by default for M10+ clusters that have Performance Advisor enabled. This alert does not trigger for clusters where Performance Advisor is disabled.
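The trigger for this alert combines several conditions, and a small predicate makes the combination explicit. The following is a minimal sketch, not Atlas's implementation. It assumes tier names look like "M<number>", and the function names are hypothetical.

```python
def is_m10_or_larger(tier: str) -> bool:
    # Assumption: tier names look like "M<number>", for example "M10" or "M30".
    digits = tier[1:]
    return tier.startswith("M") and digits.isdigit() and int(digits) >= 10


def index_suggestion_alert(query_targeting_ratio: float, tier: str,
                           advisor_enabled: bool, suggested_index_count: int) -> bool:
    """Return True only when every condition described above holds."""
    return (
        is_m10_or_larger(tier)              # only available for M10+ clusters
        and advisor_enabled                 # never triggers if Performance Advisor is disabled
        and query_targeting_ratio > 8000    # ratio must be strictly greater than 8000
        and suggested_index_count > 0       # Performance Advisor has indexes to suggest
    )


print(index_suggestion_alert(9500.0, "M30", True, 2))  # True
print(index_suggestion_alert(9500.0, "M0", True, 2))   # False: below M10
```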

Asserts

The following alert conditions measure the rate of asserts for a MongoDB process, as collected from the MongoDB serverStatus command's asserts document. You can view asserts through cluster monitoring.

Asserts: Msg is: Raised if the rate of message asserts meets the specified threshold. Message asserts are internal server errors. Stack traces are logged for these.

Asserts: Regular is: Raised if the rate of regular asserts meets the specified threshold.

Asserts: User is: Raised if the rate of errors generated by users meets the specified threshold.

Asserts: Warning is: Raised if the rate of warnings meets the specified threshold.
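The counters in the serverStatus asserts document are cumulative from the moment the process starts, so a rate has to be derived from two samples. The sketch below assumes the rate is the change in each counter divided by the sampling interval. This page doesn't document Atlas's exact method. The same approach applies to other cumulative counters, such as opcounters. `counter_rates` is a hypothetical helper.

```python
def counter_rates(before: dict, after: dict, interval_seconds: float) -> dict:
    """Per-second rate of each cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, end in after.items():
        delta = end - before.get(name, 0)
        # Counters start over when the process restarts, so this sketch
        # treats a negative delta as a reset (an assumption).
        if delta < 0:
            delta = end
        rates[name] = delta / interval_seconds
    return rates


# Two hypothetical samples of serverStatus().asserts taken 60 seconds apart.
first = {"regular": 0, "warning": 0, "msg": 0, "user": 120}
second = {"regular": 0, "warning": 0, "msg": 0, "user": 180}
print(counter_rates(first, second, 60))
# {'regular': 0.0, 'warning': 0.0, 'msg': 0.0, 'user': 1.0}
```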

Auto-Scaling

You can configure alerts for the following auto-scaling events. To review all auto-scaling events that occurred, view the activity feed.

To receive alerts for any event in this section, you must first configure an alert that notifies you or members of your organization of that type of auto-scaling event.

To learn how Atlas scales a cluster up or down, see Configure Auto-Scaling.

Auto-scaling: Compute auto-scaling initiated for base tier: Raised if Atlas starts compute auto-scaling for any of the operational nodes in your dedicated cluster. Atlas can scale disk capacity as part of this event.

Auto-scaling: Compute auto-scaling initiated for analytics tier: Raised if Atlas starts compute auto-scaling for any of the analytics nodes in your dedicated cluster. Atlas can scale disk capacity as part of this event.

Auto-scaling: Compute auto-scaling down didn't initiate for base tier due to storage requirements: Raised if Atlas couldn't start compute auto-scaling for any of the operational nodes in your dedicated cluster because the target cluster tier doesn't support the configured storage size.

Auto-scaling: Compute auto-scaling down didn't initiate for analytics tier due to storage requirements: Raised if Atlas couldn't start compute auto-scaling for any of the analytics nodes in your dedicated cluster because the target cluster tier doesn't support the configured storage size.

Auto-scaling: Compute auto-scaling didn't initiate for base tier due to maximum configured cluster tier: Raised if Atlas couldn't scale up an operational node because your cluster reached the maximum cluster tier configured for auto-scaling.

Auto-scaling: Compute auto-scaling didn't initiate for analytics tier due to maximum configured cluster tier: Raised if Atlas couldn't scale up an analytics node because your cluster reached the maximum cluster tier configured for auto-scaling.

Auto-scaling: Compute auto-scaling didn't initiate for base tier due to insufficient oplog size: Raised if Atlas couldn't scale up an operational node due to insufficient oplog size. To learn more, see Set Minimum Oplog Window.

Auto-scaling: Compute auto-scaling didn't initiate for analytics tier due to insufficient oplog size: Raised if Atlas couldn't scale up an analytics node due to insufficient oplog size. To learn more, see Set Minimum Oplog Window.

Auto-scaling: Predictive compute auto-scaling initiated for base tier: Raised if Atlas starts predictive compute auto-scaling for any of the operational nodes in your dedicated cluster.

Auto-scaling: Predictive compute auto-scaling did not initiate for base tier due to maximum configured cluster tier: Raised if Atlas couldn't predictively scale up an operational node because your cluster reached the maximum cluster tier configured for auto-scaling.

Auto-scaling: Predictive auto-scaling did not initiate for base tier due to insufficient oplog size: Raised if Atlas couldn't predictively scale up an operational node due to insufficient oplog size. To learn more, see Set Minimum Oplog Window.

Auto-scaling: Disk auto-scaling initiated: Raised if Atlas starts auto-scaling disk capacity.

Auto-scaling: Disk auto-scaling didn't initiate due to the cluster reaching maximum available disk size: Raised if Atlas couldn't scale up the disk size because the cluster has reached the maximum available disk size.

Auto-scaling: Disk auto-scaling didn't initiate due to insufficient oplog size: Raised if Atlas couldn't scale up the disk size because the cluster's oplog size isn't sufficient.

Write-Blocking

The following alert conditions apply to write-blocking behavior in Atlas.

Writes have been blocked on your cluster due to critically low disk space

Raised when the percentage of used disk on the primary node has exceeded write-blocking policy thresholds. Atlas blocks writes to the cluster node to maintain read availability.

To resolve this alert, increase your cluster's storage capacity manually or by enabling storage auto-scaling. To learn more, see Disk Space % Used Alert Resolution.

To prevent future write-blocking events, we recommend that you monitor your cluster's disk usage. To learn more, see Preventing Write-Blocking.

Writes have been unblocked on your cluster

Raised when the percentage of used disk on the primary node has fallen below the unblocking thresholds, and Atlas automatically unblocks writes to the cluster.

You can view disk metrics through the Real-Time Performance Panel. This is an information-only alert that doesn't require any action on your part. However, to prevent future write-blocking events, we recommend that you monitor your cluster's disk usage. To learn more, see Preventing Write-Blocking.
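The two write-blocking alerts describe a transition into a blocked state and a later transition out of it, with a separate threshold for each direction. The sketch below models that behavior as a small state machine. The threshold values are hypothetical placeholders because this page doesn't state Atlas's actual write-blocking policy values, and `WriteBlockTracker` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteBlockTracker:
    # Hypothetical thresholds; Atlas's real policy values aren't given on this page.
    block_above_percent: float
    unblock_below_percent: float
    writes_blocked: bool = False

    def observe(self, primary_disk_used_percent: float) -> Optional[str]:
        """Return the alert raised by one disk-usage sample, or None."""
        if not self.writes_blocked and primary_disk_used_percent > self.block_above_percent:
            self.writes_blocked = True
            return "Writes have been blocked on your cluster due to critically low disk space"
        if self.writes_blocked and primary_disk_used_percent < self.unblock_below_percent:
            self.writes_blocked = False
            return "Writes have been unblocked on your cluster"
        return None


tracker = WriteBlockTracker(block_above_percent=95.0, unblock_below_percent=90.0)
for sample in [80.0, 97.0, 93.0, 85.0]:
    print(sample, tracker.observe(sample))
```

With these placeholder values, the 93% sample raises nothing. Writes stay blocked until usage falls below the separate unblocking threshold, which prevents the alerts from flapping around a single value.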

MongoDB Search

The following alert conditions measure resource usage, indexing activity, and query activity for MongoDB Search processes. You can view MongoDB Search metrics through cluster monitoring.

Atlas Search: Index Replication Lag is: Raised if the approximate number of milliseconds that MongoDB Search is behind in replicating changes from the oplog of mongod is above or below the threshold.

Atlas Search: Index Size on Disk is: Raised if the total size of all MongoDB Search indexes on disk in bytes is above or below the threshold.

Atlas Search: Mongot paused initial sync: Raised if initial sync is interrupted by the MongoDB Search mongot process due to high disk utilization.

Atlas Search: Mongot is approaching replication stop threshold: Raised if disk usage on a search node is greater than or equal to 85%.

Atlas Search: Max Number of Fields Indexed is: Raised if the maximum number of unique fields (including metadata) found in any single MongoDB Search index on a replica set or shard is above or below the specified number of fields (default is above 1,000 fields).

Atlas Search: Max Number of nGram Fields Indexed is: Raised if the maximum number of fields indexed using nGram or edgeGram tokenizers (including autocomplete type fields and custom analyzers), on a replica set or shard, is above or below the specified number of fields (default is above 100 fields).

Atlas Search: Max Number of Lucene Docs is: Raised if the upper bound on the number of Lucene docs used to store MongoDB Search indexes for a given replica set or shard is above the threshold.

Atlas Search: Mongot stopped replication: Raised if replication is interrupted by the MongoDB Search mongot process due to high disk utilization.

Atlas Search: Number of Error Queries is: Raised if the number of queries for which MongoDB Search is unable to return a response is above or below the threshold.

Atlas Search: Number of Successful Queries is: Raised if the number of queries for which MongoDB Search successfully returned a response is above or below the threshold.

Atlas Search: Total Number of Queries is: Raised if the number of queries submitted to MongoDB Search is above or below the threshold.

Atlas Search Opcounter: Delete is: Raised if the total number of documents or fields (specified in the index definition) removed per second is above or below the threshold.

Atlas Search Opcounter: Getmore is: Raised if the total number of getmore commands run on all MongoDB Search queries per second is above or below the threshold.

Atlas Search Opcounter: Insert is: Raised if the total number of documents or fields (specified in the index definition) that MongoDB Search indexes per second is above or below the threshold.

Atlas Search Opcounter: Update is: Raised if the total number of documents or fields (specified in the index definition) that MongoDB Search updates per second is above or below the threshold.

Insufficient disk space to support rebuilding search indexes: Raised if your cluster doesn't have enough free disk space to support rebuilding your MongoDB Search indexes.

Search Memory: Resident is: Raised if the total bytes of resident memory occupied by the MongoDB Search process is above or below the threshold.

Search Memory: Shared is: Raised if the total bytes of shared memory occupied by the MongoDB Search process is above or below the threshold.

Search Memory: Virtual is: Raised if the total bytes of virtual memory occupied by the MongoDB Search process is above or below the threshold.

Search Process: CPU (Kernel) % is: Raised if the percentage of time the CPU spent servicing operating system calls for the MongoDB Search process is above the threshold.

Search Process: CPU (User) % is: Raised if the percentage of time the CPU spent servicing the MongoDB Search process is above the threshold.

Search Process: Disk space used is: Raised if the total bytes of disk space used by the MongoDB Search process is above the threshold.
Note
If you apply the condition to all hosts, it applies to dedicated Search Nodes as well.

Search Process: Ran out of memory: Raised if the search process (mongot) runs out of memory. If the search process runs out of memory, indexing and queries fail.
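Most of the MongoDB Search conditions above compare a metric against a threshold that you configure. Three of them, however, have a fixed value or a default stated on this page: disk usage at or above 85%, more than 1,000 indexed fields, and more than 100 nGram fields. The sketch below checks those three values. It is illustrative only, and the function name is hypothetical.

```python
def default_search_alerts(disk_used_percent: float, max_fields_indexed: int,
                          max_ngram_fields_indexed: int) -> list:
    """Names of the Search alerts whose stated values are met by these inputs."""
    raised = []
    if disk_used_percent >= 85:  # "greater than or equal to 85%"
        raised.append("Atlas Search: Mongot is approaching replication stop threshold")
    if max_fields_indexed > 1000:  # default: above 1,000 fields
        raised.append("Atlas Search: Max Number of Fields Indexed is")
    if max_ngram_fields_indexed > 100:  # default: above 100 fields
        raised.append("Atlas Search: Max Number of nGram Fields Indexed is")
    return raised


print(default_search_alerts(86.0, 1200, 40))
```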

Average Execution Time

The following alert conditions measure the average execution time of reads, writes, or commands for a MongoDB process, as collected from the MongoDB serverStatus command's opLatencies document. You can view these metrics through cluster monitoring.

Average Execution Time: Commands is: Raised if the average execution time for command operations meets your specified threshold.

Average Execution Time: Reads is: Raised if the average execution time for read operations meets your specified threshold.

Average Execution Time: Writes is: Raised if the average execution time for write operations meets your specified threshold.
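Each entry in opLatencies (reads, writes, commands) holds a cumulative latency in microseconds and a cumulative ops count. An average over an interval can therefore be computed as the change in latency divided by the change in ops. The sketch below assumes that is how the average is derived, which this page doesn't confirm. `average_latency_ms` is a hypothetical helper.

```python
def average_latency_ms(before: dict, after: dict) -> float:
    """Average execution time, in milliseconds, between two opLatencies samples
    for one operation type (reads, writes, or commands).

    Each sample holds the cumulative "latency" (microseconds) and "ops" (count).
    """
    ops = after["ops"] - before["ops"]
    if ops <= 0:
        return 0.0  # no completed operations in the interval (assumption)
    return (after["latency"] - before["latency"]) / ops / 1000.0


reads_t0 = {"latency": 1_000_000, "ops": 100}
reads_t1 = {"latency": 1_500_000, "ops": 200}
print(average_latency_ms(reads_t0, reads_t1))  # 5.0
```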

Disk Throughput

The following alert conditions measure the disk read and write throughput for a MongoDB process. You can view these metrics on the Atlas Disk Throughput chart, accessed through cluster monitoring.

Disk read throughput is: Raised if the rate at which data is read from disk, in megabytes per second, meets the specified threshold.

Disk write throughput is: Raised if the rate at which data is written to disk, in megabytes per second, meets the specified threshold.

Opcounter

The following alert conditions measure the rate of database operations on a MongoDB process since the process last started, as collected from the MongoDB serverStatus command's opcounters document. You can view opcounters through cluster monitoring.

Opcounter: Cmd is: Raised if the rate of commands performed meets the specified threshold.

Opcounter: Delete is: Raised if the rate of deletes meets the specified threshold.

Opcounter: Getmores is: Raised if the rate of getmore operations to retrieve the next cursor batch meets the specified threshold.
Tip
To learn more, see Cursor Batches in the MongoDB manual.

Opcounter: Insert is: Raised if the rate of inserts meets the specified threshold.

Opcounter: Query is: Raised if the rate of queries meets the specified threshold.

Opcounter: Update is: Raised if the rate of updates meets the specified threshold.

Opcounter - Repl

The following alert conditions measure the rate of database operations on MongoDB secondaries, as collected from the MongoDB serverStatus command's opcountersRepl document. You can view these metrics on the Opcounters - Repl chart, accessed through cluster monitoring.

Opcounter: Repl Cmd is: Raised if the rate of replicated commands meets the specified threshold.

Opcounter: Repl Delete is: Raised if the rate of replicated deletes meets the specified threshold.

Opcounter: Repl Insert is: Raised if the rate of replicated inserts meets the specified threshold.

Opcounter: Repl Update is: Raised if the rate of replicated updates meets the specified threshold.

Opcounter: Total is: Raised if the rate of total operations meets the specified threshold.

Operations Scan and Order

You can set an alert on the rate of scan-and-order operations for a MongoDB process.

Operations: Scan and Order is: Raised if the average per-second rate of queries that return sorted results without using an index for the sort exceeds your specified threshold.
Note
How It's Measured
MongoDB reports this metric using the metrics.operation.scanAndOrder field that the serverStatus command returns.

Atlas Free Clusters

Logical Size is

Raised if the total size of the data and indexes is outside the specified threshold.

Applicable only to Atlas Free clusters.

Memory

The following alert conditions measure memory for a MongoDB process, as collected from the MongoDB serverStatus command's mem document. You can view these metrics on the Atlas Memory and Non-Mapped Virtual Memory charts, accessed through cluster monitoring.

Memory: Computed is: Raised if the size of virtual memory that is not accounted for by memory-mapping meets the specified threshold. If this number is very high (multiple gigabytes), it indicates that excessive memory is being used outside of memory mapping.
Tip
To learn how to use this metric, view the Non-Mapped Virtual Memory chart and click the chart's i icon.

Memory: Resident is: Raised if the size of the resident memory meets the specified threshold. On a dedicated database server, the size of the resident memory typically approaches the amount of physical RAM on the host over time.

Memory: Virtual is: Raised if the size of virtual memory for the mongod process meets the specified threshold. You can use this alert to flag excessive memory outside of memory mapping.
Tip
To learn more, click the Memory chart's i icon.

System Memory: Available is: Raised if the amount of available system memory drops below the specified threshold.

System Memory: Max Available is: Raised if the maximum amount of available system memory drops below the specified threshold.

System Memory: Max Used is: Raised if the maximum system memory usage value meets the specified threshold.

System Memory: Used is: Raised if the amount of system memory in use (total memory minus free, buffered, and cached memory) meets the specified threshold.
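The System Memory: Used formula subtracts free, buffered, and cached memory from the total. The sketch below applies that arithmetic to text in the format of Linux `/proc/meminfo`. Reading from `/proc/meminfo` is an assumption made for illustration, because this page doesn't say where Atlas collects the value. The sample numbers are hypothetical.

```python
def used_memory_kb(meminfo_text: str) -> int:
    """Used memory = MemTotal - MemFree - Buffers - Cached, all in kB."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])  # first field is the amount in kB
    return values["MemTotal"] - values["MemFree"] - values["Buffers"] - values["Cached"]


sample = """MemTotal:       16000000 kB
MemFree:         2000000 kB
Buffers:          500000 kB
Cached:          6000000 kB"""
print(used_memory_kb(sample))  # 7500000
```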

Connections

The following alert conditions measure connections to a MongoDB process, as collected from the MongoDB serverStatus command's connections document. You can view these metrics on the Atlas Connections chart, accessed through cluster monitoring.

Connections is: Raised if the number of active connections to the host meets the specified average.

Connections % of configured limit is: Raised if the number of open connections to the host exceeds the specified percentage of the configured connection limit.
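To reason about the percentage-based condition, you can derive the percentage from the serverStatus connections document. In that document, the current and available values together equal the maximum number of incoming connections. The sketch below assumes that sum is the configured limit this alert uses, which this page doesn't confirm. The function name is hypothetical.

```python
def connections_percent_of_limit(current: int, available: int) -> float:
    """Open connections as a percentage of the connection limit.

    Assumes the limit equals current + available from serverStatus().connections.
    """
    limit = current + available
    if limit <= 0:
        raise ValueError("connection limit must be positive")
    return 100.0 * current / limit


usage = connections_percent_of_limit(current=300, available=1200)
print(usage)          # 20.0
print(usage > 80.0)   # False: an 80% threshold would not be exceeded
```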

Queues

The following alert conditions measure operations waiting on locks, as collected from the MongoDB serverStatus command's globalLock document. You can view these metrics on the Atlas Queues chart, accessed through cluster monitoring.

Queues: Readers is: Raised if the number of operations waiting on a read lock meets the specified average.

Queues: Total is: Raised if the number of operations waiting on a lock of any type meets the specified average.

Queues: Writers is: Raised if the number of operations waiting on a write lock meets the specified average.

Page Faults

The following alert condition measures the rate of page faults for a MongoDB process, as collected from the MongoDB serverStatus command's extra_info.page_faults field.

Page Faults is: Raised if the rate of page faults (whether or not an exception is thrown) meets the specified threshold. You can view this metric on the Atlas Page Faults chart, accessed through cluster monitoring.

Database Profiler

The following alert condition applies to a Database Profiler configuration that can significantly impact performance.

Note

This alert applies to the Database Profiler, not the Atlas Query Profiler. The Database Profiler is a MongoDB feature that writes profiling data to the system.profile collection. The Atlas Query Profiler is an Atlas UI feature that analyzes slow queries from your mongod logs.

Profiler configured to capture all operations on a host, which might result in a significant performance impact.

Raised when the Database Profiler is configured to capture all operations on a host. This occurs when the profiler level is set to 2 (profile all operations), or when the profiler level is set to 1 (profile slow operations) with slowms set to 0 or less.

Profiling all operations can cause significant performance degradation and increased disk usage due to writes to the system.profile collection.

This alert is enabled by default for all projects. To resolve this alert, disable the Database Profiler or adjust it to profile only slow operations with an appropriate slowms threshold (typically 100 milliseconds or higher). Use db.setProfilingLevel() to change the profiler settings. To learn more, see Database Profiler.
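The profiler condition reduces to a simple rule over two settings. In mongosh, `db.getProfilingStatus()` reports these settings as the `was` (level) and `slowms` fields. The sketch below mirrors the rule as described above. It is an illustration, not Atlas's check, and the function name is hypothetical.

```python
def profiler_captures_all_operations(level: int, slowms: int) -> bool:
    """Mirror the alert condition: level 2, or level 1 with slowms at or below 0."""
    return level == 2 or (level == 1 and slowms <= 0)


for level, slowms in [(2, 100), (1, 0), (1, 100), (0, 0)]:
    print(level, slowms, profiler_captures_all_operations(level, slowms))
```

Only the first two settings would trigger the alert. Level 1 with a positive `slowms`, such as 100, profiles only slow operations, which is the adjustment recommended above.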

Cursors

The following alert conditions measure the number of cursors for a MongoDB process, as collected from the MongoDB serverStatus command's metrics.cursor document. You can view these metrics on the Atlas Cursors chart, accessed through cluster monitoring.

Cursors: Open is: Raised if the number of cursors the server is maintaining for clients meets the specified average.

Cursors: Timed Out is: Raised if the number of timed-out cursors the server is maintaining for clients meets the specified average.

Network

The following alert conditions measure throughput for a MongoDB process, as collected from the MongoDB serverStatus command's network document. You can view these metrics on a host's Network chart, accessed through cluster monitoring.

Network: Bytes In is: Raised if the number of bytes sent to MongoDB meets the specified threshold.

Network: Bytes Out is: Raised if the number of bytes sent from MongoDB meets the specified threshold.

Network: Num Requests is: Raised if the number of requests sent to MongoDB meets the specified average.

Replication Oplog

The following alert conditions apply to the MongoDB process's oplog. You can view these metrics on the following charts, accessed through cluster monitoring:

Oplog GB/Hour
Replication Headroom
Replication Lag
Replication Oplog Window

The following alert conditions apply to the oplog:

Oplog Data Per Hour is: Raised when the amount of data per hour being written to a primary's oplog meets the specified threshold.

Replication Headroom is: Raised when the difference between the sync source member's oplog window and the replication lag time on the secondary meets the specified threshold.

Replication Lag is: Raised if the approximate amount of time that the secondary is behind the primary meets the specified threshold. Atlas calculates replication lag using the approach described in Check the Replication Lag in the MongoDB manual.

Replication Oplog Window is: Raised if the approximate amount of time available in the primary's replication oplog meets the specified threshold.
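Replication headroom, as defined above, is the sync source's oplog window minus the secondary's replication lag. A value near or below zero suggests that the secondary risks falling behind the oldest entry in its sync source's oplog. The values below are hypothetical, and the helper name is illustrative.

```python
def replication_headroom_seconds(sync_source_oplog_window_s: float,
                                 replication_lag_s: float) -> float:
    """Headroom = sync source's oplog window minus the secondary's replication lag."""
    return sync_source_oplog_window_s - replication_lag_s


# Hypothetical values: a 24-hour oplog window and 10 minutes of lag.
print(replication_headroom_seconds(24 * 3600, 600))  # 85800
```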

DB Storage

The following alert conditions apply to database storage, as collected for a MongoDB process by the MongoDB dbStats command. For details on how Atlas handles reaching database storage limits, refer to the FAQ page. These conditions are based on the summed total of all databases on the MongoDB process:

Note

Atlas retrieves database metrics every 20 minutes by default but adjusts frequency when necessary to reduce the impact on database performance.

DB Data Size is: Raised if the approximate size of all documents (and their paddings) meets the specified threshold.

DB Storage is: Raised if the allocated storage meets the specified threshold. You can view this metric on a host's DB Storage chart, accessed through cluster monitoring.

Namespaces

The following alert condition applies to the total number of namespaces across all non-system databases in a MongoDB process. You can view how many namespaces are in use for a MongoDB process by viewing the Catalog cluster metric.

Total Namespaces is

Raised if the total number of collections and indexes across all non-system databases meets the specified threshold.

Monitor this metric to avoid exceeding namespace limits, which can impact database performance and operations. For guidance on managing collections and indexes, see the Data Modeling Introduction and Reduce Number of Collections.

WiredTiger Storage Engine

The following alert conditions apply to the MongoDB process's WiredTiger storage engine, as collected from the MongoDB serverStatus command's wiredTiger.cache and queues.execution documents.

You can view these metrics on the following charts, accessed through cluster monitoring:

Cache Activity
Cache Usage
Tickets Available
Cache Fill Ratio

The following are the alert conditions that apply to WiredTiger:

Cache: Bytes Read Into Cache is: Raised when the number of bytes read into the WiredTiger cache meets the specified threshold.

Cache: Bytes Written From Cache is: Raised when the number of bytes written from the WiredTiger cache meets the specified threshold.

Cache: Dirty Bytes is: Raised when the number of dirty bytes in the WiredTiger cache meets the specified threshold.

Cache: Used Bytes is: Raised when the number of used bytes in the WiredTiger cache meets the specified threshold.

Cache: Fill Ratio is: Raised if the percentage of bytes in the cache relative to the total cache size meets the specified threshold.

Cache: Dirty Fill Ratio is: Raised if the percentage of dirty bytes relative to the total cache size meets the specified threshold.

Tickets Available: Reads is: Raised if the number of read tickets available to the WiredTiger storage engine meets the specified threshold.

Tickets Available: Writes is: Raised if the number of write tickets available to the WiredTiger storage engine meets the specified threshold.

For clusters running MongoDB version 7.0 and later, don't use the number of available tickets as a metric for overload alerts. Starting in MongoDB version 7.0, MongoDB dynamically adjusts the number of tickets. Instead, use the number of queued readers and writers as an overload metric.
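The two fill-ratio conditions express cache bytes as a percentage of the configured cache size. The sketch below computes both from a serverStatus wiredTiger.cache document by using its "maximum bytes configured", "bytes currently in the cache", and "tracked dirty bytes in the cache" fields. It assumes Atlas derives the ratios from these fields, which this page doesn't confirm. The byte values are hypothetical.

```python
def cache_fill_ratios(cache: dict) -> tuple:
    """Return (fill ratio %, dirty fill ratio %) from a wiredTiger.cache document."""
    max_bytes = cache["maximum bytes configured"]
    fill = 100.0 * cache["bytes currently in the cache"] / max_bytes
    dirty = 100.0 * cache["tracked dirty bytes in the cache"] / max_bytes
    return fill, dirty


# Hypothetical excerpt of serverStatus().wiredTiger.cache.
sample_cache = {
    "maximum bytes configured": 1000,
    "bytes currently in the cache": 800,
    "tracked dirty bytes in the cache": 50,
}
print(cache_fill_ratios(sample_cache))  # (80.0, 5.0)
```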

System and Disk Alerts

The following alert conditions measure resource usage on the hosts in your Atlas clusters:

Note

Currently, Atlas uses a single partition for data, index, and journal files. Even though the alerts reference individual partitions, they point to the same metric.

Note

All hardware metrics have burst reporting equivalents with distinct configurable alerts. To learn more, see Burst Reporting.

Disk Queue depth on Data Partition is: Raised if the average length of the queue of requests issued to the data partition that MongoDB uses exceeds the specified threshold.

Disk read IOPS on Data Partition is: Raised if the average number of disk read operations per second exceeds the specified threshold.

Disk read latency on Data Partition is: Raised if the amount of latency on disk read operations exceeds the specified threshold.

Disk space % used on Data Partition is

Raised if the percentage of disk space used on any partition that contains MongoDB collection data exceeds the specified threshold.

To find possible solutions for this alert, see Alert Resolutions.

Disk write IOPS on Data Partition is: Raised if the average number of disk write operations per second exceeds the specified threshold.

Disk write latency on Data Partition is: Raised if the amount of latency on disk write operations exceeds the specified threshold.

Max disk queue depth on Data Partition is: Raised if the maximum average length of the queue of requests issued to the data partition that MongoDB uses exceeds the specified threshold.

Max disk read IOPS on Data Partition is: Raised if the maximum average number of disk read operations per second exceeds the specified threshold.

Max disk read latency on Data Partition is: Raised if the maximum amount of latency on disk read operations exceeds the specified threshold.

Max disk space % used on Data Partition is: Raised if the maximum percentage of disk space used on any partition that contains the MongoDB collection's data exceeds the specified threshold.

Max disk write IOPS on Data Partition is: Raised if the maximum average number of disk write operations per second exceeds the specified threshold.

Max disk write latency on Data Partition is: Raised if the maximum amount of latency on disk write operations exceeds the specified threshold.

Max System Network In is: Raised if the maximum number of bytes sent to MongoDB meets the specified threshold.

Max System Network Out is: Raised if the maximum number of bytes sent from MongoDB meets the specified threshold.

System: CPU (Steal) % is

Applicable when the EC2 cluster credit balance is exhausted.

The percentage by which CPU usage exceeds the guaranteed baseline CPU credit accumulation rate. CPU credits are units of CPU utilization that accumulate at a constant rate to provide a guaranteed level of performance, and you can spend them on additional CPU performance. When the credit balance is exhausted, only the guaranteed baseline CPU performance is available, and the excess is shown as steal percent.

Note

Atlas triggers this alert only for AWS EC2 clusters that support AWS burstable instances. Currently, these are the M10 and M20 cluster tiers. To learn how burstable performance affects auto-scaling thresholds, see How Atlas Scales Cluster Tier.

System: CPU (User) % is: Raised if the CPU usage of the processes on the node, normalized by the number of CPUs, exceeds the specified threshold. This value is scaled to a range of 0-100%.

System: Max CPU (Steal) % is: Raised if the maximum percentage by which the CPU usage exceeds the guaranteed baseline CPU credit accumulation rate exceeds the specified threshold.

System: Max CPU (User) % is: Raised if the maximum CPU usage of the processes on the node, normalized by the number of CPUs, exceeds the specified threshold.

System Network In is: Raised if the average rate of physical bytes received per second by the eth0 network interface reaches the specified threshold.

System Network Out is: Raised if the average rate of physical bytes transmitted per second by the eth0 network interface reaches the specified threshold.
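The CPU (User) conditions are normalized by the number of CPUs. As a result, usage that is summed across cores, such as 350% on a four-CPU node, maps to the 0-100% scale. The sketch below shows that arithmetic. The node size and usage figures are hypothetical.

```python
def normalized_cpu_percent(summed_cpu_percent: float, cpu_count: int) -> float:
    """Scale CPU usage summed across all CPUs (up to cpu_count * 100%) to 0-100%."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return summed_cpu_percent / cpu_count


# Processes using 350% CPU in total on a hypothetical 4-CPU node.
print(normalized_cpu_percent(350.0, 4))  # 87.5
```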

Restarts

Restarts in Last Hour is: Raised if the number of times a host restarts within the previous hour exceeds the specified threshold.

Host Down

Host is Down

Raised if Atlas is unable to reach a host for several minutes.

Important

You should only configure this alert if you depend on secondary reads. For more information on secondary reads, see Query using Pre-Defined Replica Set Tags and Read Preference.

This alert is generally triggered by one of the following conditions:

The cluster has experienced a failure and is being auto-healed.
The cluster could not be reached because of a network issue.

MongoDB Atlas checks that the downtime did not occur because of your actions, such as rolling index builds. If MongoDB Atlas confirms that the downtime was not intentional, MongoDB Atlas attempts to replace the affected node. If failures occur, Atlas clusters maintain node availability for both reads and writes as long as a majority of nodes are running. To learn more, see How does MongoDB Atlas deliver high availability?.

Push-Based Log Export (PBLE)

The following alert conditions apply to the Push-Based Log Export feature:

Push based log export is unable to push logs on this host: Raised if the log exporter is unable to send logs for an extended period.

Push based log export has dropped a log line: Raised if the log exporter drops a log line. This might indicate a very large log line that couldn't be sent.

Swap

The following alert conditions apply to swap space usage:

Swap Usage: Free is: Raised if the amount of available swap space drops below the specified threshold.

Swap Usage: Max Free is: Raised if the maximum amount of available swap space drops below the specified threshold.

Swap Usage: Max Used is: Raised if the maximum total amount of swap space in use reaches the specified threshold.

Swap Usage: Used is: Raised if the total amount of swap space in use reaches the specified threshold.

Sort

The following alert conditions apply to sort operations:

Sort: Spill to disk during sort is: Raised if the number of writes to disk caused by $sort stages meets the specified threshold.

Inapplicable Host Conditions

The following host conditions don't apply to Atlas, and Atlas doesn't generate alerts for them:

Accesses Not In Memory: Total is
Background Flush Average is
B-tree: accesses is
B-tree: hits is
B-tree: misses is
B-tree: miss ratio is
Cursors: Client Cursors Size is
Effective Lock % is
Journaling Commits in Write Lock is
Journaling MB is
Journaling Write Data Files MB is
Memory: Mapped is
Page Fault Exceptions Thrown: Total is

Query Targeting Alerts

The following alerts apply to indexes on your collections. Either alert might indicate a missing or inefficient index.

Tip

To learn more about indexing to improve performance, see Indexing Strategies.

Query Targeting: Scanned / Returned: Raised if the ratio of index keys scanned to documents returned meets or exceeds the specified threshold.

Query Targeting: Scanned Objects / Returned: Raised if the ratio of documents scanned to documents returned meets or exceeds the specified threshold.

The change stream cursors that the MongoDB Search process (mongot) uses to keep MongoDB Search indexes updated can contribute to the query targeting ratio, and they can trigger query targeting alerts if the ratio is high.
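Both query targeting ratios compare the work done scanning with the number of documents returned. You can compute the same kind of ratio for a single query from `explain("executionStats")` output by using its `totalKeysExamined`, `totalDocsExamined`, and `nReturned` fields. This per-query view is an analogous approximation, not the host-level calculation that Atlas uses for the alert. How the sketch handles zero returned documents is an assumption.

```python
def query_targeting_ratios(execution_stats: dict) -> tuple:
    """Return (keys scanned / returned, documents scanned / returned)."""
    returned = execution_stats["nReturned"]
    keys = execution_stats["totalKeysExamined"]
    docs = execution_stats["totalDocsExamined"]
    if returned == 0:
        # Assumption: with nothing returned, report the raw scanned counts.
        return float(keys), float(docs)
    return keys / returned, docs / returned


# A hypothetical collection scan: 100,000 documents examined to return 10.
stats = {"nReturned": 10, "totalKeysExamined": 0, "totalDocsExamined": 100_000}
print(query_targeting_ratios(stats))  # (0.0, 10000.0)
```

A documents-scanned ratio this high, with no index keys examined, is the pattern that usually points to a missing index.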

Cloud Backup Alerts

The following alerts apply to Cloud Backup snapshots.

Backup restore failed: Raised when a restore fails.

Backup restore succeeded: Raised when a restore succeeds.

Fallback snapshot failed: Raised when a fallback snapshot fails.

Fallback snapshot taken: Raised when a regular backup fails, but Atlas was able to take a fallback snapshot.
Tip
To learn more, see Fallback Snapshots.

Last snapshot too old: Raised when too much time has passed since the last successful snapshot.

Snapshot download request failed: Raised when a download request fails.

Snapshot schedule fell behind: Raised when a snapshot hasn't been taken within the configured period.

Snapshot taken successfully: Raised when a snapshot is taken successfully.

Replica Set Alerts

The following alert conditions apply to replica sets:

Number of elections in last hour is > X: Raised when the number of elections that occurred in the last hour exceeds X, a value that you set when you create the alert. Frequent elections might indicate that the cluster's replication is not healthy.
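The following sketch shows how an "elections in the last hour is > X" check can be evaluated over a one-hour sliding window. It reads "exceeds X" as strictly greater than X. The function names are hypothetical, and so is the assumption that election timestamps are available as a list. Atlas's own bookkeeping may differ.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def elections_in_last_hour(election_times: list[datetime], now: datetime) -> int:
    """Count elections in the half-open window (now - 1 hour, now]."""
    window_start = now - timedelta(hours=1)
    return sum(1 for t in election_times if window_start < t <= now)


def too_many_elections(election_times: list[datetime], now: datetime, x: int) -> bool:
    """True when the hourly election count is strictly greater than X."""
    return elections_in_last_hour(election_times, now) > x
```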

Replica set elected a new primary: Raised when a replica set elects a new primary.

Replica set has no primary

Raised when a replica set does not have a primary. Specifically, when none of the members of a replica set have a status of PRIMARY, the alert triggers. For example, this condition might arise when a set has an even number of voting members resulting in a tie.
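A replica set election requires a candidate to receive votes from a strict majority of all voting members. That requirement is why an evenly split set can end up with no primary. The following sketch illustrates the rule with hypothetical helper names. It models only vote counts, not priorities or other election details.

```python
def can_elect_primary(reachable_votes: int, total_votes: int) -> bool:
    """A candidate needs a strict majority of all voting members."""
    return reachable_votes > total_votes // 2


def partition_outcome(side_a: int, side_b: int) -> str:
    """Outcome when voting members split into two groups that can't reach each other."""
    total = side_a + side_b
    if can_elect_primary(side_a, total) or can_elect_primary(side_b, total):
        return "primary elected"
    return "no primary"
```

With four voting members split 2/2, neither side has a majority of the four votes, so `partition_outcome(2, 2)` returns "no primary". With five members split 3/2, the side with three members can still elect a primary.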

If Atlas collects data during an election, this alert might send a false positive. To prevent such false positives, set the alert configuration's after waiting interval (in the configuration's Send to section).
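The after waiting interval acts as a debounce: the alert fires only if the condition is still true after the configured wait. The following sketch models that behavior with a hypothetical `DelayedAlert` class. It assumes the condition is sampled periodically and that any sample in which the set has a primary resets the timer. Atlas's internal scheduling is not described here, so this is only an approximation.

```python
from __future__ import annotations

from datetime import datetime, timedelta


class DelayedAlert:
    """Fire only if a condition stays true for at least `wait`."""

    def __init__(self, wait: timedelta):
        self.wait = wait
        self.first_seen: datetime | None = None  # when the condition last became true

    def observe(self, condition: bool, now: datetime) -> bool:
        if not condition:
            self.first_seen = None  # condition cleared, so reset the timer
            return False
        if self.first_seen is None:
            self.first_seen = now
        return now - self.first_seen >= self.wait
```

For example, a brief gap while a new primary is elected clears before a 5-minute wait elapses, so no alert is sent.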

To find possible solutions for this alert, see Alert Resolutions.

Sharded Cluster Alerts

The following alert condition applies to sharded clusters:

Cluster is missing an active mongos: Raised if Atlas cannot reach any mongos for the cluster.

Flex Alerts

The following alert conditions apply to Flex clusters:

Flex metric outside threshold

Raised if any of the following conditions apply:

The number of open connections to the host exceeds 80% of the total open connections allowed.
The approximate size of all documents (and their padding) and indexes exceeds 4 gigabytes.
The total operations per second exceed 200 for 24 hours. Atlas repeats this alert every 6 hours.
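The following sketch checks the three Flex conditions listed above. The function name and parameters are hypothetical, and two readings are assumptions: "4 gigabytes" is treated as 4 GiB, and "exceeds 200 for 24 hours" is treated as every sample in a full 24-hour window being above 200. The 6-hour repeat is not modeled.

```python
from __future__ import annotations

GIB = 1024 ** 3  # assumption: "4 gigabytes" read as 4 GiB


def flex_threshold_reasons(open_connections: int, max_connections: int,
                           data_and_index_bytes: int,
                           ops_per_sec_last_24h: list[float]) -> list[str]:
    """Return which Flex conditions currently apply (empty list if none)."""
    reasons = []
    if open_connections > 0.8 * max_connections:
        reasons.append("connections")
    if data_and_index_bytes > 4 * GIB:
        reasons.append("storage")
    # Assumption: the caller passes samples covering the last 24 hours,
    # and the condition holds only if every one of them is above 200.
    if ops_per_sec_last_24h and min(ops_per_sec_last_24h) > 200:
        reasons.append("operations")
    return reasons
```

The connection limit of 500 in the tests is an illustrative value, not a documented Flex limit.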

App Services Alerts

The following alert conditions apply to Atlas App Services.

An overall request rate limit has been hit: Raised when the number of concurrent requests exceeds the limit. This alert indicates that an app might be making an unexpectedly high number of requests.

Auth Login Fail is: Raised if the number of failed client login requests per second meets the specified threshold.

Endpoints Compute Time is: Raised if the HTTPS endpoints compute time per second meets the specified threshold.

Endpoints Egress Bytes is: Raised if the HTTPS endpoints data egress bytes per second meets the specified threshold.

Failed Requests - Endpoints is: Raised if the number of HTTPS endpoints requests that fail per second meets the specified threshold.

Failed Requests - GraphQL is: Raised if the number of GraphQL requests that fail per second meets the specified threshold. (GraphQL support for Atlas App Services is deprecated. To learn more, see the Atlas App Services documentation.)

Failed Requests - Overall is: Raised if the number of total requests that fail per second meets the specified threshold.

Failed Requests - SDK (Functions) is: Raised if the number of SDK Function requests that fail per second meets the specified threshold.

Failed Requests - Sync is: Raised if the number of failed Atlas Device Sync requests per second meets the specified threshold.

Failed Requests - Triggers is: Raised if the number of Triggers requests that fail per second meets the specified threshold.

GraphQL Compute Time is: Raised if the GraphQL compute time per second meets the specified threshold. (GraphQL support for Atlas App Services is deprecated. To learn more, see the Atlas App Services documentation.)

GraphQL Egress Bytes is: Raised if the GraphQL data egress bytes per second meets the specified threshold. (GraphQL support for Atlas App Services is deprecated. To learn more, see the Atlas App Services documentation.)

GraphQL Request Duration P95 is: Raised if the 95th percentile of duration in milliseconds for GraphQL requests meets the specified threshold. (GraphQL support for Atlas App Services is deprecated. To learn more, see the Atlas App Services documentation.)

HTTP Endpoint Request Duration P95 is: Raised if the 95th percentile of duration in milliseconds for HTTPS endpoint requests meets the specified threshold.

MQL Request Duration P95 is: Raised if the 95th percentile of duration in milliseconds for MQL requests meets the specified threshold.

Overall Compute Time is: Raised if the overall compute time per second meets the specified threshold.

Overall Egress Bytes is: Raised if the overall data egress bytes per second meets the specified threshold.

SDK Functions Compute Time is: Raised if the SDK Functions compute time per second meets the specified threshold.

SDK Functions Egress Bytes is: Raised if the SDK Functions data egress bytes per second meets the specified threshold.

SDK Functions Request Duration P95 is: Raised if the 95th percentile of duration in milliseconds for SDK function requests meets the specified threshold.

SDK MQL Compute Time is: Raised if the SDK MQL compute time per second meets the specified threshold.

SDK MQL Egress Bytes is: Raised if the SDK MQL data egress bytes per second meets the specified threshold.

Session Ended - Sync is: Raised if the number of sessions ended per second during Atlas Device Sync meets the specified threshold.

Sync Client Bootstrap Time is: Raised if the 95th percentile of the bootstrap time for the Atlas Device Sync client meets the specified threshold.

Sync Client Uploads that failed is: Raised if the number of uploads that failed per second on the Atlas Device Sync client meets the specified threshold.

Sync Client Uploads that are invalid: Raised if the number of invalid uploads per second on the Atlas Device Sync client meets the specified threshold.

Sync Current Oplog Lag Sum is: Raised if the approximate amount of time that the Atlas Device Sync is behind the MongoDB oplog meets the specified threshold.

Sync Egress Bytes is: Raised if the Atlas Device Sync data egress bytes per second meets the specified threshold.

Sync Num Unsyncable Docs % is: Raised if the percentage of unsyncable App Services documents meets the specified threshold.

Triggers Compute Time is: Raised if the triggers compute time per second meets the specified threshold.

Triggers Current Oplog Lag Sum is: Raised if the approximate amount of time that App Services triggers are behind the MongoDB oplog meets the specified threshold.

Triggers Egress Bytes is: Raised if the triggers data egress bytes per second meets the specified threshold.

Triggers Request Duration P95 is: Raised if the 95th percentile of duration in milliseconds for triggers meets the specified threshold.
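Several of the conditions above compare a 95th-percentile (P95) request duration against a threshold. The following sketch computes P95 with the nearest-rank method and reads "meets" as greater than or equal to. This page does not say which percentile method Atlas uses, so nearest-rank is an assumption, and the helper names are hypothetical.

```python
from __future__ import annotations


def p95(durations_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of durations."""
    if not durations_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(durations_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic (1-based)
    return ordered[rank - 1]


def p95_alert(durations_ms: list[float], threshold_ms: float) -> bool:
    """True when the P95 duration meets (>=) the threshold."""
    return p95(durations_ms) >= threshold_ms
```

For 100 requests that took 1 ms through 100 ms, the P95 is 95 ms. A threshold of 95 ms fires, and a threshold of 96 ms does not.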

User Alerts

The following alert conditions apply to Atlas users.

Organization users do not have multi-factor authentication enabled: Raised when one or more users in an organization do not have multi-factor authentication enabled.

User had their role changed: Raised when an Atlas user's project or organization roles have changed.

User joined the organization: Raised when a new user joins the Atlas organization.

User joined the project: Raised when a new user joins the Atlas project.

User left the organization: Raised when a user leaves the Atlas organization.

User left the project: Raised when a user leaves the Atlas project.

Project Alerts

The following alert conditions apply to your Atlas project.

Security checkup alerts updated: Raised if the project's or organization's security checkup alerts are updated.

Encryption at Rest KMS network access denied: Raised if the project in Atlas can't connect to your key management provider. In this case, Atlas doesn't shut down your processes. This alert runs automatically for all new projects to communicate any KMS network access failures. To learn more, see Enable Encryption at Rest with KMS.

Tag(s) were added or modified on project: Raised if you or your team added or changed the project's tags.

Users do not have multi-factor authentication enabled: Raised if the project or organization has users who have not set up multi-factor authentication.

Billing Alerts

The following alert conditions apply to Atlas billing. You can configure billing alerts from the Atlas UI at the organization level or the project level.

To configure organization-level alerts:

In Atlas, go to the Organization Alerts page.

If it's not already displayed, select your desired organization from the Organizations menu in the navigation bar.
Click the Alerts icon in the navigation bar.
Click Alerts under the Organization header.

The Organization Alerts page displays.

Configure the alerts.

To configure project-level alerts:

In Atlas, go to the Project Alerts page.

If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
Click the Alerts icon in the navigation bar.
Click Alerts under the Project header.

The Project Alerts page displays.

Configure the alerts.

Note

All amounts billed are in USD.

Amount billed ($) yesterday is above the threshold

Raised if the organization or project's last daily amount billed exceeds your configured threshold. Atlas does not account for any credits applied for the previous day when calculating the billed amount.

This condition applies to both organizations and projects.

Credit card is about to expire

Raised if the credit card on file is about to expire. The alert is triggered at the beginning of the month that the card expires. Atlas enables this alert when a credit card is added for the first time.

This condition applies to both organizations and projects.

Current bill ($) for any single project is above the threshold

Raised if the monthly total for any project within the organization exceeds your configured threshold for all projects. When the current pending invoice closes, this alert resets.

This alert condition applies to organizations only.

Current bill ($) for the organization is above the threshold

Raised if the monthly total for the organization exceeds your configured threshold. When the current pending invoice closes, this alert resets.

This alert condition applies to organizations only.
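The following sketch shows how the billing conditions above compare amounts in USD. The function name, the parameter names, and the short labels it returns are hypothetical. It assumes the caller supplies yesterday's billed amount (credits not subtracted, as described above), each project's month-to-date total, and the organization's month-to-date total.

```python
from __future__ import annotations


def billing_alerts(yesterday_billed_usd: float,
                   project_month_totals_usd: dict[str, float],
                   org_month_total_usd: float,
                   daily_threshold_usd: float,
                   per_project_threshold_usd: float,
                   org_threshold_usd: float) -> list[str]:
    """Return short labels for the billing conditions that currently apply."""
    fired: list[str] = []
    # Yesterday's billed amount is compared as-is; credits are not subtracted.
    if yesterday_billed_usd > daily_threshold_usd:
        fired.append("daily")
    # A single threshold applies to every project in the organization.
    for project in sorted(project_month_totals_usd):
        if project_month_totals_usd[project] > per_project_threshold_usd:
            fired.append(f"project:{project}")
    if org_month_total_usd > org_threshold_usd:
        fired.append("organization")
    return fired
```

In the real alerts, the monthly comparisons reset when the current pending invoice closes. This stateless sketch does not model that reset.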

Service Account Alerts

The following alert conditions apply to Atlas service accounts. You can configure these alerts from the Atlas UI at the organization level.

Service Account Secrets are about to expire

Raised if a secret for any of your service accounts expires within seven days, or the number of days you specify if you configure this alert. When all expiring secrets are removed or have expired, this alert resets.

This alert condition applies to organizations only.

Service Account Secrets have expired

Raised if a secret for any of your service accounts has expired. To generate a new secret, see Update Programmatic Access to an Organization. When all expired secrets are removed, this alert resets.

This alert condition applies to organizations only.
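The following sketch sorts service account secrets into the two states that these alerts track. A secret is "about to expire" if it expires within the window (7 days by default), and "expired" if its expiration time has passed and the secret has not been removed. The function name is hypothetical. Treating exactly 7 days as "within" the window is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def classify_secrets(expirations: dict[str, datetime], now: datetime,
                     window_days: int = 7) -> tuple[list[str], list[str]]:
    """Return (expiring secret IDs, expired secret IDs), each sorted by ID."""
    expiring: list[str] = []
    expired: list[str] = []
    for secret_id, expires_at in sorted(expirations.items()):
        if expires_at <= now:
            expired.append(secret_id)
        elif expires_at - now <= timedelta(days=window_days):
            expiring.append(secret_id)
    return expiring, expired
```

In this model, each alert stays open while its list is non-empty. That matches the reset behavior described above: expiring secrets leave the first list when they are removed or expire, and expired secrets leave the second list only when they are removed.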

Federation Alerts

Organization's IdP certificate is about to expire: Raised when an IdP certificate associated with an organization for which you have the Organization Owner role expires within 14 days. Atlas sends this alert daily until you acknowledge it.
Note
Atlas creates this alert automatically when you map an organization to an IdP. If you remove the mapping, Atlas deletes all instances of this alert.

Encryption at Rest Alerts

The following alert conditions apply to projects that use Encryption at Rest using Customer Key Management.

AWS encryption key elapsed time since last rotation is above (n) days

Raised if the AWS Customer Master Key (CMK) used by the Atlas project has been active for more than the configured number of days (90 by default).

To modify the alert threshold:

In Atlas, go to the Project Alerts page.
1. If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
3. Click the Alerts icon in the navigation bar.
4. Click Alerts under the Project header.
The Project Alerts page displays.
Click Alert Settings.

If you set the alert threshold to more days than the automatic rotation period of your AWS KMS CMK, Atlas doesn't create the alert, because AWS automatically rotates your CMK before the threshold is reached.

This alert resets automatically if you rotate the project CMK. For documentation on how to rotate your project CMK, see Rotate your AWS Customer Master Key.
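The following sketch expresses the rotation-age check described above. It reads "active for more than the configured number of days" as strictly greater than the threshold. It also skips the alert when the threshold exceeds the provider's automatic rotation period, as described for AWS. The function name and the `provider_auto_rotation_days` parameter are hypothetical.

```python
from __future__ import annotations

from datetime import date


def rotation_alert(last_rotated: date, today: date, threshold_days: int = 90,
                   provider_auto_rotation_days: int | None = None) -> bool:
    """True when the key has been active for more than threshold_days."""
    # If the provider rotates the key automatically before the threshold,
    # the alert would never be meaningful, so it isn't created.
    if provider_auto_rotation_days is not None and threshold_days > provider_auto_rotation_days:
        return False
    return (today - last_rotated).days > threshold_days
```

With the default threshold, a key last rotated on 2025-01-01 has been active for 90 days on 2025-04-01, which does not trigger the alert. On 2025-04-02, the key reaches 91 days and the alert triggers.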

Azure encryption key elapsed time since last rotation is above (n) days

Raised if the Azure Key Vault Key Identifier used by the Atlas project has been active for more than the configured number of days (90 by default).

To modify the alert threshold:

In Atlas, go to the Project Alerts page.
1. If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
3. Click the Alerts icon in the navigation bar.
4. Click Alerts under the Project header.
The Project Alerts page displays.
Click Alert Settings.

This alert resets automatically if you rotate the project Key Identifier. For documentation on how to rotate your project Key Identifier, see About Rotating Your Azure Key Identifier.

GCP encryption key elapsed time since last rotation is above (n) days

Raised if the GCP Key Version Resource ID used by the Atlas project has been active for more than the configured number of days (90 by default).

To modify the alert threshold:

In Atlas, go to the Project Alerts page.
1. If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
3. Click the Alerts icon in the navigation bar.
4. Click Alerts under the Project header.
The Project Alerts page displays.
Click Alert Settings.

This alert resets automatically if you rotate the project Key Version Resource ID.

To learn how to rotate your project Key Version Resource ID, see Rotate your GCP Key Version Resource ID.

Encryption at Rest KMS network access denied

Raised if the KMS credentials for your cloud provider are invalid due to network access restrictions. This alert runs automatically for all new projects to communicate any KMS network access failures.

To modify or remove the alert:

In Atlas, go to the Project Alerts page.
1. If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.
3. Click the Alerts icon in the navigation bar.
4. Click Alerts under the Project header.
The Project Alerts page displays.
Click Alert Settings.

This alert is enabled by default for all new projects.

Maintenance Window Alerts

The following alert conditions apply to projects with configured maintenance windows.

Note

You can only configure maintenance window alerts if a project has an active maintenance window.

Maintenance is scheduled: Raised 72 hours prior to scheduled maintenance for a project.

Maintenance no longer needed: Raised if scheduled maintenance is no longer needed for a project.

Maintenance started: Raised when maintenance starts for a project.

Maintenance has been auto-deferred: Raised if maintenance has been automatically deferred.
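The "Maintenance is scheduled" alert is raised 72 hours before maintenance begins. The following sketch computes that time for a given maintenance start. The constant and function names are hypothetical. Timezone-aware UTC datetimes keep the subtraction unambiguous.

```python
from datetime import datetime, timedelta, timezone

SCHEDULED_NOTICE = timedelta(hours=72)


def scheduled_alert_time(maintenance_start: datetime) -> datetime:
    """When the "Maintenance is scheduled" alert is raised."""
    return maintenance_start - SCHEDULED_NOTICE
```

For maintenance that starts at 02:00 UTC on June 10, the alert is raised at 02:00 UTC on June 7.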

MongoDB Support Access Grant Alerts

Granted additional access to MongoDB support: Raised when MongoDB support staff is granted infrastructure access. The alert shows the access grant type and the expiration date of the grant.

Revoked additional access from MongoDB support: Raised when MongoDB support staff no longer has infrastructure access. The alert shows the access grant type.

Atlas Stream Processing Alerts

The following alert conditions apply to projects running Stream Processing Workspaces.

Stream Processor State is failed: Raised if a target stream processor exits with a failed state.
Note
If you configured the Stream Processor State is failed alert to target stream processors by name with an Operator (a matcher expression such as is or contains), and you then rename a stream processor, Atlas won't trigger alerts for the renamed processor if the new name no longer matches the expression. To monitor the renamed stream processor, reconfigure the alert.
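The following sketch shows why renaming a stream processor can silently stop its alerts: the matcher is evaluated against the current name. The operator table and function name are hypothetical. Only is and contains come from the note above. The other two operators are assumed extras for illustration.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical operator table: "is" and "contains" mirror the operators
# named above; "starts with" and "ends with" are assumed extras.
OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "is": lambda name, value: name == value,
    "contains": lambda name, value: value in name,
    "starts with": lambda name, value: name.startswith(value),
    "ends with": lambda name, value: name.endswith(value),
}


def matches(processor_name: str, operator: str, value: str) -> bool:
    """Evaluate a name matcher against a stream processor's current name."""
    return OPERATORS[operator](processor_name, value)
```

An alert that matches `is "orders-enricher"` stops matching after the processor is renamed to `orders-enricher-v2`. A `contains "orders"` matcher still matches the new name.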

Stream Processor source change stream lag is: Raised if the lag time between an event on a change stream source and the ingestion time of that event on a target stream processor is above or below the threshold.

Stream Processor DLQ message count is: Raised if the rate of messages per second the target stream processor writes to the Dead Letter Queue is above or below the threshold.

Stream Processor source Kafka offset lag is: Raised if the offset lag total on a Kafka source is above or below the threshold.

Stream processor output message count is: Raised if the rate of messages per second that the target stream processor outputs through its $emit or $merge stage is above or below the threshold.
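Each stream processor metric above can be compared in either direction. "Above" catches growing lag or dead letter queue volume, and "below" catches an output rate that has stalled. The following sketch shows that comparison. It also shows a Kafka offset lag total, computed in the conventional way as the sum, over partitions, of the latest offset minus the committed offset. Treating Atlas's "offset lag total" this way is an assumption, and the function names are hypothetical.

```python
from __future__ import annotations


def outside_threshold(value: float, direction: str, threshold: float) -> bool:
    """direction is "above" or "below" (assumed labels)."""
    if direction == "above":
        return value > threshold
    if direction == "below":
        return value < threshold
    raise ValueError(f"unknown direction: {direction!r}")


def kafka_offset_lag_total(latest: dict[int, int], committed: dict[int, int]) -> int:
    """Sum over partitions of (latest offset - committed offset)."""
    return sum(latest[p] - committed[p] for p in latest)
```

For partitions with latest offsets {0: 120, 1: 80} and committed offsets {0: 100, 1: 80}, the total lag is 20. That value is above a threshold of 10.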
