Adaptive Operation Rate Limiting

The Adaptive Rate Limiter for Operations is an Intelligent Workload Management (IWM) policy in Atlas. IWM is a dynamic resource manager that provides real-time workload monitoring and automated safeguards to maintain high availability under load. The Adaptive Rate Limiter for Operations dynamically adjusts the rate at which MongoDB accepts and rejects operations that your applications send to a mongod when a cluster is overloaded.

MongoDB considers a node overloaded when the number of incoming operations is large enough to cause a total or near-total outage. MongoDB computes overload from metrics such as CPU utilization, queue depth, operations per second, and latency.

Important

This policy is a load-shedding policy. If this policy is active on your Atlas cluster and your cluster is overloaded, you might see its associated overload errors.

When traffic suddenly spikes, accepting more operations than a node can handle can overwhelm your cluster, causing degraded performance, timeouts, and potential failovers. The cluster can then take significant time to recover.

The Adaptive Rate Limiter for Operations policy prevents overload by:

- Limiting the admission rate to what the system can safely handle
- Maintaining cluster stability and avoiding outages
- Keeping a portion of operations succeeding with predictable latency
- Enabling faster recovery from traffic spikes

Considerations

- Your Atlas cluster must be running MongoDB 8.3 or later to use this policy. On MongoDB 8.3, this policy is disabled by default. To enable or disable IWM policies, see the IWM settings.
- This policy is available only for M10+ Atlas replica set clusters.
- This policy is not available on sharded clusters.

Behavior

When Atlas runs the Adaptive Rate Limiter for Operations policy on your cluster, it performs the following actions:

Monitors for overload
- Atlas continuously evaluates indicators of overload on each node.
- When Atlas detects overload conditions, the Adaptive Rate Limiter for Operations policy activates. Atlas triggers an alert for the following alert condition:
  Atlas is actively regulating the admission rate of new operations in order to safeguard cluster stability as CPU pressure has exceeded <N>% and high operation latency has been detected.
  To modify your project's alert settings, see Configure an Alert.
Determines a safe admission rate
- When the system approaches overload, Atlas computes a maximum safe rate at which it can admit new operations on each node, based on recent conditions.
Admits or rejects operations at the entry point
- MongoDB admits and runs as usual any operations that arrive within the safe rate.
- MongoDB immediately rejects any operations that arrive above the safe rate for the mongod on each node. It doesn't hold these operations in a queue until they time out.
Adapts the admission rate over time
- As load decreases and the cluster recovers, Atlas relaxes the rate limit so that it can admit more operations again.
- When the policy is no longer active, the following informational event appears in the cluster's activity feed:
  "Atlas is no longer regulating the admission rate of new operations."
  To learn more, see the IWM activity feed events.
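
The admit, reject, and adapt steps above can be pictured with a small token-bucket sketch. This is only an illustration of the general load-shedding technique, not how Atlas or mongod implements the policy. The class name `AdmissionLimiter`, the halving and additive-increase constants, and the `floor` and `ceiling` bounds are all assumptions made for the example.

```python
class AdmissionLimiter:
    """Illustrative token-bucket admission control (hypothetical, not MongoDB's code)."""

    def __init__(self, safe_rate, burst):
        self.safe_rate = float(safe_rate)  # operations admitted per second
        self.burst = float(burst)          # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.last = 0.0                    # timestamp of the previous decision

    def try_admit(self, now):
        """Return True to admit an operation, False to reject it immediately."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        # Refill tokens at the current safe rate, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.safe_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        # Above the safe rate: reject now instead of queueing until timeout.
        return False

    def adapt(self, overloaded, floor=1.0, ceiling=1000.0):
        """Tighten the rate under overload, relax it gradually during recovery."""
        if overloaded:
            self.safe_rate = max(floor, self.safe_rate * 0.5)
        else:
            self.safe_rate = min(ceiling, self.safe_rate + 10.0)


limiter = AdmissionLimiter(safe_rate=2, burst=2)
decisions = [limiter.try_admit(now=0.0) for _ in range(3)]
```

In this sketch, the third operation arriving at the same instant is rejected right away, while the first two proceed as usual. Passing the timestamp in explicitly keeps the example deterministic.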

When the policy is active, some operations in your application fail quickly with an error containing the SystemOverloadedError label. Other operations continue to succeed. This prevents a situation where all operations time out, causing a node crash. To learn more about how to catch overload errors and avoid retry storms, see Overload Errors.
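
As a rough illustration of catching these errors without creating a retry storm, the following sketch retries only errors that carry the SystemOverloadedError label, using capped exponential backoff with jitter. It assumes the exception exposes a `has_error_label` method, as PyMongo exceptions do. The helper names `is_overload_error` and `run_with_backoff` and the delay values are hypothetical. Your driver may already apply its own retry behavior, so treat this as a sketch rather than recommended settings.

```python
import random
import time

OVERLOAD_LABEL = "SystemOverloadedError"  # label named in the text above


def is_overload_error(exc):
    """Return True if the exception carries the overload error label."""
    has_label = getattr(exc, "has_error_label", None)
    return callable(has_label) and bool(has_label(OVERLOAD_LABEL))


def run_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                     sleep=time.sleep, rng=random.random):
    """Run operation(), retrying overload errors with jittered backoff.

    All parameter defaults are illustrative assumptions, not tuned values.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise anything that is not an overload error, and give up
            # once the attempt budget is spent.
            if not is_overload_error(exc) or attempt == max_attempts - 1:
                raise
            # The delay cap doubles per attempt; random jitter spreads
            # clients out so they don't all retry at the same moment.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

A call such as `run_with_backoff(lambda: collection.find_one({"_id": 1}))` would wrap a single read, where `collection` stands in for a collection handle from your own application. Bounding the number of attempts matters as much as the delay: unbounded retries add load to a node that is already shedding it.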

Observability

You can use the following methods to track how the Adaptive Rate Limiter for Operations is affecting your workload:

Monitor Cluster Metrics: Operation throttling metrics show the number of operations that IWM policies have terminated.
Configure Alerts:
- Cluster overload conditions trigger default alerts for Intelligent Workload Management alert conditions. To learn how to manage alerts, see Configure Alert Settings.
- When cluster overload conditions resolve, Atlas writes informational events to the activity feed that indicate the resolution of IWM policies. To learn more, see IWM activity feed events.
