How to Monitor MongoDB

MongoDB provides monitoring tools to observe and address the performance and overall health of your database instances. Read on to understand the metrics and tools you can use to monitor your clusters.

In what follows, we’ll guide you through:

Why it’s important to monitor MongoDB
Key areas to prioritize
Additional important metrics, including their definitions, importance, and criteria for good and bad signals, categorized by instance status and health, cluster operation and connection metrics, instance hardware metrics, and replication metrics
MongoDB performance monitoring tools
An appendix including the full list of metrics and tools

Why monitor MongoDB?

A key aspect of database administration and capacity planning for your application is monitoring your clusters’ health and performance. An unhealthy database may experience slow response times or become overwhelmed and introduce the risk of downtime for your application. While MongoDB Atlas, our fully managed cloud database, handles the majority of administration efforts and has built-in fault tolerance and scaling abilities, it’s still crucial that users know how to best monitor their clusters.

Monitoring MongoDB databases allows you to improve the performance of your application stack and optimize for costs by enabling you to:

Understand the current capacity of your database
Observe how heavily resources are utilized
Observe the presence of abnormal behavior and performance issues
Detect and react to real-time issues

Good or bad signals for your cluster are largely relative to your baseline cluster activity and whether you are experiencing predictable or abnormal behavior. Please note that not all clusters behave in the same way, so use this information as a guide to help identify what to look out for.

Top 7 areas to monitor in MongoDB

These are seven key monitoring metrics and capabilities to leverage in MongoDB, in no particular order.

1. Scan and order

What is scan and order?

The scan and order metric is the average rate per second, over the selected sample period, of queries that return sorted results but cannot use an index to perform the sort.

Why is it important?

In-memory sorts can be very expensive as they require large result sets to be reordered at runtime. They can sometimes be avoided by using indexes that presort.

What should I look out for?

The ideal state to look for with scan and order is a value of 0, meaning that the database didn’t perform any in-memory sort operations.

Generally, any spike in scan and order operations is a potential cause for further investigation. This indicates that the database performed many in-memory sort operations, which requires memory and computational overhead to process a query. It also indicates a blocking stage for aggregation operations—further processing cannot happen until the results have been sorted.

Remember, not all queries are equal: One large scan and order can be worse than many small ones.
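
As a rough illustration of how a per-second rate like this can be derived, the sketch below takes two samples of a cumulative counter and divides the difference by the sampling interval. It assumes the samples are db.serverStatus() documents exposing the counter at metrics.operation.scanAndOrder; the sample values and the function name are hypothetical.

```python
def scan_and_order_rate(earlier, later, interval_seconds):
    """Average scan-and-order operations per second between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    before = earlier["metrics"]["operation"]["scanAndOrder"]
    after = later["metrics"]["operation"]["scanAndOrder"]
    return (after - before) / interval_seconds


# Two hypothetical serverStatus samples taken 60 seconds apart.
sample_a = {"metrics": {"operation": {"scanAndOrder": 1200}}}
sample_b = {"metrics": {"operation": {"scanAndOrder": 1500}}}

print(scan_and_order_rate(sample_a, sample_b, 60))  # 5.0
```

A result of 0 matches the ideal state described above. A rate alone does not tell you how large each sort was, so it is best read alongside the individual slow operations.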

2. Query targeting

What is query targeting?

The query targeting metric is the ratio of documents examined relative to the number of documents returned across all operations during a sampling period.

Why is it important?

This metric is a good indicator of how efficiently the database is running.

What should I look out for?

Ideally, query targeting should stay as low as possible. A value of 1 means that for every document returned, a single document was examined, which is generally indicative of relatively healthy operations.

Spikes in query targeting typically arise when there are indexing opportunities. For example, a large collection that is regularly queried without the use of an index (a collection scan), or with an overly simplistic index, can result in highly inflated query targeting metrics.

You can set an alert for this metric under Project Alerts in your settings.
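
The ratio itself is simple arithmetic. The sketch below is a minimal illustration, assuming you already have counts of documents examined and documents returned for the same period (for example, differences between two readings of the serverStatus counters metrics.queryExecutor.scannedObjects and metrics.document.returned, which is our assumption about a suitable source). The function name is hypothetical.

```python
def query_targeting(docs_examined, docs_returned):
    """Documents examined per document returned; None if nothing was returned."""
    if docs_returned == 0:
        return None  # the ratio is undefined when no documents are returned
    return docs_examined / docs_returned


print(query_targeting(500, 500))      # 1.0, one document examined per result
print(query_targeting(100_000, 20))   # 5000.0, likely a collection scan
```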

3. Normalized System CPU

What is normalized system CPU?

The normalized system CPU metric shows the CPU usage of all processes on the underlying instance, scaled to a range of 0-100% by dividing the absolute value by the number of CPU cores.

Why is it important?

Operating on a suboptimal tier could result in higher costs if overprovisioned or potential downtime due to a lack of resources if underprovisioned. Selecting the appropriate tier will help ensure adequate performance at a minimum cost, assuming generally efficient use of computational resources.

What should I look out for?

A healthy range for sustained normalized system CPU is often between 40% and 70%.

Under 40% may indicate potential overprovisioning, while over 70% may indicate potential underprovisioning.
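
To make the normalization concrete, here is a small sketch that divides an absolute CPU percentage by the number of cores and compares the result with the 40% and 70% guideposts above. The function names and labels are ours, and the thresholds are only the general guidance from this article, not fixed limits.

```python
def normalized_cpu(absolute_cpu_percent, cores):
    """Scale an absolute CPU percentage (0 to 100 * cores) to a 0-100% range."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return absolute_cpu_percent / cores


def classify_cpu(normalized_percent, low=40.0, high=70.0):
    """Label a sustained normalized CPU value using the guideposts above."""
    if normalized_percent < low:
        return "possibly overprovisioned"
    if normalized_percent > high:
        return "possibly underprovisioned"
    return "healthy range"


value = normalized_cpu(240.0, 4)  # 240% absolute usage on a 4-core instance
print(value, classify_cpu(value))  # 60.0 healthy range
```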

4. Performance Advisor

What is the Performance Advisor?

The Performance Advisor is an Atlas feature that provides targeted insights and recommendations based on the analysis of logged query patterns and existing indexes across the entire database cluster.

Why is it important?

By providing indexing recommendations, the Performance Advisor can help improve application performance and long-term scalability, leading to a more cost-effective system.

What should I look out for?

The Performance Advisor ranks suggested indexes by their relative impact, rated high or medium based on potential efficiency gains.

Each suggestion contains the following metrics, which apply specifically to the logged queries that would be improved by the index:

Execution Count
Average Execution Time
Average Query Targeting
Average Docs Scanned
Average Docs Returned

The Performance Advisor also shows an executed sample query that matches the query shape, with specific metrics for that query.

Please note: Always verify index recommendations before creating them. Additional indexes incur write overhead and consume storage space. Hide indexes before dropping them so that you can observe the impact and unhide them quickly if needed.
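
For self-managed tooling, hiding an index can be expressed as a collMod command with the index option hidden (available in MongoDB 4.4 and later). The sketch below only builds the command document; with PyMongo it could be passed to db.command(...). The collection and index names are hypothetical.

```python
def hide_index_command(collection, index_name, hidden=True):
    """Build a collMod command document that hides (or unhides) an index."""
    return {
        "collMod": collection,  # the command name must be the first key
        "index": {"name": index_name, "hidden": hidden},
    }


# Hypothetical collection and index names, for illustration only.
command = hide_index_command("orders", "status_1_createdAt_-1")
print(command)
```

Passing hidden=False produces the command that makes the index visible to the query planner again.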

5. Namespace Insights

What is Namespace Insights?

Namespace Insights tracks collection-level query latency in MongoDB Atlas, offering visibility into latency metrics and statistics for specific hosts and operation types (all operation types, reads, writes, and commands). Users can manage pinned namespaces (i.e. collections, in this context) and select up to five to display in the query latency charts.

The available metrics include:

Total latency
Average latency
P50 latency (50th percentile in the latency histogram)
P95 latency (95th percentile in the latency histogram)
P99 latency (99th percentile in the latency histogram)
Operation count

Why is it important?

Namespace Insights is helpful for identifying performance bottlenecks at the collection level. By tracking query latency metrics, database administrators can quickly determine which collections are exhibiting the most relative latency.

What should I look out for?

Use Namespace Insights to quickly identify outliers and anomalies in performance relative to the cluster’s baseline.
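
To show what the percentile metrics mean, the sketch below computes P50, P95, and P99 from a list of latency samples using the nearest-rank method. This is an illustration of the concept only: Atlas derives its values from a latency histogram, and the sample data here is hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # hypothetical samples: 1 ms through 100 ms
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")  # 50, 95, and 99 ms
```

A large gap between P50 and P99 for one collection suggests that a minority of operations are much slower than the typical case.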

6. Query Profiler

What is the Query Profiler?

The Query Profiler provides a visual representation of logged operations for Atlas clusters. The Query Profiler captures up to the most recent 100,000 operations.

There are seven views of the Query Profiler:

Operation Execution Time (default)
Keys Examined
Docs Returned
Examined:Returned Ratio
Docs Examined
Num Yields
Response Length

Why is it important?

The Query Profiler provides detailed insights into query performance, helping identify inefficiencies and bottlenecks. By analyzing logged operations, it empowers database administrators to optimize queries and improve overall system performance.

What should I look out for?

Issues in the Query Profiler can be indicated by high execution times, excessive document examination, and poor index usage.

The Query Profiler dashboard provides a high-level view that makes it easy to quickly identify outliers and general trends. The table offers operation statistics by namespace (database and collection) and operation type.
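
A similar per-namespace breakdown can be sketched for self-managed deployments. The example below groups logged operations by namespace and operation type, assuming documents shaped like database profiler entries with the fields ns, op, millis, docsExamined, and nreturned. The sample operations are hypothetical.

```python
from collections import defaultdict


def summarize_operations(operations):
    """Group logged operations by (namespace, op type), slowest average first."""
    groups = defaultdict(lambda: {"count": 0, "millis": 0, "examined": 0, "returned": 0})
    for op in operations:
        group = groups[(op["ns"], op["op"])]
        group["count"] += 1
        group["millis"] += op.get("millis", 0)
        group["examined"] += op.get("docsExamined", 0)
        group["returned"] += op.get("nreturned", 0)

    summary = []
    for (ns, op_type), group in groups.items():
        returned = group["returned"]
        summary.append({
            "ns": ns,
            "op": op_type,
            "count": group["count"],
            "avg_millis": group["millis"] / group["count"],
            # Examined:returned ratio is undefined when nothing was returned.
            "examined_per_returned": group["examined"] / returned if returned else None,
        })
    return sorted(summary, key=lambda row: row["avg_millis"], reverse=True)


# Hypothetical profiler-style entries.
logged = [
    {"ns": "shop.orders", "op": "query", "millis": 120, "docsExamined": 10000, "nreturned": 10},
    {"ns": "shop.orders", "op": "query", "millis": 80, "docsExamined": 5000, "nreturned": 5},
    {"ns": "shop.users", "op": "query", "millis": 2, "docsExamined": 1, "nreturned": 1},
]
for row in summarize_operations(logged):
    print(row)
```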

7. Billing Cost Explorer

What is the Billing Cost Explorer?

The Billing Cost Explorer enables users to track and analyze their MongoDB Atlas spending. It provides insights into resource consumption and associated costs, displaying metrics such as total spend and cost trends over time.

Why is it important?

The Billing Cost Explorer is helpful for managing MongoDB Atlas expenses effectively. By understanding where costs are incurred, users can optimize resource usage, identify potential savings, and ensure that database operations remain within budget, supporting financial planning.

What should I look out for?

When using the Billing Cost Explorer, pay attention to spikes in spending, especially during high resource usage periods. Monitoring cost trends can help identify inefficient resource allocations or underutilized clusters, revealing opportunities for optimization and better alignment with usage patterns.

Additional important metrics

Below, we’ve identified a number of important metrics used to measure performance, categorized by instance status and health, cluster operation and connection metrics, instance hardware metrics, and replication metrics. We’ve included some general recommendations, but it’s important to account for your specific use case and requirements to determine what values are best in your particular context.

Instance status and health

The status of a MongoDB server process can indicate whether we need to drill down into its activity or health. A process that is unresponsive or does not respond to commands should be investigated immediately.

Monitor with MongoDB Atlas: Cluster health and process health can be seen via the Cluster view. Green dots mean a healthy state, while orange and red mean there are issues with the process.
Monitor self-managed MongoDB instances: Commands such as rs.status() for replica sets and sh.status() for sharded clusters provide a high-level status of the cluster.

Cluster operation and connection metrics

When your application is struggling or underperforming, you may want to investigate the database layer as a potential bottleneck. The application establishes connections and performs operations against the database, so pay close attention to its behavior.

MongoDB provides various metrics and mechanisms to identify its connection and operation patterns. On top of the active and proactive monitoring tools, Atlas provides a full alerting system and log gathering.

Monitor with MongoDB Atlas: Atlas provides built-in features like Performance Advisor, Real-Time Performance Panel, Namespace Insights, and Query Profiler to track operations and highlight slow and heavy operations. Additionally, the Metrics tab provides many graphs that plot operations and the number of connections. See below for more details:

Metric	Definition	Importance	Signals
Opcounters	Tracks the number and type of operations performed against the database, including inserts, updates, deletes, and queries.	Provides insight into the overall workload of the database, helping to identify bottlenecks or performance issues.	Good: Predictable activity based on application usage. Bad: Unexpected changes in behavior or sudden spikes/drops in normal activity will often prompt further investigation.
Operation Execution Time	Average time taken to execute database operations, measured in milliseconds.	A performance indicator: longer execution times can lead to slower application response.	Good: Low and stable execution times. Bad: Increasing execution times may signal performance degradation, potentially due to resource contention or inefficient queries.
Query Targeting	Ratio of documents examined relative to the number of documents returned across all operations during a sampling period.	An overall measure of how efficiently the database is running.	Good: For frequently run queries, aim for as low a value as possible. Bad: Spikes or sustained levels of high query targeting most often mean there are opportunities to improve query efficiency.
Connections	Total number of active connections to the database at any given time.	Connections are a finite resource and should be used efficiently.	Good: Predictable and stable connection counts over time, within cluster tier limits. Bad: Higher connection counts than the application requires or trends toward cluster tier limits may lead to unnecessary resource consumption or the inability to establish new connections.
Queues	Number of operations waiting to be processed by the database, indicating the level of demand versus capacity.	Identify potential bottlenecks and ensure the database can handle incoming requests efficiently.	Good: No queues indicate that the database is processing requests promptly. Bad: Queues suggest that the database is not able to process operations at the rate they are being issued, leading to increased latency and potential timeouts.
Scan And Order	Average rate per second over the selected sample period of queries that return sorted results that cannot perform the sort operation using an index.	In-memory sorts can be very expensive as they require large result sets to be computationally sorted at runtime.	Good: 0, database didn't perform any in-memory sort operations. Bad: a large value indicates the database performed many in-memory sort operations.

Monitor self-managed MongoDB instances:
- You can leverage tools like mongostat and mongotop.
- Once you connect to your instance via Compass, you can use the MongoDB Compass Performance Tab, which is similar to the Atlas Real-Time Performance Panel.
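
If you collect db.serverStatus() output yourself, per-second operation rates and connection usage can be derived along the lines of the sketch below. It assumes two samples containing the opcounters and connections sections of serverStatus; the sample values and function names are hypothetical.

```python
def opcounter_rates(earlier, later, interval_seconds):
    """Per-second rate for each operation type between two serverStatus samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    before = earlier["opcounters"]
    after = later["opcounters"]
    return {name: (after[name] - before.get(name, 0)) / interval_seconds for name in after}


def connection_utilization(status):
    """Fraction of the connection limit in use, from connections.current and .available."""
    current = status["connections"]["current"]
    available = status["connections"]["available"]
    return current / (current + available)


# Hypothetical samples taken 60 seconds apart.
first = {"opcounters": {"insert": 100, "query": 1000}}
second = {
    "opcounters": {"insert": 160, "query": 1300},
    "connections": {"current": 50, "available": 950},
}
print(opcounter_rates(first, second, 60))   # {'insert': 1.0, 'query': 5.0}
print(connection_utilization(second))       # 0.05
```

Because these counters are cumulative and reset when the process restarts, a negative rate usually means a restart happened between the two samples.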

Instance hardware metrics

Hardware metrics help identify which resources are constraining performance and which need tuning or capacity re-planning.

Monitor with MongoDB Atlas:
- The Atlas metrics tab within a cluster provides plotted graphs for the hardware metrics. These allow you to correlate them with other database metrics. See below for more details:

Metric	Definition	Importance	Signals
Normalized System CPU	The CPU usage of all processes on the node, scaled to a range of 0-100% by dividing by the number of CPU cores.	Helps determine if the correct cluster tier is in use. An improper tier can lead to higher costs if overprovisioned or potential downtime if underprovisioned.	Good: A healthy range is often between 40% and 70%. Bad: Under 40% may indicate potential overprovisioning, while over 70% may indicate potential underprovisioning.
Normalized Process CPU	The percentage of CPU resources utilized by the database process, normalized to account for the number of CPU cores available.	Indicates how efficiently the database is using CPU resources, helping to identify potential performance bottlenecks.	Good: Values around 40-70%. Bad: Values consistently above 80% may indicate CPU contention, while very low values could suggest underutilization.
Disk Latency	Average time taken for read and write operations on the disk, measured in milliseconds.	A measure of disk performance: high latency can significantly impact database performance and user experience.	Good: Low latency values (typically under 5ms). Bad: High latency (over 20ms) can signal disk bottlenecks or issues with the underlying storage infrastructure.
Disk IOPS	Number of input/output operations per second.	Clarifies the disk's ability to support database workloads, especially for read/write-heavy applications.	Good: Low IOPS relative to limits. Bad: High IOPS values relative to limits.
Disk Space Free	Amount of available disk space.	Ensures that there is sufficient space for data growth.	Good: Above 20% of total capacity. Bad: Below 10% capacity can potentially lead to extended downtime if available space is fully depleted.
System Memory	Total amount of RAM being used.	Critical for performance: adequate memory can reduce disk I/O and improve query response times.	Good: Moderate memory usage relative to cluster resources. Bad: High memory usage relative to cluster resources may indicate potential memory pressure, while very low usage could suggest underutilization.

Monitor self-managed MongoDB instances:
- Use your operating system tools (top, iostat, etc.).
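
As one example of checking a hardware metric from a script, the sketch below reads disk usage with Python's standard library and applies the Disk Space Free guideposts from the table above (good above 20% free, bad below 10%). The "watch" label for the range in between and the "/" path are our assumptions; in practice you would point it at the volume that holds your MongoDB data directory.

```python
import shutil


def classify_disk_free(free_bytes, total_bytes):
    """Return (percent free, label) using the 20% / 10% guideposts."""
    percent_free = 100 * free_bytes / total_bytes
    if percent_free > 20:
        label = "good"
    elif percent_free < 10:
        label = "bad"
    else:
        label = "watch"  # our label for the range between the two guideposts
    return percent_free, label


usage = shutil.disk_usage("/")  # assumed path; use your data volume instead
print(classify_disk_free(usage.free, usage.total))
```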

Replication metrics

Replication is a key aspect of MongoDB clusters' availability and durability. The health and performance of replication need to be carefully monitored in order to maintain a healthy cluster.

Monitor with MongoDB Atlas: The Atlas metrics tab within a cluster provides plotted graphs for the replication metrics. See below for more details:

Metric	Definition	Importance	Signals
Replication Lag	The approximate number of seconds the secondary is behind the primary for write operations.	Indicates how current the secondary nodes are compared to the primary, affecting data consistency and availability.	Good: Low lag, typically under 5 seconds. Bad: High lag (over 10 seconds) can lead to stale reads and potential data loss during failover.
Replication Oplog Window	Duration of write activity actively held in the oplog.	Allows secondaries to catch up with the primary. A sufficiently large oplog window allows for replication and/or resyncs of data.	Good: An oplog window of several hours. Bad: A short window (under 1 hour).
Oplog GB/hour	Amount of data written to the oplog per hour, measured in gigabytes.	Assesses the write load on the primary.	Good: Predictable and expected values based on application activity. Bad: Unexpected values that do not match application activity.

Monitor self-managed MongoDB instances: Use MongoDB Shell methods such as rs.status(), rs.printReplicationInfo(), and rs.printSecondaryReplicationInfo().
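
The oplog window and the Oplog GB/hour metric are related: for a given oplog size, a heavier write rate means a shorter window. The sketch below is a back-of-the-envelope estimate only, assuming a steady write rate and an oplog that is already full at its configured size. The function name and the numbers are hypothetical.

```python
def estimated_oplog_window_hours(oplog_size_gb, oplog_gb_per_hour):
    """Rough oplog window: configured oplog size divided by the hourly write volume."""
    if oplog_gb_per_hour <= 0:
        return float("inf")  # no writes: existing entries are never overwritten
    return oplog_size_gb / oplog_gb_per_hour


print(estimated_oplog_window_hours(50, 10))   # 5.0 hours
print(estimated_oplog_window_hours(50, 100))  # 0.5 hours, under the 1 hour guidepost
```

Real workloads are rarely steady, so treat the result as a planning aid and rely on the measured oplog window for alerting.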

MongoDB performance monitoring tools

MongoDB provides built-in UI tools in Atlas as well as Cloud Manager and Ops Manager to help you monitor performance. MongoDB also offers some standalone tools and commands to look at more raw data.

Below are some command-line utilities (mongostat and mongotop) and commands you can run directly against the server via MongoDB Shell.

mongostat

mongostat is used to get a quick overview of the status of your MongoDB server instance. It’s best used for watching a single instance for a specific event as it provides a real-time view. You can use this command to monitor basic server statistics such as operation breakdown, MongoDB memory statistics, lock queues, connections, and the network.

Run mongostat from your system command line using the following syntax:

mongostat <options> <connection-string> <polling interval in seconds>

mongotop

mongotop tracks the amount of time a MongoDB instance spends reading and writing data per collection.

Run mongotop from your system command line using the following syntax:

mongotop <options> <connection-string> <polling interval in seconds>

rs.status()

rs.status() returns the replica set status, reported from the point of view of the member on which the method is run.
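
One common use of this output is estimating replication lag. The sketch below assumes a status document shaped like rs.status() output, with a members array whose entries carry name, stateStr, and optimeDate, and subtracts each secondary's optimeDate from the primary's. The host names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def replication_lag_seconds(rs_status):
    """Seconds each SECONDARY is behind the PRIMARY, keyed by member name."""
    members = rs_status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY found in replica set status")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status document with one primary and two secondaries.
now = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
status = {
    "members": [
        {"name": "db1:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db2:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=2)},
        {"name": "db3:27017", "stateStr": "SECONDARY", "optimeDate": now - timedelta(seconds=12)},
    ]
}
print(replication_lag_seconds(status))  # {'db2:27017': 2.0, 'db3:27017': 12.0}
```

This compares the last applied operation times, so it is only an approximation of lag, and an idle primary can make the values look larger or smaller than what clients experience.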

db.serverStatus()

db.serverStatus() provides a document representing the current instance metrics counters. Run this command at a regular interval to collect statistics about the instance.
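
Collecting at a regular interval can be as simple as the loop sketched below. To keep the example runnable without a server, the function takes any callable that returns a status document; with PyMongo, that callable might be something like lambda: client.admin.command("serverStatus"), which is our assumption about your setup. The stand-in function and its values are hypothetical.

```python
import time


def collect_samples(fetch_status, count, interval_seconds, sleep=time.sleep):
    """Call fetch_status `count` times, pausing interval_seconds between calls."""
    samples = []
    for i in range(count):
        samples.append(fetch_status())
        if i < count - 1:  # no need to wait after the final sample
            sleep(interval_seconds)
    return samples


# Stand-in for a real serverStatus call, returning a growing query counter.
_state = {"queries": 0}


def fake_server_status():
    _state["queries"] += 10
    return {"opcounters": {"query": _state["queries"]}}


samples = collect_samples(fake_server_status, count=3, interval_seconds=0)
print([s["opcounters"]["query"] for s in samples])  # [10, 20, 30]
```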

dbStats

The dbStats command returns storage statistics for a given database, such as the total collection data size versus storage size, the number of indexes and their size, and collection-related statistics (number of documents and collections).
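
A few derived figures can make this output easier to read at a glance. The sketch below assumes a document with the dbStats fields dataSize, storageSize, indexSize, and objects, and the sample values are hypothetical.

```python
def summarize_db_stats(stats):
    """Derive a few ratios from a dbStats-style document (sizes in bytes)."""
    data_size = stats["dataSize"]
    storage_size = stats["storageSize"]
    index_size = stats["indexSize"]
    objects = stats["objects"]
    on_disk = storage_size + index_size
    return {
        "avg_document_bytes": data_size / objects if objects else 0,
        # Below 1.0 means the data takes less space on disk than uncompressed.
        "storage_to_data_ratio": storage_size / data_size if data_size else None,
        "index_share_of_disk": index_size / on_disk if on_disk else None,
    }


# Hypothetical dbStats values.
example = {"dataSize": 1_000_000, "storageSize": 400_000, "indexSize": 100_000, "objects": 2_000}
print(summarize_db_stats(example))
# {'avg_document_bytes': 500.0, 'storage_to_data_ratio': 0.4, 'index_share_of_disk': 0.2}
```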

You can monitor MongoDB databases with tools such as mongostat and mongotop and with commands such as dbStats and serverStatus. Together they provide real-time monitoring and reporting for the database server, helping you spot errors, track performance, and make informed decisions about how to optimize a database.

Summary

MongoDB provides a variety of metrics and tools to monitor your database and help ensure it’s running at optimal performance. From UI tools and advisors to raw-data metrics, you're covered whether self-hosting or using MongoDB Atlas.

To learn more about the metrics and tools for monitoring your database, consider earning a Monitoring, Tuning, and Automation Skill Badge.

For more information on monitoring MongoDB databases, see the following resources.

References:

MongoDB Atlas Monitoring
MongoDB Performance
MongoDB Performance Best Practices
MongoDB Professional Services

Appendix

Billing Cost Explorer
Cloud Manager
Cluster View
Compass Performance Tab
Connections
db.serverStatus()
dbStats
Disk IOPS
Disk Latency
Disk Space Free
Metrics Tab
mongostat
mongotop
Namespace Insights
Normalized Process CPU
Normalized System CPU
Oplog GB/hour
Opcounters
Operation Execution Time
Ops Manager
Performance Advisor
Queues
Query Profiler
Query Targeting
Real-Time Performance Panel
Replication Lag
Replication Oplog Window
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()
rs.status()
Scan And Order
System Memory
