How to Monitor MongoDB

MongoDB provides monitoring commands and tools that help you improve database performance and check the health of your database instances. This document explains the metrics you can measure to monitor your MongoDB Atlas clusters, and it covers the general MongoDB tools you can use to monitor self-managed MongoDB installations.

Why Monitor MongoDB?

A key aspect of database administration and capacity planning for your application is monitoring your cluster's health and performance. Although MongoDB Atlas, a cross-cloud Database-as-a-Service platform, handles the vast majority of administration effort and has built-in fault tolerance and scaling, it is crucial that users know how to monitor their clusters and tune or scale them before hitting a crisis.

Monitoring MongoDB databases allows you to:

  • Understand the current capacity of your database.
  • Observe the utilization of resources.
  • Observe the presence of abnormal behavior and performance issues.
  • Detect and react to real-time issues to improve your application stack.
  • Comply with your SLA and data protection/governance requirements.

How to Monitor MongoDB

There are four main areas which we should keep in mind when monitoring MongoDB:

  • Instance status and health
  • MongoDB cluster operations and connection metrics
  • Instance hardware metrics
  • Replication metrics

Each area can be covered by a few utilities and commands, described below.

Process Status and Health

The status of a MongoDB server process is an immediate indication of whether we need to drill down into its activity or health. A process that is unresponsive or does not respond to commands should be investigated immediately.

  • How to monitor with MongoDB Atlas: Cluster health and process health can be seen in the Cluster view. Green dots mean a healthy state, while orange and red dots mean there are issues with the process.
  • How to monitor self-managed MongoDB instances: Commands such as rs.status() for replica sets and sh.status() for sharded clusters provide a high-level status of the cluster. Older MongoDB releases also included a built-in free monitoring feature that reported Operation Execution Times, Memory Usage, CPU Usage, and Operation Counts; MongoDB has since retired that service, so confirm it is available for your version before relying on it.

MongoDB Cluster’s Operations and Connection Metrics

When our application is struggling or underperforming, we need to rule out the database layer as the bottleneck. Since the application issues connections and operations against the database, we should pay close attention to their behavior.

MongoDB provides various metrics and mechanisms to identify its connection and operation patterns. In addition to its active and proactive monitoring tools, Atlas provides a full alerting system and log gathering.

Instance Hardware Metrics

The hardware metrics are important to track. They can be used to identify which resources could be the root cause for performance issues or need tuning and capacity re-planning.

Replication Metrics

Replication is a key aspect of MongoDB clusters' high availability and durability. The health and performance of replication need to be carefully monitored in order to maintain a healthy cluster.

What MongoDB Metrics to Monitor

While monitoring MongoDB metrics, you should look out for the following.

MongoDB Cluster’s Operations and Connection Metrics

Let’s cover the main metrics for operations and connection monitoring.

Opcounters

Opcounters report the average rate of operations performed per second over the selected sample period. The Opcounters graph shows the operation rate and the breakdown by operation type for the instance.
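
As a rough illustration of how such a per-second rate can be derived, the sketch below subtracts two snapshots of the cumulative opcounters section returned by db.serverStatus(). The function name and the sample numbers are hypothetical; only the counter names (insert, query, update, delete) come from the serverStatus document.

```python
def opcounter_rates(previous, current, elapsed_seconds):
    """Return per-second rates for each operation type between two snapshots.

    `previous` and `current` are dicts shaped like serverStatus()["opcounters"],
    whose values are cumulative counts since the process started.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rates = {}
    for op, count in current.items():
        delta = count - previous.get(op, 0)
        # Counters restart from zero when the process restarts; treat a
        # negative delta as a restart and use the new count on its own.
        if delta < 0:
            delta = count
        rates[op] = delta / elapsed_seconds
    return rates


# Made-up snapshots taken 60 seconds apart.
before = {"insert": 1200, "query": 5000, "update": 300, "delete": 20}
after = {"insert": 1800, "query": 8000, "update": 360, "delete": 20}
print(opcounter_rates(before, after, 60))
# {'insert': 10.0, 'query': 50.0, 'update': 1.0, 'delete': 0.0}
```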

Operation Execution Times

This is the average execution time of read and write operations over the selected sample period.

Query Executors and Query Targeting

Query Executors represent the average rate per second, over the selected sample period, of documents scanned during queries and query-plan evaluation. Query Targeting represents the ratio between the number of documents scanned and the number of documents returned. A high ratio may indicate suboptimal operations that scan many documents to return only a small portion of them.
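
The ratio described above can be approximated from two db.serverStatus() samples. This is a minimal sketch, assuming the samples are plain dicts; the function name and numbers are hypothetical, while metrics.queryExecutor.scannedObjects and metrics.document.returned are the serverStatus counters it reads.

```python
def query_targeting(prev_status, curr_status):
    """Documents scanned per document returned between two samples.

    Returns None when no documents were returned in the interval,
    because the ratio is undefined in that case.
    """
    def scanned(status):
        return status["metrics"]["queryExecutor"]["scannedObjects"]

    def returned(status):
        return status["metrics"]["document"]["returned"]

    scanned_delta = scanned(curr_status) - scanned(prev_status)
    returned_delta = returned(curr_status) - returned(prev_status)
    if returned_delta <= 0:
        return None
    return scanned_delta / returned_delta


# Made-up samples: 50,000 documents scanned to return 100.
prev = {"metrics": {"queryExecutor": {"scannedObjects": 10_000},
                    "document": {"returned": 900}}}
curr = {"metrics": {"queryExecutor": {"scannedObjects": 60_000},
                    "document": {"returned": 1_000}}}
print(query_targeting(prev, curr))  # 500.0
```

A ratio close to 1 suggests queries are well supported by indexes; a value in the hundreds, as in this made-up sample, is the kind of result worth investigating.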

Connections

This describes the number of open connections to the instance. High numbers or spikes might indicate a suboptimal connection strategy on the client side or an unresponsive server.
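
One way to judge whether a connection count is "high" is to compare it with the total the server can accept. The sketch below is an assumption-laden example rather than an official formula: it treats current plus available, both fields of the connections section of db.serverStatus(), as the limit. The function name and sample values are hypothetical.

```python
def connection_usage(connections):
    """Fraction of the connection limit currently in use (0.0 to 1.0).

    `connections` is a dict shaped like serverStatus()["connections"].
    """
    current = connections["current"]
    limit = current + connections["available"]
    return current / limit if limit else 0.0


# Made-up sample: 200 open connections, 600 still available.
print(connection_usage({"current": 200, "available": 600}))  # 0.25
```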

Queues

Queues describe the number of operations waiting for a lock, either read or write. High queues may indicate conflicting write paths or a suboptimal schema design, which forces high competition over database resources.

Scan and Order

This refers to the average rate per second, over the selected sample period, of queries that return sorted results but cannot use an index to perform the sort.

MongoDB Hardware Metrics

Let’s cover the main metrics for hardware monitoring.

Normalized System CPU

This is the percentage of time the CPU spent servicing all processes on the node, scaled to a range of 0-100% by dividing by the number of CPU cores. It is broken down into categories such as user, kernel, iowait, and steal. High kernel or user CPU might indicate that MongoDB operations are exhausting the CPU, while high iowait most likely points to storage exhaustion as the root cause of the CPU exhaustion.

Normalized Process CPU

This is the percentage of time the CPU spent servicing this MongoDB process, scaled to a range of 0-100% by dividing by the number of CPU cores. It is broken down into user and kernel time. High kernel CPU might indicate that operating system work is exhausting the CPU, while high user CPU most likely points to MongoDB operations as the root cause.
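
The normalization used by both CPU metrics is a single division. The following sketch shows it with made-up numbers; the function name is hypothetical, and it assumes the raw figure is a summed percentage that can reach 100 times the number of cores.

```python
def normalize_cpu(raw_percent, cpu_cores):
    """Scale a raw CPU percentage (up to 100 * cores) to the 0-100 range."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return raw_percent / cpu_cores


# 240% raw usage on a 4-core host is 60% of total capacity.
print(normalize_cpu(240.0, 4))  # 60.0
```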

Disk Latency

Disk latency is the read and write disk latency in milliseconds of the disk partition used by MongoDB. High values (>500ms) mean MongoDB might be impacted by the storage layer.

Disk IOPS

This is the average consumed IO operations per second on the disk partition used for MongoDB.

Disk Space Free

This refers to the total bytes of free disk space on the disk partition used by MongoDB. Atlas offers disk auto-scaling capabilities that are based on this metric.

System Memory

System memory describes the number of bytes of physical memory in use versus the number of bytes that are free. The Available metric estimates the number of bytes of system memory available for running new applications without swapping.

Swap Usage

The Swap Usage graph shows how much memory is being placed on the swap device. A high Used value in this graph means that swap is in use, which indicates that memory is under-provisioned for the current workload.

MongoDB Replication Metrics

Let’s cover the main metrics for replication monitoring.

Replication Lag

Replication lag is the approximate number of seconds a secondary node is behind the primary in write operations. High replication lag indicates a secondary that is struggling to replicate, and it can affect your operations' latency depending on the read and write concerns your connections use.
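
For self-managed replica sets, a similar figure can be estimated from the document rs.status() returns. The sketch below is one possible approach, not the way Atlas computes its metric: it compares each secondary's optimeDate with the primary's. The function name, host names, and timestamps are hypothetical; members, name, stateStr, and optimeDate are fields of the replica set status document.

```python
from datetime import datetime, timezone


def replication_lag_seconds(status):
    """Map each secondary's name to its lag behind the primary, in seconds."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no primary found in replica set status")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


def _at(second):
    # Helper for the made-up sample: a timestamp at 12:00:<second> UTC.
    return datetime(2024, 1, 1, 12, 0, second, tzinfo=timezone.utc)


sample_status = {"members": [
    {"name": "db0.example.net:27017", "stateStr": "PRIMARY", "optimeDate": _at(30)},
    {"name": "db1.example.net:27017", "stateStr": "SECONDARY", "optimeDate": _at(28)},
    {"name": "db2.example.net:27017", "stateStr": "SECONDARY", "optimeDate": _at(0)},
]}
print(replication_lag_seconds(sample_status))
# {'db1.example.net:27017': 2.0, 'db2.example.net:27017': 30.0}
```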

Replication Oplog Window

This is the approximate number of hours available in the primary's replication oplog. If a secondary is lagging more than this amount, it cannot catch up and will require a full resync.

Replication Headroom

Replication headroom is the difference between the primary's replication oplog window and the secondary's replication lag. A secondary can go into RECOVERING if this value goes to zero.
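
The relationship between the three replication metrics is simple arithmetic, sketched below with made-up numbers. The function name and the choice of units (window in hours, lag in seconds, result in seconds) are assumptions for illustration.

```python
def replication_headroom_seconds(oplog_window_hours, lag_seconds):
    """Headroom = primary's oplog window minus the secondary's lag."""
    return oplog_window_hours * 3600 - lag_seconds


# A 24-hour oplog window and a secondary lagging by 2 minutes.
headroom = replication_headroom_seconds(24, 120)
print(headroom)  # 86280

# Zero or negative headroom means the secondary has fallen off the oplog.
print(replication_headroom_seconds(1, 3600) <= 0)  # True
```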

Oplog GB/Hour

This refers to the average rate, in gigabytes per hour, at which the primary generates oplog. Unexpectedly high volumes of oplog might indicate a schema design issue or a highly inefficient write workload.

Opcounters - repl

This refers to the average rate of replication operations performed per second over the selected sample period. The Opcounters - repl graph shows the operation rate and the breakdown by operation type for the instance.

MongoDB Performance Monitoring Tools

MongoDB provides built-in UI tools in Atlas, as well as in Cloud Manager and Ops Manager, to help you monitor performance. MongoDB also offers standalone tools and commands for looking at the raw data.

You can run the following tools from any host that has access to your environment and a user with an appropriate role (such as clusterMonitor).

mongostat Command

mongostat is used to get a quick overview of the status of your MongoDB server instance. It’s best used for watching a single instance for a specific event as it provides a real-time view. You can use this command to monitor basic server statistics such as operation breakdown, MongoDB memory statistics, lock queues, and connections/network.

You can run mongostat with the following syntax:

mongostat <options> <connection-string> <polling interval in seconds>

mongotop Command

mongotop tracks the amount of time a MongoDB instance spends reading and writing data per collection.

You can run mongotop with the following syntax:

mongotop <options> <connection-string> <polling interval in seconds>

rs.status() Command

rs.status() returns the replica set status, reported from the point of view of the member where the method is run.

db.serverStatus() Command

When you want an overview of the database's state, use the db.serverStatus() command. It returns a document containing the instance's current metric counters. Run this command at a regular interval to collect statistics about the instance.
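
Collecting at a regular interval can be sketched as a small polling loop. In the example below, collect_samples and fake_status are hypothetical names, and the status fetcher is injected so the loop runs without a server. With the PyMongo driver, the fetcher would typically be a call such as client.admin.command("serverStatus"), which is an assumption about your setup. The fields read (connections.current and globalLock.currentQueue) are part of the serverStatus document.

```python
import time


def collect_samples(fetch_status, count, interval_seconds, sleep=time.sleep):
    """Call fetch_status() `count` times, pausing between calls.

    Returns a list with a few selected counters from each sample.
    """
    samples = []
    for i in range(count):
        status = fetch_status()
        queue = status["globalLock"]["currentQueue"]
        samples.append({
            "connections": status["connections"]["current"],
            "queued_readers": queue["readers"],
            "queued_writers": queue["writers"],
        })
        if i < count - 1:  # no need to wait after the last sample
            sleep(interval_seconds)
    return samples


def fake_status():
    # Stand-in for a real serverStatus call; values are made up.
    return {
        "connections": {"current": 42},
        "globalLock": {"currentQueue": {"readers": 0, "writers": 3}},
    }


# Three samples with the pause replaced by a no-op so the demo returns at once.
for sample in collect_samples(fake_status, 3, 10, sleep=lambda seconds: None):
    print(sample)
```

Injecting the fetcher and the sleep function keeps the loop testable; in real use you would pass the driver call and leave sleep at its default.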

dbStats Command

The dbStats command returns the storage statistics, such as the total collection data versus storage size, number of indexes and their size, and collection-related statistics (number of documents and collections), for a certain database.
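
To make the "data versus storage size" comparison concrete, the sketch below condenses a dbStats-shaped document into a few derived figures. The function name, the derived ratios, and the sample values are hypothetical; db, collections, objects, dataSize, storageSize, indexes, and indexSize are fields of the dbStats output.

```python
def summarize_db_stats(stats):
    """Condense a dbStats document into a few derived figures."""
    data = stats["dataSize"]
    storage = stats["storageSize"]
    index = stats["indexSize"]
    on_disk = storage + index
    return {
        "db": stats["db"],
        "collections": stats["collections"],
        "documents": stats["objects"],
        "indexes": stats["indexes"],
        # Below 1.0 means the data takes less space on disk than its
        # uncompressed size, for example because of compression.
        "storage_to_data_ratio": storage / data if data else None,
        # Share of the on-disk footprint taken by indexes.
        "index_share": index / on_disk if on_disk else None,
    }


# Made-up dbStats output, sizes in bytes.
sample = {"db": "shop", "collections": 4, "objects": 2500, "indexes": 9,
          "dataSize": 1000, "storageSize": 400, "indexSize": 100}
print(summarize_db_stats(sample))
```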

collStats Command

The collStats command collects statistics similar to those provided by dbStats, but at the collection level. For a given collection, its output includes a count of the objects, the collection's size, the amount of disk space it consumes, and information about its indexes.

In short, we can monitor MongoDB databases with tools and commands such as mongostat, mongotop, rs.status(), db.serverStatus(), dbStats, and collStats. They report on the state of the database server, either in real time or on demand, so that we can spot errors, track database performance, and make informed decisions about optimizing a database.

Summary

MongoDB provides a variety of metrics and tools to monitor your database and ensure it's running at optimal performance. From UI tools to advisors to raw-data metrics, you're covered whether you're hosting your database yourself or using MongoDB Atlas.

For more information on monitoring MongoDB databases, see the following resources.

References:

MongoDB Atlas Monitoring

MongoDB Performance

MongoDB Performance Best Practices