MongoDB Performance Tuning Questions
Most of the challenges related to keeping a MongoDB cluster running at top speed can be addressed by asking a small number of fundamental questions and then using a few crucial metrics to answer them.
By keeping an eye on the metrics related to query performance, database performance, throughput, resource utilization, resource saturation, and other critical "assertion" errors it's possible to find problems that may be lurking in your cluster. Early detection allows you to stay ahead of the game, resolving issues before they affect performance.
Every type of MongoDB deployment can support databases at scale with immense transaction volumes, which means performance tuning should be a constant activity.
But the good news is that the same metrics are used in the tuning process no matter how MongoDB is used.
Here are the key questions you should always be asking about MongoDB performance tuning and the metrics that can answer them.
Query problems are perhaps the lowest-hanging fruit when it comes to debugging MongoDB performance issues. Finding problems and fixing them is generally straightforward. This section covers the metrics that can reveal query performance problems and what to do if you find slow queries.
- But remember: indexes have a cost when it comes to writes and updates. Too many indexes that are underutilized can slow down the modification or insertion of new documents. Depending on the nature of your workloads, this may or may not be a problem.
- For the ratio of documents scanned to documents returned, a rarely met ideal is 1/1, meaning every document scanned was returned — no wasted scans. Most of the time, however, some scanned documents are not returned, making the ratio greater than 1.
- A high Scan and Order number, say 20 or more, indicates that the server is having to sort query results to get them in the right order. This takes time and increases the memory load on the server.
- Fix this by making sure indexes are sorted in the order in which the queries need the documents, or by adding missing indexes.
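The scanned-to-returned ratio above can be computed directly from the `executionStats` section of an `explain()` result. The sketch below assumes you already have that document in hand; the sample numbers are hypothetical.

```python
def scanned_to_returned(execution_stats):
    """Ratio of documents examined to documents returned for a query.

    A ratio near 1 means the index is doing its job; much larger
    values mean the server is scanning documents it then discards."""
    returned = execution_stats["nReturned"]
    if returned == 0:
        return float("inf")  # scans that return nothing are pure waste
    return execution_stats["totalDocsExamined"] / returned

# Hypothetical executionStats from db.collection.find(...).explain("executionStats")
stats = {"nReturned": 100, "totalDocsExamined": 5000}
print(scanned_to_returned(stats))  # 50.0 -- far from the 1/1 ideal
```

A ratio like 50 is a strong hint that the query is missing an index or using a poorly chosen one.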
- WiredTiger uses a pool of read and write tickets to control concurrent access to the storage engine; by default, 128 of each are available.
- If the number of available tickets drops well below 128 and stays there, the server is waiting on something, and that's an indication of a problem.
- The remedy is then to find the operations that are going too slowly and start a debugging process.
- Deployments of MongoDB using releases older than 3.2 will certainly get a performance boost from migrating to a later version that uses WiredTiger.
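As a rough sketch of the ticket check described above, the available counts can be read from the `wiredTiger.concurrentTransactions` section of `serverStatus()`. The fragment below mimics that shape with hypothetical values.

```python
def available_tickets(server_status):
    """Available WiredTiger read and write tickets.

    Values that sit well below the default of 128 over time suggest
    operations are backing up; zero means new requests are queuing."""
    ct = server_status["wiredTiger"]["concurrentTransactions"]
    return {kind: ct[kind]["available"] for kind in ("read", "write")}

# Hypothetical serverStatus() fragment: write tickets nearly exhausted
status = {"wiredTiger": {"concurrentTransactions": {
    "read": {"available": 127, "out": 1, "totalTickets": 128},
    "write": {"available": 3, "out": 125, "totalTickets": 128},
}}}
print(available_tickets(status))  # {'read': 127, 'write': 3}
```

Here the write side is the one to investigate: 125 writers hold tickets, so the slow operations are almost certainly writes.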
Document Structure Antipatterns aren't revealed by a metric but can be something to look for when debugging slow queries. Here is one of the most notorious bad practices that hurt performance.
Subdocuments without bounds: MongoDB supports inserting documents within documents, with up to 100 levels of nesting. Each MongoDB document, including its subdocuments, also has a size limit of 16MB. If the number of subdocuments becomes excessive, performance problems may result.
MongoDB, like most advanced database systems, has thousands of metrics that track all aspects of database performance, including reading, writing, and querying the database, as well as ensuring background maintenance tasks like backups don't gum up the works.
The metrics described in this section all indicate larger problems that can have a variety of causes. Like a warning light on a dashboard, these metrics are invaluable high-level indicators that help you start looking for the causes before the database has a catastrophic failure.
Replication lag occurs when a secondary member of a replica set falls behind the primary. A detailed examination of the OpLog-related metrics can help get to the bottom of the problem, but the causes are often:
- A networking issue between the primary and secondary, making nodes unreachable
- A secondary node applying data slower than the primary node
- Insufficient write capacity, in which case more shards may be needed
- Slow operations on the primary node, blocking replication
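One way to quantify the lag itself is to compare the last-applied operation times that `rs.status()` reports per member. This is a minimal sketch using hypothetical `optimeDate`-style timestamps.

```python
from datetime import datetime

def replication_lag_seconds(primary_optime, secondary_optime):
    """Seconds the secondary's last applied operation trails the primary's.

    The inputs mimic the per-member optimeDate values from rs.status();
    a lag that grows sample over sample is the real warning sign."""
    return (primary_optime - secondary_optime).total_seconds()

# Hypothetical optimes for a primary and a lagging secondary
primary = datetime(2024, 1, 1, 12, 0, 30)
secondary = datetime(2024, 1, 1, 12, 0, 12)
print(replication_lag_seconds(primary, secondary))  # 18.0
```

A steady 18 seconds may be tolerable for some workloads; the same figure climbing toward the oplog window is not.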
Locking performance problems are indicated when the number of available read or write tickets remaining reaches zero, which means new read or write requests will be queued until a new read or write ticket is available.
- MongoDB's internal locking system is used to support simultaneous queries while avoiding write conflicts and inconsistent reads.
- Locking performance problems can indicate a variety of problems including suboptimal indexes and poor schema design patterns, both of which can lead to locks being held longer than necessary.
Number of open cursors rising without a corresponding growth in traffic is often symptomatic of poorly indexed queries or the result of long-running queries due to large result sets.
- This metric can be another indicator that the kind of query optimization techniques mentioned in the first section are in order.
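A simple way to apply the "cursors rising without traffic" rule is to compare relative growth across a few samples. The comparison factor below is an illustrative choice, not a MongoDB default.

```python
def cursors_outpacing_traffic(cursor_samples, ops_samples, factor=2.0):
    """Flag open-cursor growth that far exceeds traffic growth.

    cursor_samples and ops_samples are counts taken at the same
    moments (e.g. open cursors vs. operations per second). If cursors
    grew much faster than traffic, queries are likely lingering."""
    cursor_growth = cursor_samples[-1] / max(cursor_samples[0], 1)
    ops_growth = ops_samples[-1] / max(ops_samples[0], 1)
    return cursor_growth > factor * ops_growth

# Hypothetical samples: cursors grew 10x while traffic stayed flat
print(cursors_outpacing_traffic([40, 120, 400], [500, 510, 505]))  # True
```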
A large part of performance tuning is recognizing when your total traffic, the throughput of transactions through the system, is rising beyond the planned capacity of your cluster. By keeping track of growth in throughput, it's possible to expand the capacity in an orderly manner. Here are the metrics to keep track of.
Read and Write Operations is the fundamental metric that indicates how much work is done by the cluster. The ratio of reads to writes is highly dependent on the nature of the workloads running on the cluster.
- Monitoring read and write operations over time allows normal ranges and thresholds to be established.
- As trends in read and write operations show growth in throughput, capacity should be gradually increased.
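Establishing a normal range can be as simple as computing a band around the historical mean. The sketch below uses a three-sigma band over hypothetical operations-per-second samples; both the window and the multiplier are illustrative choices.

```python
from statistics import mean, stdev

def throughput_band(samples, sigma=3):
    """Derive a (low, high) normal range from historical ops/sec samples."""
    m, s = mean(samples), stdev(samples)
    return (m - sigma * s, m + sigma * s)

def is_anomalous(value, band):
    """True when a new reading falls outside the established band."""
    low, high = band
    return value < low or value > high

# Hypothetical week of hourly ops/sec averages
ops_per_sec = [480, 510, 495, 505, 520, 490, 500, 515]
band = throughput_band(ops_per_sec)
print(is_anomalous(900, band))  # True -- a spike worth investigating
print(is_anomalous(500, band))  # False -- within the normal range
```

Sustained readings near the top of the band, rather than one-off spikes, are the signal that capacity should grow.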
Document Metrics and Query Executor are good indications of whether the cluster is actually too busy. These metrics can be found in Cloud Manager and in . As with read and write operations, there is no right or wrong number for these metrics, but having a good idea of what's normal helps you discern whether poor performance stems from a large workload or from other causes.
- Document metrics are updated any time a document is returned, inserted, updated, or deleted. The more documents being returned, inserted, updated, or deleted, the busier your cluster is.
- Poor performance in a cluster that has plenty of capacity usually points to query problems.
- The query executor tells how many queries are being processed through two data points:
- Scanned - The average rate per second over the selected sample period of index items scanned during queries and query-plan evaluation.
- Scanned objects - The average rate per second over the selected sample period of documents scanned during queries and query-plan evaluation.
Hardware and Network metrics can be important indications that throughput is rising and will exceed the capacity of computing infrastructure. These metrics are gathered from the operating system and networking infrastructure. To make these metrics useful for diagnostic purposes, you must have a sense of what is normal.
- There's a lot to track but at a minimum have a baseline range for metrics like:
- Disk latency
- Disk IOPS
- Number of Connections
A MongoDB cluster makes use of a variety of resources that are provided by the underlying computing and networking infrastructure. These can be monitored from within MongoDB as well as from outside of MongoDB at the level of computing infrastructure as described in the previous section. Here are the crucial resources that can be easily tracked from within Mongo, especially through Cloud Manager and .
Current number of client connections is usually an effective metric to indicate total load on a system. Keeping track of normal ranges at various times of the day or week can help quickly identify spikes in traffic.
- A related metric, percentage of connections used, can indicate when MongoDB is getting close to running out of available connections.
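The percentage of connections used follows directly from the `connections` section of `serverStatus()`, which reports both the current count and how many remain available. The sample values below are hypothetical.

```python
def connections_used_pct(server_status):
    """Percentage of the configured connection limit currently in use.

    serverStatus().connections reports 'current' (open connections)
    and 'available' (remaining); their sum is the effective limit."""
    conns = server_status["connections"]
    total = conns["current"] + conns["available"]
    return 100.0 * conns["current"] / total

# Hypothetical serverStatus() fragment: 4,500 of 5,000 connections open
status = {"connections": {"current": 4500, "available": 500}}
print(connections_used_pct(status))  # 90.0
```

At 90%, a modest traffic spike could exhaust connections entirely, so this reading warrants action before the limit is hit.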
Storage metrics track how MongoDB is using persistent storage. In the WiredTiger storage engine, each collection is a file and so is each index. When a document in a collection is updated, the entire document is re-written.
- If memory space metrics (dataSize, indexSize, or storageSize) or the number of objects show a significant unexpected change while the database traffic stays within ordinary ranges, it can indicate a problem.
- A sudden drop in dataSize may indicate a large amount of data deletion, which should be quickly investigated if it was not expected.
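A check for the sudden dataSize drop described above can compare successive `dbStats()` samples. The 10% threshold here is an illustrative choice; tune it to your workload's normal churn.

```python
def data_size_dropped(previous_bytes, current_bytes, threshold_pct=10.0):
    """Flag a fall in dbStats().dataSize between two samples.

    Returns True when data shrank by at least threshold_pct,
    which may indicate unexpected mass deletion."""
    if previous_bytes == 0:
        return False
    drop_pct = 100.0 * (previous_bytes - current_bytes) / previous_bytes
    return drop_pct >= threshold_pct

# Hypothetical samples: 50 GB shrank to 30 GB between checks
print(data_size_dropped(50_000_000_000, 30_000_000_000))  # True -- 40% drop
```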
Memory metrics show how MongoDB is using the virtual memory of the computing infrastructure that is hosting the cluster.
- An increasing number of page faults or a growing amount of dirty data — data changed but not yet written to disk — can indicate problems related to the amount of memory available to the cluster.
- Cache metrics can help determine if the working set is outgrowing the available cache.
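The cache picture can be summarized from the `wiredTiger.cache` section of `serverStatus()`, which reports byte counts under descriptive string keys. The field names below are the ones that section uses; the byte counts are hypothetical.

```python
def cache_pressure(server_status):
    """Cache fill and dirty percentages from WiredTiger cache statistics.

    A fill percentage pushing past the eviction thresholds, or a
    fast-growing dirty percentage, suggests the working set is
    outgrowing the cache."""
    cache = server_status["wiredTiger"]["cache"]
    configured = cache["maximum bytes configured"]
    return {
        "fill_pct": 100.0 * cache["bytes currently in the cache"] / configured,
        "dirty_pct": 100.0 * cache["tracked dirty bytes in the cache"] / configured,
    }

# Hypothetical serverStatus() fragment: 1 GB cache, 82% full, 9% dirty
status = {"wiredTiger": {"cache": {
    "maximum bytes configured": 1_000_000_000,
    "bytes currently in the cache": 820_000_000,
    "tracked dirty bytes in the cache": 90_000_000,
}}}
print(cache_pressure(status))  # fill 82%, dirty 9%
```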
- Monitoring the number of asserts created at various levels of severity can provide a first-level indication of unexpected problems. Asserts can be message asserts (the most serious kind), warning asserts, regular asserts, or user asserts.
- Examining the asserts can provide clues that may lead to the discovery of problems.
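Because the `asserts` section of `serverStatus()` holds counters that only ever climb, what matters is the change between samples. This sketch diffs two hypothetical samples.

```python
def assert_deltas(prev, curr):
    """Change in each assert counter between two serverStatus() samples.

    serverStatus().asserts tracks 'regular', 'warning', 'msg', and
    'user' counters; any growth in 'msg' deserves immediate attention."""
    return {k: curr[k] - prev[k] for k in ("regular", "warning", "msg", "user")}

# Hypothetical samples taken an hour apart
prev = {"regular": 0, "warning": 2, "msg": 0, "user": 110}
curr = {"regular": 0, "warning": 2, "msg": 1, "user": 134}
print(assert_deltas(prev, curr))  # {'regular': 0, 'warning': 0, 'msg': 1, 'user': 24}
```

Here the single new message assert, not the 24 routine user asserts, is the clue worth chasing in the logs.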
Making use of metrics is far easier if you know the data well: where it comes from, how to get at it, and what it means.
As the MongoDB platform has evolved, it has become far easier to monitor clusters and resolve common problems. In addition, performance tuning monitoring and analysis have become increasingly automated. For example, Performance Advisor will now suggest adding indexes if it detects a query performance problem.
But it's best to know the whole story of the data, not just the pretty graphs produced at the end.
The sources for metrics used to monitor MongoDB are the logs created when MongoDB is running and the commands that can be run inside of the MongoDB system. These commands produce the detailed statistics that describe the state of the system.
This information is of high quality but difficult to use.
As MongoDB has matured as a platform, specialized interfaces have been created to bring together the most useful metrics.
The standardized APIs and massive amounts of data available on cloud platforms have made it possible to break new ground in automating performance tuning. Query analysis tooling goes further still: it examines the queries you are actually making on your data, determines what's slow and what's not, and makes recommendations for when to add indexes that take into account the indexes already in use.
In a sense, the questions covered in this article represent a playbook for running a performance tuning process. If you're already running such a process, perhaps some new ideas have occurred to you based on the analysis.
Resources like this article can help you achieve or refine your goals if you know the questions to ask and some methods to get there. But if you don't know the questions to ask or the best steps to take, it's wise to avoid trial and error and ask someone with experience. A team with broad expertise in tuning large MongoDB deployments can help identify the most effective steps to take to improve performance right away.
Once any immediate issues are resolved, professional services can guide you in creating an ongoing, streamlined performance tuning process to monitor and act on the metrics important to your deployment.
We hope this article has made it clear that with a modest amount of effort, it's possible to keep your MongoDB cluster in top shape. No matter what types of workloads are running or where the deployment is located, use the ideas and tools mentioned above to know what's happening in your cluster and address performance problems before they become noticeable or cause major outages.