Welcome to the second in our series of blog posts covering performance best practices for MongoDB. In this series, we are covering key considerations for achieving performance at scale across a number of important dimensions, including:
- Data modeling and sizing memory (the working set)
- Query patterns and profiling, which we’ll cover today
- Transactions and read/write concerns
- Hardware and OS configuration
Designing the right query patterns and profiling query behavior are essential to the smooth running of your application. Profiling will also help you select the most appropriate indexes. We will cover indexing in the next post in this series.
Start by using the latest drivers
MongoDB’s drivers are engineered by the same team that develops the core database. Drivers are updated more frequently than the database, typically every few months. Always use the most recent version of the drivers when possible, and install native extensions if they are available for your language. Develop a standard procedure for testing and upgrading drivers so that upgrading is a natural part of your process.
Avoid creating large, unbounded documents
As noted in the data modeling section in the first part of this blog series, the maximum size for documents in MongoDB is 16 MB. In practice, most documents are a few kilobytes or less.
You should avoid certain application patterns that would allow documents to grow unbounded. For example, in an e-commerce application it would be difficult to estimate how many customer reviews each product might receive. It is also typical that only a subset of reviews is displayed to a customer, such as the most popular or the most recent reviews.
Rather than modeling the product and all its reviews as a single document, it is better to store only the displayed subset of reviews in the product document itself for the fastest access. Other, less relevant reviews can be stored in separate documents that are linked to the product document by a reference, or joined to it with $lookup. Our previous blog post in this series provides further resources on data modeling best practices by use case.
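As a rough illustration of the split described above, the following Python sketch separates one oversized product document into a product document that embeds only the top reviews, plus standalone review documents that refer back to it. The field names (`reviews`, `top_reviews`, `helpful_votes`, `product_id`) and the ranking rule are assumptions made for this example, not part of any MongoDB API.

```python
from typing import Any, Dict, List, Tuple


def split_reviews(
    product: Dict[str, Any], keep: int = 3
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Embed the `keep` most helpful reviews; move the rest to separate documents."""
    # Rank reviews by a hypothetical "helpful_votes" field, highest first.
    ranked = sorted(
        product.get("reviews", []),
        key=lambda review: review["helpful_votes"],
        reverse=True,
    )
    # Product document: every original field except the full review list.
    product_doc = {k: v for k, v in product.items() if k != "reviews"}
    product_doc["top_reviews"] = ranked[:keep]
    # Remaining reviews become standalone documents that reference the product.
    overflow_docs = [
        {"product_id": product["_id"], **review} for review in ranked[keep:]
    ]
    return product_doc, overflow_docs


product = {
    "_id": "sku-1",
    "name": "Widget",
    "reviews": [
        {"author": "ann", "helpful_votes": 5},
        {"author": "bo", "helpful_votes": 9},
        {"author": "cy", "helpful_votes": 1},
        {"author": "di", "helpful_votes": 7},
    ],
}
product_doc, overflow_docs = split_reviews(product, keep=2)
```

With this shape, the product document stays bounded at `keep` embedded reviews no matter how many reviews arrive, and the overflow documents could be written to a separate collection.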
Issue updates to only modify fields that have changed
Rather than retrieving the entire document in your application, updating fields, and then saving the whole document back to the database, issue an update that targets only the fields that changed. This reduces network usage and database overhead.
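One way to apply this in application code is to compare the document as it was read with the document as edited, and send only the difference. The helper below is a minimal sketch that handles top-level fields only; the function name and sample fields are hypothetical. It builds an update document using the `$set` and `$unset` operators.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel for "field was not present in the original"


def build_partial_update(
    original: Dict[str, Any], modified: Dict[str, Any]
) -> Dict[str, Dict[str, Any]]:
    """Return a $set/$unset update covering only top-level fields that differ."""
    changed = {
        key: value
        for key, value in modified.items()
        if key != "_id" and original.get(key, _MISSING) != value
    }
    removed = {
        key: "" for key in original if key != "_id" and key not in modified
    }
    update: Dict[str, Dict[str, Any]] = {}
    if changed:
        update["$set"] = changed
    if removed:
        update["$unset"] = removed
    return update


before = {"_id": 1, "name": "Widget", "price": 10, "stock": 5, "legacy": True}
after = {"_id": 1, "name": "Widget", "price": 12, "stock": 5}
update = build_partial_update(before, after)
# With PyMongo, and assuming `products` is a Collection, this could be sent as:
#   if update:
#       products.update_one({"_id": before["_id"]}, update)
```

Only `price` and the removed `legacy` field travel over the network here. The unchanged `name` and `stock` fields are never rewritten.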
Update multiple array elements in a single operation
With fully expressive array updates, you can perform complex array manipulations against matching elements of an array – including elements embedded in nested arrays – all in a single update operation. Using the arrayFilters option, the update can specify which elements to modify in the array field.
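To make the arrayFilters option concrete, here is a small sketch that assembles the arguments for such an update in Python. The order and item fields (`items`, `warehouse`, `status`) are invented for the example. The `$[item]` filtered positional operator and PyMongo's `array_filters` keyword are the only real API surface it relies on.

```python
from typing import Any, Dict


def mark_items_shipped(order_id: Any, warehouse: str) -> Dict[str, Any]:
    """Build arguments that flag every item from one warehouse as shipped."""
    return {
        # Select the single order document to modify.
        "filter": {"_id": order_id},
        # $[item] addresses only the array elements matched by array_filters.
        "update": {"$set": {"items.$[item].status": "shipped"}},
        # Elements of `items` whose warehouse matches are the ones updated.
        "array_filters": [{"item.warehouse": warehouse}],
    }


spec = mark_items_shipped(order_id=1001, warehouse="east")
# With PyMongo, and assuming `orders` is a Collection, this could be sent as:
#   orders.update_one(**spec)
```

All matching elements in the `items` array are changed in one round trip, instead of one update per element.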
Use MongoDB Atlas Analytics nodes
If your application performs complex or long-running operations, such as reporting or ETL, you may want to isolate analytics queries from the rest of your operational workload. By isolating different workloads, you can ensure different query types never contend for system resources, and avoid analytics queries flushing the working set from RAM.
If you are running MongoDB on your own infrastructure, then you can configure replica set tags to achieve the same read isolation as Atlas Analytics nodes.
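In practice, routing reads to isolated nodes is usually a matter of connection-string options. The sketch below appends a secondary read preference and a tag set to a base URI. The host and database name are placeholders. The `nodeType:ANALYTICS` tag is the one Atlas assigns to analytics nodes, whereas on self-managed deployments you would substitute whatever tag you configured on the replica set members.

```python
from urllib.parse import urlencode


def analytics_uri(base_uri: str, tag: str = "nodeType:ANALYTICS") -> str:
    """Append read-preference options that target tagged secondary nodes."""
    options = {"readPreference": "secondary", "readPreferenceTags": tag}
    # Keep ":" unescaped so the tag stays readable as name:value.
    query = urlencode(options, safe=":")
    separator = "&" if "?" in base_uri else "?"
    return base_uri + separator + query


# Placeholder cluster address and database, for illustration only.
reporting_uri = analytics_uri("mongodb+srv://cluster0.example.mongodb.net/reports")
# A reporting job could then connect with, for example:
#   client = pymongo.MongoClient(reporting_uri)
```

Keeping a separate URI for reporting jobs means the operational application's connection settings do not need to change.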
Profile queries with the explain plan
MongoDB’s explain() method enables you to test queries from your application, showing information about how a query will be, or was, resolved, including:
- Which indexes were used
- Whether the query was covered by the index or not
- Whether an in-memory sort was performed, which indicates an index would be beneficial
- The number of index entries scanned
- The number of documents returned, and the number read
- How long the query took to resolve in milliseconds
- Which alternative query plans were rejected (when using the
The explain plan will show 0 milliseconds if the query was resolved in less than 1 ms, which is typical for well-tuned queries.
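The items in the list above map onto fields in the explain document. The following sketch condenses an explain result into a small summary. It assumes the output was produced with `"executionStats"` verbosity and has the classic plan layout (`queryPlanner.winningPlan` with nested `inputStage` entries), which may differ across server versions. The sample document is hand-written for illustration, not captured from a real server.

```python
from typing import Any, Dict, Iterator


def iter_stages(plan: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every stage of a plan tree, parent first."""
    yield plan
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        yield from iter_stages(child)


def summarize_explain(explain: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the headline numbers and plan traits out of an explain document."""
    stats = explain["executionStats"]
    stages = list(iter_stages(explain["queryPlanner"]["winningPlan"]))
    names = [stage.get("stage") for stage in stages]
    return {
        "indexes": [s["indexName"] for s in stages if s.get("stage") == "IXSCAN"],
        "collection_scan": "COLLSCAN" in names,
        "in_memory_sort": "SORT" in names,
        # Heuristic: an index scan with no FETCH stage and no documents read.
        "covered": "IXSCAN" in names
        and "FETCH" not in names
        and stats["totalDocsExamined"] == 0,
        "keys_examined": stats["totalKeysExamined"],
        "docs_examined": stats["totalDocsExamined"],
        "returned": stats["nReturned"],
        "millis": stats["executionTimeMillis"],
    }


sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "SORT",
            "inputStage": {
                "stage": "FETCH",
                "inputStage": {"stage": "IXSCAN", "indexName": "status_1"},
            },
        }
    },
    "executionStats": {
        "nReturned": 20,
        "executionTimeMillis": 3,
        "totalKeysExamined": 250,
        "totalDocsExamined": 250,
    },
}
summary = summarize_explain(sample)
```

In this sample the query uses the `status_1` index but still sorts in memory and reads 250 documents to return 20, which suggests a compound index covering the sort key would help. With PyMongo, a real explain document can be obtained from `cursor.explain()` or the `explain` database command.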
The MongoDB Compass GUI visualizes explain output, making it even easier for you to identify and resolve performance issues.
You can display explain plans via a tree, or review the full raw JSON output. The documentation has more detail on Compass visual explain plans.
Use the MongoDB Query Profiler
The MongoDB Query Profiler helps expose performance issues by displaying slow-running queries (by default, queries that exceed 100 ms) and their key performance statistics directly in the Atlas UI.
A chart provides a high-level view of that information, making it easy to quickly identify outliers and general trends, while a table offers operation statistics by namespace (database and collection) and operation type. You can choose which metric is used to filter and list operations. The available metrics include operation execution time, the ratio of documents scanned to documents returned, whether an index was used, whether an in-memory sort occurred, and more. You can select a specific time frame for the operations displayed, from the past 15 minutes to the past 24 hours.
Once you have identified which operations are potentially problematic, the Query Profiler allows you to dig deeper into operation-level statistics to gain more insight into what’s happening. You can view granular information on a specific operation in the context of similar operations, which can help you identify what general optimizations need to be made to improve performance. The Atlas Query Profiler is available without additional cost or performance overhead.
If you are running MongoDB on-premises, Ops Manager – part of MongoDB Enterprise Advanced – also includes a query profiler.
Other Tools and Utilities
The MongoDB Database Profiler collects detailed information about operations and commands executed against a running mongod instance. All data collected by the profiler is written to the system.profile collection, a capped collection in each profiled database, which you can query yourself for insights. You can configure the profiling level based on the granularity of data you want to analyze.
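As a sketch of how the profiler might be driven from Python, the helpers below build the `profile` command document and a filter for the system.profile collection. The helper names, the 100 ms threshold, and the `shop.orders` namespace are assumptions for illustration. `profile`, `slowms`, `ns`, and `millis` are the command and output field names the profiler uses.

```python
from typing import Any, Dict


def profiler_command(level: int = 1, slow_ms: int = 100) -> Dict[str, Any]:
    """Build the `profile` command: 0 = off, 1 = slow operations, 2 = all."""
    if level not in (0, 1, 2):
        raise ValueError("profiling level must be 0, 1, or 2")
    # The command name must be the first key in the document.
    return {"profile": level, "slowms": slow_ms}


def slow_operations_filter(namespace: str, min_millis: int) -> Dict[str, Any]:
    """Build a system.profile filter for slow operations on one namespace."""
    return {"ns": namespace, "millis": {"$gte": min_millis}}


command = profiler_command(level=1, slow_ms=100)
query = slow_operations_filter("shop.orders", min_millis=100)
# With PyMongo, and assuming `db` is the Database being profiled:
#   db.command(command)
#   slowest = db["system.profile"].find(query).sort("millis", -1).limit(10)
```

Level 2 records every operation and adds overhead, so level 1 with a threshold is generally the safer starting point on a busy system.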
mtools is a collection of helper scripts to parse, filter, and visualize MongoDB log files. mloginfo will analyze queries against each collection and group common query patterns to help you identify which queries consumed the most resources in aggregate. We will dive into indexes in the next post.
Review the Monitoring for MongoDB documentation for a complete overview of the utilities and third-party tools you can also use.
That wraps up this latest installment of the performance best practices series. MongoDB University offers a no-cost, web-based training course on MongoDB performance. This is a great way to learn more about optimizing query patterns.
Next up in this series: indexing.