maxTimeMS() and Query Optimizer Introspection in MongoDB 2.6

MongoDB

#Releases

By Alex Komyagin, Technical Service Engineer in MongoDB’s New York Office

In this series of posts we cover some of the enhancements in MongoDB 2.6 and how they will be valuable for users. You can find a more comprehensive list of changes in our 2.6 release notes. These are changes I believe you will find helpful, especially for our advanced users.

This post introduces two new features: maxTimeMS and query optimizer introspection. We will also look at a specific support case where these features would have been helpful.

maxTimeMS: SERVER-2212

The socket timeout option (e.g. MongoOptions.socketTimeout in the Java driver) specifies how long to wait for responses from the server. This timeout governs all types of requests (queries, writes, commands, authentication, etc.). The default value is “no timeout” but users tend to set the socket timeout to a relatively small value with the intention of limiting how long the database will service a single request. It is thus surprising to many users that while a socket timeout will close the connection, it does not affect the underlying database operation. The server will continue to process the request and consume resources even after the connection has closed.

Many applications are configured to retry queries when the driver reports a failed connection. If a query whose connection is killed by a socket timeout continues to consume resources on the server, a retry can give rise to a cascading effect. As queries run in MongoDB, the application may retry the same resource intensive query multiple times, increasing the load on the system leading to severe performance degradation. Before 2.6 the only way to cancel these queries was to issue db.killOp, which requires manual intervention.

MongoDB 2.6 introduces a new queryparameter, maxTimeMS, which limits how long an operation can run in the database. When the time limit is reached, MongoDB automatically cancels the query. maxTimeMS can be used to efficiently prevent unexpectedly slow queries from overloading MongoDB. Following the same logic, maxTimeMS can also be used to identify slow or unoptimized queries.

There are a few important notes about using maxTimeMS:

  • maxTimeMS can be set on a per-operation basis – long running aggregation jobs can have a different setting than simple CRUD operations.
  • requests can block behind locking operations on the server, and that blocking time is not counted, i.e. the timer starts ticking only after the actual start of the query when it initially acquires the appropriate lock;
  • operations are interrupted only at interrupt points where an operation can be safely aborted – the total execution time may exceed the specified maxTimeMS value;
  • maxTimeMS can be applied to both CRUD operations and commands, but not all commands are interruptible;
  • if a query opens a cursor, subsequent getmore requests will be included in the total time (but network roundtrips will not be counted);
  • maxTimeMS does not override the inactive cursor timeout for idle cursors (default is 10 min).

All MongoDB drivers support maxTimeMS in their releases for MongoDB 2.6. For example, the Java driver will start supporting it in 2.12.0 and 3.0.0, and Python - in version 2.7.

To illustrate how maxTimeMS can come in handy, let’s talk about a recent case that the MongoDB support team was working on. The case involved performance degradation of a MongoDB cluster that is powering a popular site with millions of users and heavy traffic. The degradation was caused by a huge amount of read queries performing full scans of the collection. The collection in question stores user comment data, so it has billions of records. Some of these collection scans were running for 15 minutes or so, and they were slowing down the whole system.

Normally, these queries would use an index, but in this case the query optimizer was choosing an unindexed plan. While the solution to this kind of problem is to hint the appropriate index, maxTimeMS can be used to prevent unintentional runaway queries by controlling the maximum execution time. Queries that exceed the maxTimeMS threshold will error out on the application side (in the Java driver it’s the MongoExecutionTimeoutException exception). maxTimeMS will help users to prevent unexpected performance degradation, and to gain better visibility into the operation of their systems.

In the next section we’ll take a look at another feature which would help in diagnosing the case we just discussed.

Query Optimizer Introspection: SERVER-666

To troubleshoot the support case described above and to understand why the query optimizer was not picking up the correct index for queries, it would have been very helpful to know a few things, such as whether the query plan had changed and to see the query plan cache.

MongoDB determines the best execution plan for a query and stores this plan in a cache for reuse. The cached plan is refreshed periodically and under certain operational conditions (which are discussed in detail below). Prior to 2.6 this caching behavior was opaque to the client. Version 2.6 provides a new query subsystem and query plan cache. Now users have visibility and control over the plan cache using a set of new methods for query plan introspection.

Each collection contains at most one plan cache. The cache is cleared every time a change is made to the indexes for the collection. To determine the best plan, multiple plans are executed in parallel and the winner is selected based on the amount of retrieved results within a fixed amount of steps, where each step is basically one “advance” of the query cursor. When a query has successfully passed the planning process, it is added to the cache along with the related index information.

A very interesting feature of the new query execution framework is the plan cache feedback mechanism. MongoDB stores runtime statistics for each cached plan in order to remove plans that are determined to be the cause of performance degradation. In practice, we don’t see these degradations often, but if they happen it is usually a consequence of a change in the composition of the data. For example, with new records being inserted, an indexed field may become less selective, leading to slower index performance. These degradations are extremely hard to manually diagnose, and we expect the feedback mechanism to automatically handle this change if a better alternative index is present.

The following events will result in a cached plan removal:

  • Performance degradation
  • Index add/drop
  • No more space in cache (the total number of plans stored in the collection cache is currently limited to 200; this is a subject to change in future releases)
  • Explicit commands to mutate cache
  • After the number of write operations on a collection exceeds the built-in limit, the whole collection plan cache is dropped (data distribution has changed)

MongoDB 2.6 supports a set of commands to view and manipulate the cache. List all known query shapes (planCacheListQueryShapes), display cached plans for a query shape (planCacheListPlans), manual removal of a query shape from the cache as well as emptying the whole cache (planCacheClear).

Here is an example invocation of the planCacheListQueryShapes command, that lists the shape of the query

db.test.find({first_name:"john",last_name:"galt"},

{_id:0,first_name:1,last_name:1}).sort({age:1}):

> db.runCommand({planCacheListQueryShapes: "test"})
{
    "shapes" : [
        {
            "query" : {
                "first_name" : "alex",
                "last_name" : "komyagin"
            },
            "sort" : {
                "age" : 1
            },
            "projection" : {
                "_id" : 0,
                "first_name" : 1,
                "last_name" : 1
            }
        }
    ],
    "ok" : 1
}

The exact values in the query predicate are insignificant in determining the query shape. For example, a query predicate

{first_name:"john", last_name:"galt"} is equivalent to the query predicate {first_name:"alex", last_name:"komyagin"}.

Additionally, with log level set to 1 or greater MongoDB will log plan cache changes caused by the events listed above. You can set the log level using the following command:

use admin
db.runCommand( { setParameter: 1, logLevel: 1 } )

Please don’t forget to change it back to 0 afterwards, since log level 1 logs all query operations and it can negatively affect the performance of your system.

Together, the new plan cache and the new query optimizer should make related operations more transparent and help users to have the visibility and control necessary for maintaining predictable response times for their queries.

You can see how these new features work yourself, by trying our latest release MongoDB 2.6 available for download here. I hope you will find these features helpful. We look forward to hearing your feedback, please post your questions in the mongodb-user google group.

Like what you see? Get MongoDB updates straight to your inbox