About maxTimeMS() and interrupt points

MatteoSp · May 4, 2023, 9:12am

I’m experiencing this behaviour: I have a very complex (and therefore very slow) query which does not honour the specified maxTimeMS.

I found this in the docs:

MongoDB only terminates an operation at one of its designated interrupt points.

But there’s no clarification of what an “interrupt point” really is. I’ve read elsewhere that the number of batches may be related, but I did not find anything describing what happens in the case of a single batch (my case, as my query only returns a bunch of docs).

My suspicion: a query consisting of only one batch would never be terminated, no matter the value of maxTimeMS or the time required to complete the batch.

Can anyone confirm this?

thanks

Kushagra_Kesav · May 4, 2023, 1:59pm

Hey @MatteoSp,

Welcome to the MongoDB Community forums!

Could you please share the log line for this query, which took longer than maxTimeMS, showing its duration and query plan?

As per the documentation, an interrupt point is a point in an operation’s lifecycle when it can safely abort.

Best,
Kushagra

MatteoSp · May 4, 2023, 3:17pm

Hi Kushagra!

Could you please share the log line for this query, which took longer than maxTimeMS, showing its duration and query plan?

No, I cannot.

As per the documentation, an interrupt point is a point in an operation’s lifecycle when it can safely abort.

This is precisely what I mean by “there’s no clarification”. Where are these points located within an operation’s lifecycle? After every document? After a number of docs? After every batch? Elsewhere?

thanks

Kushagra_Kesav · May 5, 2023, 4:42am

Hey @MatteoSp,

The interrupt points are implementation details and may change from version to version, and as far as I know there is no single exhaustive list of them. In general terms, however, MongoDB query operations have ‘yield points’, where they can pause and give control to other operations, typically while waiting for data to load from disk.
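To make the idea of an interrupt point more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not MongoDB's implementation: the names `run_scan`, `time_until_stop`, `OperationInterrupted` and `check_every` are all hypothetical, and the real server decides for itself where its checks sit. The only point it illustrates is that an operation which checks its time budget at designated places can overshoot that budget by however much work lies between two checks.

```python
class OperationInterrupted(Exception):
    """Raised by the toy scan when a check finds the budget exceeded."""

    def __init__(self, elapsed_ms):
        super().__init__(f"interrupted after {elapsed_ms} ms")
        self.elapsed_ms = elapsed_ms


def run_scan(unit_costs_ms, budget_ms, check_every):
    """Do the work units in order, checking the budget every `check_every` units.

    `unit_costs_ms` lists the (made-up) cost of each unit in milliseconds.
    Returns the total elapsed time if the scan runs to completion.
    """
    elapsed_ms = 0
    for index, cost_ms in enumerate(unit_costs_ms, start=1):
        elapsed_ms += cost_ms  # perform one unit of work
        # Hypothetical interrupt point: the only place the budget is examined.
        if index % check_every == 0 and elapsed_ms > budget_ms:
            raise OperationInterrupted(elapsed_ms)
    return elapsed_ms


def time_until_stop(unit_costs_ms, budget_ms, check_every):
    """Return the elapsed time at which the scan finished or was interrupted."""
    try:
        return run_scan(unit_costs_ms, budget_ms, check_every)
    except OperationInterrupted as exc:
        return exc.elapsed_ms


if __name__ == "__main__":
    work = [10] * 100  # 100 units of 10 ms each, 1000 ms in total
    for spacing in (1, 20, 1000):
        print(spacing, time_until_stop(work, budget_ms=55, check_every=spacing))
```

With a 55 ms budget, checking after every unit stops the toy scan at 60 ms, checking every 20 units stops it at 200 ms, and a spacing larger than the whole scan never stops it at all. Where the real checks sit, and how far apart they are, is exactly the implementation detail mentioned above.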

Regarding the ‘batch’ parameter in MongoDB, I believe you meant batchSize(). This determines the number of documents returned in each batch of a response. I don’t think it is the main cause of what you’re seeing, unless you have evidence otherwise.

To further assist you, may I ask if you are using the latest version of MongoDB? If not, I kindly suggest updating it. Additionally, it would be helpful to know whether you can consistently reproduce the issue, and if so, to have a script for reference. Without a reproduction script, it’s nearly impossible to determine what’s happening in your specific case.

Best,
Kushagra

MatteoSp · May 5, 2023, 7:32am

Hi Kushagra,
here is a piece of the log (taken from Atlas advisor; I removed some sensitive details):

As you can see, I have a maxTimeMS of 55K ms, but the find operation completes in 300K ms.

About batch/batchSize: I was asking whether being in the case of a single batch (nReturned = 2) prevents any interrupt point from being reached (because of this: Clarification regarding working of maxTimeMS options - #3 by Jason_Tran).

thanks

Kushagra_Kesav · May 30, 2023, 6:15am

Hey @MatteoSp,

Apologies for the late response.

Understanding maxTimeMS:

The maxTimeMS parameter serves to limit resource consumption and prevent operations from running indefinitely. However, it is important to note that maxTimeMS is not a hard limit and does not guarantee that an operation will stop precisely at the specified time. Instead, it limits the operation’s “cumulative processing time”, which excludes time spent yielding.

Analysis of your scenario:

Based on the provided information, the query yielded 23,357 times. Each yield represents an interrupt point where the query takes a break. So it’s possible to blow past maxTimeMS in wall clock time without the query being stopped, because it hasn’t cumulatively spent maxTimeMS in processing time. If the server is very busy, the query may yield a lot, and it may then seem like maxTimeMS was ignored, even though most of the elapsed time was spent waiting and yielding rather than processing.

Regarding the question about a single batch with nReturned = 2: it does not directly prevent interrupt points from being reached. Interrupt points occur during query execution regardless of the batch size. However, it’s important to consider the overall execution time and resource consumption. A query that involves multiple batches or has a higher nReturned value may also take longer to process.
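The difference between wall clock time and cumulative processing time can be sketched in a few lines of Python. This is a toy model of the accounting described above, not the server's code. The function name `simulate` is hypothetical, and the per-slice figures (2 ms of work followed by 11 ms of waiting) are made-up values chosen only so that 23,357 yields add up to roughly 300K ms of wall clock time.

```python
def simulate(slices, max_time_ms):
    """Run (work_ms, wait_ms) slices against a processing-time budget.

    Only `work_ms` counts against `max_time_ms`; `wait_ms` models time spent
    yielded and adds to wall clock time only.
    Returns (wall_clock_ms, processing_ms, timed_out).
    """
    wall_clock_ms = 0
    processing_ms = 0
    for work_ms, wait_ms in slices:
        processing_ms += work_ms
        wall_clock_ms += work_ms
        # The budget is compared against processing time, not wall clock time.
        if processing_ms > max_time_ms:
            return wall_clock_ms, processing_ms, True
        wall_clock_ms += wait_ms  # yielded: the clock runs, the budget does not
    return wall_clock_ms, processing_ms, False


if __name__ == "__main__":
    # Assumed figures: 23,357 slices of 2 ms work, each followed by an 11 ms wait.
    busy_server = [(2, 11)] * 23357
    print(simulate(busy_server, max_time_ms=55000))

    # Same work per slice, but no waiting and more slices than the budget allows.
    idle_server = [(2, 0)] * 30000
    print(simulate(idle_server, max_time_ms=55000))
```

Under these assumed numbers the first run finishes after 303,641 ms of wall clock time with only 46,714 ms of processing, so the 55,000 ms limit is never hit. The second run, which never waits, is stopped once its processing time passes 55,000 ms.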

I hope this gives you a better understanding of your scenario. If you have any further questions or concerns, feel free to reach out.

Best regards,
Kushagra