updateOne timing out with lambda hitting serverless

I have a set of Lambda functions, created with the SAM CLI, that make up our app. They connect to a serverless instance hosted on Atlas, which has worked great except for one issue that just popped up.

On one particular Lambda, an updateOne command sometimes hangs until the Lambda’s timeout period expires. We’re only trying to update one row with a small object that has at most a dozen parameters, so it doesn’t make sense that we would be running into resource limits. For testing we’ve tried increasing the Lambda timeout to multiple minutes and doubling the memory, but it still times out in the same spot.

Oddly, the Lambda generally functions fine and that updateOne command works with other rows, but for some reason one set of data in the collection can’t be updated, and the call seems to just hang until the Lambda times out.

I’ve already explored a lot of options, including following the Atlas docs on Lambda best practices (calling new MongoClient outside the Lambda handler and re-using the client), as well as lowering connectTimeoutMS and socketTimeoutMS to see if I could at least get some kind of error.

Of course I have all of my db code and commands wrapped in try/catch blocks, and that works for catching errors that are actually thrown, but in this case no error is thrown; it’s just seemingly hanging with nothing going on.
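One way to turn a silent hang like this into something a try/catch can see is to race the call against a timer. The following is a minimal sketch, not a driver feature: withTimeout is a hypothetical helper name, and it does not cancel the underlying operation; it only makes the hang surface as a rejected promise so it can be logged before the Lambda itself times out.

```typescript
// Hypothetical helper (not part of the MongoDB driver): rejects if `work`
// has not settled within `ms` milliseconds. The original operation is NOT
// cancelled; this only makes the hang visible to the caller.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} did not settle within ${ms} ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // clear the timer so it cannot keep the process alive
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Assumed usage inside the handler (collection, filter and update are placeholders):
//   try {
//     await withTimeout(collection.updateOne(filter, update), 5000, "updateOne");
//   } catch (err) {
//     console.error(err); // now logged instead of hanging silently
//   }
```

Picking a limit well below the Lambda timeout leaves time for the catch block to log which document and filter were involved.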

Chatting with the Atlas support bot, it helpfully tells me that logs are only available on M10 and above. So I have no insight into the problem and no logs to monitor. If I were hosting this db on a physical box I controlled, I could at least ssh in and watch the syslogs or app logs to investigate further, but in this case I can’t see what is going on.

What are my options going forward? If I move from serverless to an M10, would I at least be able to get some logs?

Are there any options I can pass to either surface logging myself or at least enable some kind of debug mode, so I can figure out whether this is a locking/pool/connection issue?
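On the client side, the Node.js driver emits command monitoring events (when the client is created with the monitorCommands: true option) and connection pool events, which can help separate "the update was sent and never answered" from "no connection could be checked out". Here is a minimal sketch, assuming the Node.js driver; attachDiagnostics and the EventSource interface are hypothetical names for illustration, while the event names are the driver's own.

```typescript
// Minimal shape of anything that emits events. The Node.js driver's
// MongoClient is expected to fit this shape (assumption for illustration).
interface EventSource {
  on(event: string, listener: (payload: any) => void): unknown;
}

// Driver event names: command monitoring plus connection pool monitoring.
const DIAGNOSTIC_EVENTS: string[] = [
  "commandStarted",
  "commandSucceeded",
  "commandFailed",
  "connectionCheckOutStarted",
  "connectionCheckedOut",
  "connectionCheckOutFailed",
  "connectionClosed",
];

// Hypothetical helper: logs one line per event so the last line before a
// hang shows how far the operation got.
function attachDiagnostics(client: EventSource, log: (line: string) => void): void {
  for (const name of DIAGNOSTIC_EVENTS) {
    client.on(name, (payload) => {
      // Log only small fields: the command name, or the reason on failure/close events.
      const detail = payload?.commandName ?? payload?.reason ?? "";
      log(`${name} ${detail}`.trim());
    });
  }
}

// Assumed usage, once, next to where the client is created outside the handler:
//   const client = new MongoClient(uri, { monitorCommands: true });
//   attachDiagnostics(client, console.log);
```

If the last line logged before the hang is a commandStarted for the update, the command reached the server and is waiting on it; if it is connectionCheckOutStarted with no matching connectionCheckedOut, the pool is the more likely suspect.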

Oh, I also wanted to mention that we are using AWS IAM auth and have temporarily allowed all IPs to hit the instance, so I don’t believe it’s any kind of VPC peering issue. The majority of other CRUD operations work fine when using the site, with inserts, updates, and selects firing off as they should.

Hi Galen

Yes, you will have access to logs if you move to an M10. What error message do you receive when the Lambda tries to run the updateOne?