Do Atlas Serverless clusters suffer from really slow cold starts?

John_Knoop · May 9, 2022, 6:10pm

We’re considering switching to serverless clusters. As a step in evaluating feasibility, I’ve created a new serverless cluster and pointed the staging environment of our SaaS product at it, so that I can compare load times etc. with our production environment (which uses a normal M10 cluster).

One weird thing I’ve noticed is that if I leave a page open and go grab a coffee and come back, the next request times out. If I try a few different endpoints, they all time out, until after half a minute or so, it comes back to life and starts working again.

For some reason I can’t see any traces of this in our telemetry so I haven’t been able to pinpoint the root cause of this behaviour, but once I pointed the staging environment back to our normal M10 cluster, I stopped seeing these timeouts.

So I’m wondering, based on this loose description, whether anyone knows if Atlas Serverless clusters might be the cause. For example, does the cluster go into some kind of sleep mode after being idle, and then take a while to warm up again?
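
For anyone who wants to try reproducing this outside the app, here is a minimal Python sketch of an idle-then-probe timer: wait for a while, run one cheap operation, and record how long it took or how it failed. The names `probe_after_idle` and `fake_query` are hypothetical, and the stand-in operation does not talk to a database at all; you would swap in a real call from your own driver (a ping or a small find, say) and use idle periods of several minutes rather than fractions of a second.

```python
import time
from typing import Callable, Iterable, List, Tuple


def probe_after_idle(
    operation: Callable[[], object],
    idle_periods: Iterable[float],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, float, str]]:
    """Run `operation` once after each idle period and time it.

    Returns one (idle_seconds, elapsed_seconds, outcome) tuple per period,
    where outcome is "ok" or the name of the exception that was raised.
    """
    results = []
    for idle in idle_periods:
        sleep(idle)  # leave the connection untouched, like walking away from the page
        start = clock()
        try:
            operation()
            outcome = "ok"
        except Exception as exc:  # record the failure instead of aborting the run
            outcome = type(exc).__name__
        results.append((idle, clock() - start, outcome))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real database call; replace with your own.
    def fake_query() -> None:
        time.sleep(0.01)

    # Tiny idle periods so the demo finishes quickly; a real run would use minutes.
    for idle, elapsed, outcome in probe_after_idle(fake_query, [0, 0.1, 0.2]):
        print(f"idle {idle:g}s -> {elapsed * 1000:.1f} ms ({outcome})")
```

If the latency or the failure rate jumps only after the longer idle periods, that would point at idle connections rather than at general slowness. The `sleep` and `clock` parameters exist only so the function can be exercised without actually waiting.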

John_Knoop · May 9, 2022, 6:47pm

Update: I was able to find some telemetry for these requests after all.

Apparently there was an exception after 15.7 minutes (!!) with this stack trace:

MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
 ---> System.IO.IOException: Unable to read data from the transport connection: Connection timed out.
 ---> System.Net.Sockets.SocketException (110): Connection timed out
   --- End of inner exception stack trace ---
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
   at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](TIOAdapter adapter)
   at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](TIOAdapter adapter, Memory`1 buffer)
   at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytesAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync(CancellationToken cancellationToken)
   --- End of inner exception stack trace ---

Are these timeouts a known bug/feature of Serverless Clusters or might I have misconfigured it somehow?

Vishal_Dhiman · May 11, 2022, 4:34pm

Hi John,
The behavior you are experiencing is unexpected. A serverless database should not behave differently in the example you posted. I will be reaching out to you directly to further debug the issue.

Sincerely,
Serverless PM team

John_Knoop · May 12, 2022, 5:17pm

Hi @Vishal_Dhiman

Feel free to DM or e-mail me at any time.

John

Karishma_Bothara · March 8, 2023, 4:54am

I have a similar issue. The error is:

ConnectorError(ConnectorError { user_facing_error: None, kind: RawDatabaseError { code: "unknown", message: "Operation timed out (os error 110)" } })

Can someone help?

Trieu_Boo · April 8, 2023, 11:08am

I have the same issue with Prisma and MongoDB. The message is: Operation timed out (os error 110)
@Karishma_Bothara, how did you fix that?

SFM_K_4 · February 3, 2024, 11:53pm

Facing the same issue. Any solution?

Anurag_Kadasne · February 23, 2024, 1:33pm

Hi

We made some changes on our end last year and this issue should no longer exist. Let me DM you to get more information, as I will need details about your cluster. Thanks.