Timeout on insert into MongoDB Atlas with Azure Serverless backend

We are experiencing strange timeouts when inserting a document into MongoDB Atlas with an Azure Serverless backend.

Our software is built on .NET 7 and we are using version 2.19 of MongoDB.Driver.

dbug: MongoDB.Connection[0]
      1 1 p9cd6-3m-acceptance-lb.izjha.mongodb.net 27017 Connection failed MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
       ---> System.IO.IOException: Unable to read data from the transport connection: Connection timed out.
       ---> System.Net.Sockets.SocketException (110): Connection timed out
         --- End of inner exception stack trace ---
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
         at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
      --- End of stack trace from previous location ---
         at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytesAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync(CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
dbug: MongoDB.Command[0]
      1 1 p9cd6-3m-acceptance-lb.izjha.mongodb.net 27017 35647 5 2 Command failed insert 953047.9342 MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
       ---> System.IO.IOException: Unable to read data from the transport connection: Connection timed out.
       ---> System.Net.Sockets.SocketException (110): Connection timed out
         --- End of inner exception stack trace ---
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource<System.Int32>.GetResult(Int16 token)
         at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationTo... 640855d4039f92600d14c171

Sometimes the application takes a very long time to insert the document. When that happens, the driver's debug log shows the messages above.

The insert does complete eventually, but it takes 15+ minutes to do so.
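
As a rough cross-check, the `Command failed insert 953047.9342` entry in the log above lines up with that figure, assuming the number is the command duration in milliseconds (the log line itself does not state the unit, so that is an assumption on my part):

```shell
# Convert the value from the "Command failed insert" log line to minutes.
# Assumption: the value is a duration in milliseconds.
duration_ms=953047.9342
minutes=$(awk -v ms="$duration_ms" 'BEGIN { printf "%.1f", ms / 60000 }')
echo "insert took about ${minutes} minutes"
```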

I created an example project on GitHub that demonstrates the issue; there you can find more details, including logs.

We are looking forward to your reply.

Hi @Jeffrey_Tummers and welcome to the MongoDB community forum!!

The Production Notes for Azure recommend setting the TCP keepalive time to 120 sec, so that it stays below the 240 sec default idle timeout on Azure. Could you try this value and see if the issue persists?
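
For reference, a quick read-only check of the current value on a Linux host might look like the sketch below. The `keepalive_status` helper name is made up for illustration, and the 120-second threshold is simply the recommendation mentioned above:

```shell
# keepalive_status is a hypothetical helper: it prints "ok" when the given
# keepalive time (in seconds) is at or below the recommended 120 seconds,
# and "too high" otherwise.
keepalive_status() {
    value="$1"
    recommended=120
    if [ "$value" -le "$recommended" ]; then
        echo "ok"
    else
        echo "too high"
    fi
}

# Report the live value where the file is available (Linux only).
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    current=$(cat /proc/sys/net/ipv4/tcp_keepalive_time)
    echo "tcp_keepalive_time=${current}: $(keepalive_status "$current")"
fi
```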

If the problem persists, could you help me with a few details of the deployment?

  1. How frequently are you seeing the timeout error in the system?
  2. Does this happen for a specific type of insert in the Atlas database?
  3. Where are the Atlas and the Azure application regions located?
  4. While seeing the error in the application, can you connect to Atlas outside of the application?
  5. Are you trying to do a specific kind of bulk insert into the database?

Best Regards
Aasawari

Hi @Aasawari thank you for your reply :slight_smile:
And sorry for my late reply, but we switched over to a Google backend and everything has been running smoothly since!

I haven’t tried the 120 sec timeout yet, but I’ll run the same test and report back afterwards.

  1. How frequently are you seeing the timeout error in the system?
    • It’s hard to pinpoint the exact frequency: sometimes it happens multiple times an hour, sometimes it takes a few hours to start. You can see more details in the logs.
  2. Does this happen for a specific type of insert in the Atlas database?
    • It is just a simple document inserted with the InsertOneAsync method.
  3. Where are the Atlas and the Azure application regions located?
    • Region is Azure / Netherlands (westeurope).
  4. While seeing the error in the application, can you connect to Atlas outside of the application?
    • I have not tried that.
  5. Are you trying to do a specific kind of bulk insert into the database?
    • No, just a simple insert; see the example code on GitHub.

Hi @Aasawari, reporting back with an update on the 120 sec TCP timeout.

I can confirm that no timeouts occur when I set the TCP keepalive time to 120 (the default is 7200 on my Linux machine):

# reading the current TCP timeout setting
cat /proc/sys/net/ipv4/tcp_keepalive_time

# setting it (resets on my machine after reboot)
sudo sysctl -w net.ipv4.tcp_keepalive_time=120
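
To keep the value across reboots, one option is a drop-in file under `/etc/sysctl.d`, assuming a distribution that loads that directory at boot. The file name `99-tcp-keepalive.conf` and the `write_keepalive_conf` helper are just illustrative choices, not something I have verified on the pods:

```shell
# write_keepalive_conf is a hypothetical helper: it writes a sysctl drop-in
# file into the given directory (default /etc/sysctl.d, which needs root).
write_keepalive_conf() {
    dir="${1:-/etc/sysctl.d}"
    printf 'net.ipv4.tcp_keepalive_time = 120\n' > "$dir/99-tcp-keepalive.conf"
}

# Usage (as root), followed by a reload of all sysctl configuration files:
#   write_keepalive_conf
#   sysctl --system
```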

Why is this setting not needed for the Azure stateful server instances?

For us, configuring the TCP timeout on each pod/server is not something we want to do, so we are sticking with the Google serverless instances, which don’t require this setting.