Timeout when selecting a server after inactivity

Hi,

We are using MongoDB Atlas with a mongodb+srv connection. While our webapp is starting up and handling requests, everything works fine. After the webapp has been idle for a few hours, we get timeouts on new requests:

System.TimeoutException: A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [] }.    
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)    
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)    
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedAsync(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)    
at MongoDB.Driver.Core.Clusters.Cluster.SelectServerAsync(IServerSelector selector, CancellationToken cancellationToken)    
at MongoDB.Driver.MongoClient.AreSessionsSupportedAfterServerSelectionAsync(CancellationToken cancellationToken)    
at MongoDB.Driver.MongoClient.AreSessionsSupportedAsync(CancellationToken cancellationToken)    
at MongoDB.Driver.MongoClient.StartImplicitSessionAsync(CancellationToken cancellationToken)    
at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)

We are using MongoDB.Driver version 2.14.1.

The heartbeats are also running just fine, even though the app reports the timeouts:

MongoDB-SDAM Verbose: 405 : connection[1:xxxxxxxxx-dev-shard-00-02.yyyy.mongodb.net:27017:6-155102]: sent heartbeat in 60059.3808ms.
    DateTime=2022-02-17T14:48:01.0366223Z
MongoDB-SDAM Verbose: 404 : connection[1:xxxxxxxxx-dev-shard-00-02.yyyy.mongodb.net:27017:6-155102]: sending heartbeat.
    DateTime=2022-02-17T14:48:01.0370828Z
MongoDB-SDAM Verbose: 405 : connection[1:xxxxxxxxx-dev-shard-00-00.yyyy.mongodb.net:27017:2-155256]: sent heartbeat in 61040.5318ms.
    DateTime=2022-02-17T14:48:02.0370829Z
MongoDB-SDAM Verbose: 404 : connection[1:xxxxxxxxx-dev-shard-00-00.yyyy.mongodb.net:27017:2-155256]: sending heartbeat.
    DateTime=2022-02-17T14:48:02.0373158Z
MongoDB-SDAM Verbose: 405 : connection[1:xxxxxxxxx-dev-shard-00-01.yyyy.mongodb.net:27017:5-161874]: sent heartbeat in 60914.9487ms.
    DateTime=2022-02-17T14:48:02.0385639Z
MongoDB-SDAM Verbose: 404 : connection[1:xxxxxxxxx-dev-shard-00-01.yyyy.mongodb.net:27017:5-161874]: sending heartbeat.
    DateTime=2022-02-17T14:48:02.0396348Z
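
For anyone who wants to compare heartbeat timings per host from a trace like the one above, here is a minimal sketch in Python. It assumes the log lines have exactly the shape shown in this post; `heartbeat_durations` is an illustrative helper name, not part of the driver.

```python
import re

# Matches the "sent heartbeat" lines of the MongoDB-SDAM trace shown above.
# Assumption: the connection id has the form [cluster:host:port:suffix].
HEARTBEAT_RE = re.compile(
    r"connection\[\d+:(?P<host>[^:\]]+):(?P<port>\d+):[^\]]*\]: "
    r"sent heartbeat in (?P<ms>[\d.]+)ms"
)

def heartbeat_durations(lines):
    """Return (host, duration_ms) for every 'sent heartbeat' line."""
    results = []
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            results.append((match.group("host"), float(match.group("ms"))))
    return results
```

Feeding it the trace file line by line gives one tuple per completed heartbeat, which makes it easier to see whether any host stops reporting around the time the timeouts start.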

While the app is reporting timeouts, we can:

  • Connect to each of the 3 hostnames with telnet xxx 27017 from the host of the webapp.
  • Connect with the same connection string from MongoDB Compass (on a different machine).

So the server is responsive and the network is available, but we can only get the webapp running again by restarting it.
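
The telnet check described above can be scripted so that it is easy to repeat from the webapp host while the timeouts are happening. This is a rough sketch in Python using only the standard library; `can_connect` and `check_hosts` are illustrative names. Note that it only shows that the TCP port is reachable; it does not exercise TLS, authentication, or the driver's own view of the cluster.

```python
import socket

def can_connect(host, port=27017, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False

def check_hosts(hosts, port=27017, timeout=5.0):
    """Map each hostname to whether a TCP connection could be opened."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling `check_hosts([...])` with the three shard hostnames gives the same information as three telnet sessions.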

We have 3 different test environments, and they all have the same problem.

Any suggestions, or known ways of doing further diagnostics?

Hi, @FrankNielsen,

Welcome to the MongoDB Community! Thank you for providing the exception with stack trace. I notice that your cluster topology contains no servers:

Servers : []

We have seen this problem in containerized environments due to a DNS issue. I suspect that you are running your app servers in Kubernetes, AKS, or a similar environment.
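
As a rough aid for reading these timeout messages, the sketch below (Python, standard library only) pulls out the cluster state and whether the server list is empty. An empty list means the driver does not know about any hosts at all, whereas a populated list with per-server errors points at connectivity to specific hosts. `summarize_cluster_state` is an illustrative name, and the regular expressions assume the message format quoted in this thread.

```python
import re

def summarize_cluster_state(message):
    """Extract the cluster state and whether 'Servers' is empty."""
    # Forum software often turns straight quotes into curly ones; undo that.
    text = message.replace("\u201c", '"').replace("\u201d", '"')
    # The first 'State : "..."' in the message is the cluster-level state.
    state = re.search(r'State\s*:\s*"(\w+)"', text)
    # Greedy match up to the last ']' so nested brackets stay inside the group.
    servers = re.search(r"Servers\s*:\s*\[(.*)\]", text, re.DOTALL)
    return {
        "state": state.group(1) if state else None,
        "servers_empty": (servers.group(1).strip() == "") if servers else None,
    }
```

A result of `servers_empty: True` matches the situation described in this reply; `None` means the message did not contain a recognizable server list.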

The root cause of the problem is a bug in DnsClient.NET. DnsClient.NET is a third-party dependency that we use for SRV and TXT lookups. This bug has been resolved and will be included in the upcoming 2.15.0 release. See CSHARP-4001 for more information on the fix.

Prior to the release of the 2.15.0 driver, you can avoid this problem by using the mongodb:// connection string as A, AAAA, and CNAME lookups are not affected by this bug.
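
To illustrate the workaround, here is a minimal sketch in Python that assembles a mongodb:// connection string from an explicit host list. `build_seedlist_uri` is a hypothetical helper, credentials are left out, and the option values in the usage note are placeholders. With mongodb+srv:// the host list comes from the SRV record, extra options can come from the TXT record, and TLS is enabled by default, so with mongodb:// these have to be spelled out explicitly.

```python
from urllib.parse import urlencode

def build_seedlist_uri(hosts, options=None, database=""):
    """Build a mongodb:// connection string from (host, port) pairs."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_part = ",".join(f"{host}:{port}" for host, port in hosts)
    uri = f"mongodb://{host_part}/{database}"
    if options:
        # Options are appended as a query string, in the order given.
        uri += "?" + urlencode(options)
    return uri
```

For example, passing the three shard hostnames with port 27017 and options such as `{"tls": "true", "replicaSet": "rs0"}` (where `rs0` stands in for your real replica set name) yields a single string with all hosts listed up front. Atlas also shows the exact non-SRV string for a cluster in its connection dialog, which is the safer source for the real values.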

Sincerely,
James

Hi @James_Kovacs ,

Thx for the reply, it was worth waiting for.

I have looked around your Jira but I am unable to find any roadmap for the v2.15.0 release. Can you reveal anything? Is it within weeks or months?

Cheers, Frank

Hi, @FrankNielsen,

Glad that I could be of assistance. We are wrapping up some features that we want to include in the 2.15.0 release. Insert usual disclaimer about forward-looking statements, plans could change due to unforeseen circumstances, etc., etc. With that said, the 2.15.0 release is weeks away, not months, though I don’t know for sure exactly when.

James

Hello,

Is there a fix for this issue? I am facing the same problem. I am using AWS + DocumentDB + an SSH tunnel. Both of the following work:

  1. Connecting to DocumentDB from MongoDB Compass on my local laptop.
  2. Tunneling through an AWS Ubuntu EC2 instance from my local laptop: I can see the DocumentDB shell and execute commands.

However, when I try to connect to AWS DocumentDB from a .NET application, I get the exception below. I am using the same connection string in Compass and in the .NET code.

A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "Direct", Type : "Unknown", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/documentdb-test.cluster-c7jldipe45vl.ap-south-1.docdb.amazonaws.com:27017" }", EndPoint: "Unspecified/documentdb-test.cluster-c7jldipe45vl.ap-south-1.docdb.amazonaws.com:27017", ReasonChanged: "NotSpecified", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", HeartbeatException: "MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 172.31.27.19:27017
at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
at System.Net.Sockets.Socket.Connect(String host, Int32 port)
at MongoDB.Driver.Core.Connections.TcpStreamFactory.Connect(Socket socket, EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.TcpStreamFactory.CreateStream(EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.SslStreamFactory.CreateStream(EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelper(CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelper(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.Open(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.InitializeConnection(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.Heartbeat(CancellationToken cancellationToken)", LastHeartbeatTimestamp: "2022-04-07T05:48:53.4181477Z", LastUpdateTimestamp: "2022-04-07T05:48:53.4241461Z" }] }.

Please help.

Thanks,
Yogesh

A short follow-up: since the MongoDB driver release v2.15.0, we have not experienced any problems running with the SRV connection string format.

Cheers, Frank

PS: @Yogesh_Satpute I think your problem is a general first-time connection problem?

The exception still occurs with version 2.17.1 of the driver when using the connection string format given by MongoDB Atlas (via the Connect->Connect from Application option). I am also looking for a solution to this exception.
The exception is as follows:
A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", Type : "Unknown", State : "Disconnected", Servers : [] }.

The exception reports that the type is unknown, the state is disconnected, and the server list is empty. The servers are not listed in the connection string that I obtained from the AtlasDB->Connect option. I do not know how to proceed at the moment.

Hello, we also use the 2.17.1 version and have the same problem.

Some things to check:

  • Are you able to connect via mongosh or another driver?
  • If mongodb+srv:// is failing, are you able to connect via the mongodb:// style connection string?
  • Have you verified that your network allow list includes your app server (or workstation)?
  • Are you using AWS PrivateLink or similar technology? If so, have you tried disabling it and connecting? (Not saying you shouldn’t use PrivateLink, but trying to isolate whether the problem is with your network configuration or something else.)

Given this sounds like a network connectivity/configuration issue rather than a driver bug, I would recommend reaching out to our Atlas Support Team by clicking the in-app chat icon in the lower right corner of MongoDB Atlas.

Sincerely,
James

Anytime a “timeout” error occurs, the first thing to check should be the “network access” control on the server. Most of the time, a timeout is related to an improperly set IP list. To test this possibility, log in to your Atlas cluster (or find your private server config files), head to the security section, allow access from anywhere, and test the app that way. Also check the other security settings so that you can at least connect with mongo or mongosh from the console, or with Compass.

Next is the IP address of the host PC of the application. Having a static domain name does not guarantee having a static IP. Your cloud provider might be changing the IP address of the host where your app resides if you don’t have a static IP subscription. They are, after all, virtual PCs or containers and will most likely be restarted with different IP addresses regularly (disaster recovery, load balancing, etc.). If you have set strict network access on your MongoDB server, then your app will have no choice but to time out. If you host the app on a PC in your own infrastructure, make sure you give that PC a static IP.

If you can eliminate these two possibilities, please add these details to your posts so help can find you faster.
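
To make the allow-list check concrete, here is a small sketch in Python (standard library only) that tests whether a given outbound IP is covered by a list of allow-list entries. `is_allowed` is an illustrative name, the addresses in the usage note are documentation examples, and it assumes the entries are written as single IPs or CIDR blocks, with "access from anywhere" expressed as 0.0.0.0/0.

```python
import ipaddress

def is_allowed(client_ip, allow_list):
    """Return True if client_ip falls inside any entry of allow_list.

    Entries may be single addresses ("203.0.113.7") or CIDR blocks
    ("203.0.113.0/24"); a single address is treated as a /32 block.
    """
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(entry, strict=False)
        for entry in allow_list
    )
```

For instance, `is_allowed("203.0.113.7", ["203.0.113.0/24"])` is true, while the same list does not cover `198.51.100.1`. The IP to test is the one your app's traffic actually leaves from, which may differ from the host's local address.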

The IP is set to 0.0.0.0 in the MongoDB Atlas network access option.
I cannot set a static IP because of workplace restrictions. Also, there are no proxy settings enforced.
The strange thing now is that access to MongoDB in Atlas has been possible since Friday. There have been no changes in the source code; it is the same as when I posted the issue in this forum. It is all strange to me. Does it have anything to do with the fact that I am using the shared/free server/cluster option hosted somewhere, and there could be network congestion due to multiple requests from different places to this setup?

“Access anywhere” kind of removes all restrictions, but there are still many ways to cripple network access on the app side (before finally blaming a driver). Your app seems to work now but may stop again, so it is best to understand where else the network may diverge from expectations. Unfortunately, it is like debugging a program, and it won’t be immediately apparent what causes the problem.

You said “since Friday”. Can you please test your app again today or tomorrow? It is quite possible that your network has a week-long restriction on ports so that employees do not surf distracting websites. If your app gives timeouts again today and tomorrow, you need to speak to your administration.

Thank you for the reply.
There was no intention to blame anything or anyone; I thought this was a support forum.
I already explained that there are IT restrictions as per company policy, but I do not know to what extent they apply or where they are. Had I known, I would not have had to think much.

I am ending my communication on this topic here.

@Developer_BB Please take a breath first. I am sorry you were offended, but I had/have no intention to offend anyone here.

I do not have any way to test your own app and on-premise infrastructure, so I was trying to say what else you can test. We need to first eliminate probable causes, then check for unexpected ones.