Timeout when selecting a server after inactivity

Hi,

We are using MongoDB Atlas with a mongodb+srv connection string. While our web app is starting up and handling requests, everything works fine. After the web app has been idle for a few hours, we get timeouts on new requests:

System.TimeoutException: A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [] }.    
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)    
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)    
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedAsync(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)    
at MongoDB.Driver.Core.Clusters.Cluster.SelectServerAsync(IServerSelector selector, CancellationToken cancellationToken)    
at MongoDB.Driver.MongoClient.AreSessionsSupportedAfterServerSelectionAsync(CancellationToken cancellationToken)    
at MongoDB.Driver.MongoClient.AreSessionsSupportedAsync(CancellationToken cancellationToken)    
at MongoDB.Driver.MongoClient.StartImplicitSessionAsync(CancellationToken cancellationToken)    
at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)

We are using MongoDB.Driver version 2.14.1.

The heartbeats are also running just fine, even though the app reports the timeouts:

MongoDB-SDAM Verbose: 405 : connection[1:xxxxxxxxx-dev-shard-00-02.yyyy.mongodb.net:27017:6-155102]: sent heartbeat in 60059.3808ms.
    DateTime=2022-02-17T14:48:01.0366223Z
MongoDB-SDAM Verbose: 404 : connection[1:xxxxxxxxx-dev-shard-00-02.yyyy.mongodb.net:27017:6-155102]: sending heartbeat.
    DateTime=2022-02-17T14:48:01.0370828Z
MongoDB-SDAM Verbose: 405 : connection[1:xxxxxxxxx-dev-shard-00-00.yyyy.mongodb.net:27017:2-155256]: sent heartbeat in 61040.5318ms.
    DateTime=2022-02-17T14:48:02.0370829Z
MongoDB-SDAM Verbose: 404 : connection[1:xxxxxxxxx-dev-shard-00-00.yyyy.mongodb.net:27017:2-155256]: sending heartbeat.
    DateTime=2022-02-17T14:48:02.0373158Z
MongoDB-SDAM Verbose: 405 : connection[1:xxxxxxxxx-dev-shard-00-01.yyyy.mongodb.net:27017:5-161874]: sent heartbeat in 60914.9487ms.
    DateTime=2022-02-17T14:48:02.0385639Z
MongoDB-SDAM Verbose: 404 : connection[1:xxxxxxxxx-dev-shard-00-01.yyyy.mongodb.net:27017:5-161874]: sending heartbeat.
    DateTime=2022-02-17T14:48:02.0396348Z

While the app is reporting timeouts, we can:

  • Connect to each of the 3 hostnames with telnet xxx 27017 from the host of the web app.
  • Connect with the same connection string from MongoDB Compass (on a different machine).

So the server is responsive and the network is available, but we can only get the web app running again by restarting it.
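For anyone who wants to script the telnet check, here is a minimal sketch in Python (chosen only because it needs nothing beyond the standard library). The function names are hypothetical. A successful TCP connect only shows that the port is reachable; it does not exercise TLS, authentication, or the driver's server selection.

```python
import socket


def can_connect(host: str, port: int = 27017, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_hosts(hosts, port: int = 27017, timeout: float = 5.0) -> dict:
    """Map each hostname to whether a TCP connection succeeded."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Calling check_hosts with the three shard hostnames from the connection string gives the same information as three telnet sessions, in a form that can be logged while the app is timing out.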

We have 3 different test environments, and they all have the same problem.

Any suggestions, or known ways of doing further diagnostics?

Hi, @FrankNielsen,

Welcome to the MongoDB Community! Thank you for providing the exception with stack trace. I notice that your cluster topology contains no servers:

Servers : []

We have seen this problem in containerized environments due to a DNS issue. I suspect that you are running your app servers in Kubernetes, AKS, or similar environment.

The root cause of the problem is a bug in DnsClient.NET. DnsClient.NET is a third-party dependency that we use for SRV and TXT lookups. This bug has been resolved and will be included in the upcoming 2.15.0 release. See CSHARP-4001 for more information on the fix.

Prior to the release of the 2.15.0 driver, you can avoid this problem by using the mongodb:// connection string as A, AAAA, and CNAME lookups are not affected by this bug.

Sincerely,
James

Hi @James_Kovacs ,

Thx for the reply, it was worth waiting for.

I have looked around your Jira, but I am unable to find any roadmap for the v2.15.0 release. Can you reveal anything? Is it weeks or months away?

Cheers, Frank

Hi, @FrankNielsen,

Glad that I could be of assistance. We are wrapping up some features that we want to include in the 2.15.0 release. Insert usual disclaimer about forward-looking statements, plans could change due to unforeseen circumstances, etc., etc. With that said, the 2.15.0 release is weeks away, not months, though I don’t know for sure exactly when.

James

Hello,

Is there a fix for this issue? I am facing the same issue. I am using AWS + DocumentDB + an SSH tunnel. When I try to connect:

  1. To DocumentDB from MongoDB Compass on my local laptop, it works.
  2. Through an AWS Ubuntu EC2 instance used for tunneling from my local laptop, I can open the DocumentDB shell and execute commands as well.

However, when I try to connect to AWS DocumentDB from a .NET application, I get the exception below. I am using the same connection string in Compass and in the .NET code.

A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : “1”, ConnectionMode : “Direct”, Type : “Unknown”, State : “Disconnected”, Servers : [{ ServerId: “{ ClusterId : 1, EndPoint : “Unspecified/documentdb-test.cluster-c7jldipe45vl.ap-south-1.docdb.amazonaws.com:27017” }”, EndPoint: “Unspecified/documentdb-test.cluster-c7jldipe45vl.ap-south-1.docdb.amazonaws.com:27017”, ReasonChanged: “NotSpecified”, State: “Disconnected”, ServerVersion: , TopologyVersion: , Type: “Unknown”, HeartbeatException: “MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server. —> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 172.31.27.19:27017
at System.Net.Sockets.Socket.Connect(IPAddress[] addresses, Int32 port)
at System.Net.Sockets.Socket.Connect(String host, Int32 port)
at MongoDB.Driver.Core.Connections.TcpStreamFactory.Connect(Socket socket, EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.TcpStreamFactory.CreateStream(EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.SslStreamFactory.CreateStream(EndPoint endPoint, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelper(CancellationToken cancellationToken)
— End of inner exception stack trace —
at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelper(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Connections.BinaryConnection.Open(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.InitializeConnection(CancellationToken cancellationToken)
at MongoDB.Driver.Core.Servers.ServerMonitor.Heartbeat(CancellationToken cancellationToken)”, LastHeartbeatTimestamp: “2022-04-07T05:48:53.4181477Z”, LastUpdateTimestamp: “2022-04-07T05:48:53.4241461Z” }] }.

Please help.

Thanks,
Yogesh

A short follow-up: since the MongoDB driver release v2.15.0, we have not experienced any problems running with the SRV connection string format.

Cheers, Frank

PS: @Yogesh_Satpute I think your problem is a general first-time connection problem!?
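The two exceptions quoted so far differ in one visible way: the first reports Servers : [], while the second lists a server together with a HeartbeatException. A minimal sketch in Python that pulls those fields out of the message text follows. It is purely textual, the function name and result keys are hypothetical, and it relies on the wording of the messages quoted in this thread, which may differ between driver versions.

```python
import re


def summarize_cluster_state(message: str) -> dict:
    """Extract a few fields from a 'Client view of cluster state' message."""
    # Normalize the curly quotes that forum software tends to substitute.
    text = message.replace("\u201c", '"').replace("\u201d", '"')

    mode = re.search(r'ConnectionMode\s*:\s*"(\w+)"', text)
    state = re.search(r'\bState\s*:\s*"(\w+)"', text)

    return {
        "connection_mode": mode.group(1) if mode else None,
        "state": state.group(1) if state else None,
        # True only for a literal empty list, as in the first exception.
        "servers_empty": re.search(r"Servers\s*:\s*\[\s*\]", text) is not None,
        # True when a server entry carries a heartbeat failure.
        "heartbeat_exception": "HeartbeatException" in text,
    }
```

An empty server list means the driver never learned about any hosts, whereas a populated list with a heartbeat exception means a host was known but could not be reached. That distinction is only a starting point for diagnosis, not a verdict.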

The exception still occurs with version 2.17.1 of the driver when using the connection string format given by MongoDB Atlas (under Connect->Connect from Application). I am also trying to find a solution for this exception.
The exception is as follows:
A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : “1”, Type : “Unknown”, State : “Disconnected”, Servers : }.

According to this exception, the type is unknown, the state is disconnected, and the server list is empty. The servers themselves are not listed in the connection string that I obtained from the Atlas Connect option. I do not know how to proceed at the moment.

Hello, we also use the 2.17.1 version and have the same problem.

Some things to check:

  • Are you able to connect via mongosh or another driver?
  • If mongodb+srv:// is failing, are you able to connect via the mongodb:// style connection string?
  • Have you verified that your network allow list includes your app server (or workstation)?
  • Are you using AWS PrivateLink or similar technology? If so, have you tried disabling it and connecting? (Not saying you shouldn’t use PrivateLink, but trying to isolate whether the problem is with your network configuration or something else.)

Given this sounds like a network connectivity/configuration issue rather than a driver bug, I would recommend reaching out to our Atlas Support Team by clicking the in-app chat icon in the lower right corner of MongoDB Atlas.

Sincerely,
James

Any time a “timeout” error occurs, the first thing to check should be the “network access” control on the server. Most of the time, a timeout is related to an improperly set IP access list. To test this possibility, log in to your Atlas cluster (or find your private server config files), head to the security section, allow access from anywhere, and test the app that way. Also check the other security settings so that you can at least connect with mongo or mongosh from a console, or with Compass.

Next is the IP address of the host PC of the application. Having a static domain name does not guarantee having a static IP. Your cloud provider might change the IP address of the host where your app resides if you don’t have a static IP subscription. They are, after all, virtual PCs or containers and will most likely be restarted with different IP addresses regularly (disaster recovery, load balancing, etc.). If you have set strict network access on your MongoDB server, then your app will have no choice but to time out. If you host the app on a PC in your own infrastructure, make sure you give that PC a static IP.

If you can eliminate these two possibilities, please add these details to your posts so help can find you faster.
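To reason about an access list offline, here is a minimal sketch in Python using the standard ipaddress module. It assumes the entries are written as single addresses or CIDR blocks, with 0.0.0.0/0 standing for "allow access from anywhere". The function name is hypothetical, and nothing here queries Atlas; the list has to be copied in by hand.

```python
import ipaddress


def is_allowed(client_ip: str, allow_list) -> bool:
    """Return True if client_ip falls inside any entry of allow_list."""
    addr = ipaddress.ip_address(client_ip)
    for entry in allow_list:
        # strict=False accepts entries whose host bits are set, e.g. 10.0.0.5/24.
        network = ipaddress.ip_network(entry, strict=False)
        if addr.version == network.version and addr in network:
            return True
    return False
```

The address to test is the public IP that the app host currently presents to the internet, which, as noted above, may change when the host is restarted.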

The IP is set to 0.0.0.0 in the MongoDB Atlas network access option.
I cannot set a static IP because of workplace restrictions. Also, no proxy settings are enforced.
The strange thing is that access to MongoDB in Atlas has been working since Friday. There have been no changes in the source code; it is the same as when I posted the issue in this forum. It is all strange to me. Could it be related to the fact that I am using the shared/free cluster option, where there could be network congestion due to requests from many different places?

“Access from anywhere” removes pretty much all restrictions on the server side, but there are still many ways to cripple network access on the app side (before finally blaming a driver). Your app seems to work now but may stop again, so it is best to understand where else the network may diverge from expectations. Unfortunately, it is like debugging a program, and it won’t be immediately apparent what causes the problem.

You said “since Friday”; can you please test your app again today or tomorrow? It is quite possible that your network restricts ports during the work week so that employees do not surf distracting websites. If your app gives timeouts again today and tomorrow, you need to speak to your administrators.

Thank you for the reply.
There was no intention to blame anything or anyone, thought this was a support forum.
I already explained that there are IT restrictions as per company policy, but I do not know their extent or where they all apply. Had I known, I would not have had to think much.

I close my further communication on this topic here.

@Developer_BB Please take a breath first. I am sorry you were offended, but I had/have no intention of offending anyone here.

I do not have any way to test your own app and on-premise infrastructure, so I was trying to say what else you can test. We need to first eliminate probable causes, then check for unexpected ones.

Have you resolved this issue? I have been struggling to find a solution for this.
I am facing the same issue in AWS Lambda. Please find the details below:

Application details: .NET Core application (Lambda), .NET MongoDB driver (2.18.0), serverless MongoDB instance.

Personal Laptop or Local environment
I have a .NET Core Lambda application that was not working, so I changed my DNS to Google's. It then started working on my local laptop.

Deployed Lambda function
When I deployed the application to AWS Lambda, it did not work. I have also configured database access and network access in Atlas.

I tried to connect to the serverless instance using mongosh from a Cloud9 machine in the same VPC where the Lambda is configured. I am able to connect to the serverless instance from there.

Please help me understand why I am unable to connect to the serverless MongoDB instance from Lambda.

Please let me know if you need more details.
@James_Kovacs

Have you found a solution for this problem?

I’ve been struggling with this for months now. Finally got it working locally.

My company has an AWS hosted MongoDB Atlas cluster which I’m connecting to via office VPN, and I’m running an ASP.NET 6 web application with version 2.22.0 MongoDB C# driver.

  • Connecting from my local MongoDB Compass to the Atlas cluster? No problem.
  • Connecting from my local application to a locally running MongoDB server? No problem.
  • Connecting from my local application to the Atlas cluster? Random and frequent server selection timeout failures during initialization, followed by pretty smooth performance after the application successfully breaks through during a re-attempt.

The problem doesn’t seem to be affecting our production environment. That or my logging configuration is way worse than I thought.

Anyway, I re-configured my local user secrets to use the legacy “2.4 or later” connection string format obtained from Atlas (the one without “+srv” in it). I’m no longer experiencing poor initialization performance when running my app locally.

Working connection string format:

  • mongodb://username:password@shard1hostname:27017,shard2hostname:27017,shard3hostname:27017/?ssl=true&replicaSet=replicaSetName&authSource=admin&retryWrites=true&w=majority
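A connection string of that shape can also be assembled programmatically. The following is a minimal sketch in Python with a hypothetical helper name. The default options (ssl=true, authSource=admin) simply mirror the string quoted above and are assumptions, not requirements of every cluster. Credentials are percent-encoded because characters such as @ or : would otherwise be read as URI delimiters.

```python
from urllib.parse import quote_plus, urlencode


def build_seedlist_uri(hosts, replica_set, username=None, password=None,
                       port=27017, extra_options=None) -> str:
    """Build a non-SRV mongodb:// URI that lists every replica set member."""
    seeds = ",".join(f"{host}:{port}" for host in hosts)

    credentials = ""
    if username is not None:
        credentials = quote_plus(username)
        if password is not None:
            credentials += ":" + quote_plus(password)
        credentials += "@"

    # Defaults mirror the connection string shown in the post above.
    options = {"ssl": "true", "replicaSet": replica_set, "authSource": "admin"}
    options.update(extra_options or {})

    return f"mongodb://{credentials}{seeds}/?{urlencode(options)}"
```

The real host names and replica set name have to come from Atlas (or from the SRV and TXT records); the sketch only handles the formatting.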

Hope this helps someone. Good luck.

We are facing the same issue with a WPF app on .NET Framework 4.7.2 with MongoDB C# driver 2.22.0.

  • Connection from Compass or Studio3T works
  • Connection from our Python apps works
  • Connecting to a local MongoDB works
  • Connecting from our WPF application to our Atlas clusters randomly works.

In the best case, the initialization of a new MongoClient() takes about 15s, and the connection then works. Sometimes new MongoClient() takes only about 400ms, but then the first query to the cluster raises an error:

System.TimeoutException
  HResult=0x80131505
  Message=A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = ReadPreferenceServerSelector{ ReadPreference = { Mode : Secondary } }, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "2", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [] }.
  Source=MongoDB.Driver.Core

The error seems to be mostly unaffected by cluster size (M10, M40 and M50 tested).
The workaround by @Johann_Sigurdsson works for now, however this needs fixing.

Hi, @Johann_Sigurdsson and @Hermann_Baumgartl,

I understand that you’re having connectivity problems when using mongodb+srv:// connection strings. The .NET/C# Driver resolves the associated SRV and TXT records and then resolves the hostnames returned from the SRV response. (The TXT record contains additional options such as replicaSet, authSource, and loadBalanced.) Under the hood, the driver relies on DnsClient.NET to resolve SRV and TXT records and relies on the OS’s DNS resolver for FQDN resolution.
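To make the two lookups concrete, here is a minimal sketch in Python that derives the DNS names involved from a mongodb+srv:// connection string. The _mongodb._tcp. prefix for the SRV query comes from the MongoDB connection string specification rather than from this thread, and the function name is hypothetical. The sketch performs no DNS queries itself.

```python
from urllib.parse import urlsplit


def srv_lookup_names(uri: str) -> dict:
    """Return the DNS names queried for a mongodb+srv:// connection string."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb+srv":
        raise ValueError("expected a mongodb+srv:// connection string")
    host = parts.hostname
    if not host:
        raise ValueError("connection string has no host name")
    return {
        "srv": f"_mongodb._tcp.{host}",  # SRV record: the list of member hosts
        "txt": host,                     # TXT record: additional options
    }
```

The resulting names can be passed to dig -t SRV and dig -t TXT from the app host to see what the resolver there actually returns.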

Examining the TimeoutException, I can see that Servers : [], which means that the topology is empty. This can occur if the replica set name in the TXT record does not match the replica set name returned during heartbeats. You can view the replica set name in the TXT record via:

dig -t TXT CLUSTERNAME.PROJECTID.mongodb.net

For example, if your connection string is mongodb+srv://user:pwd@my-cluster-name.9zzzz.mongodb.net/?retryWrites=true&w=majority, then you can run the following command to find the associated TXT record:

dig -t TXT my-cluster-name.9zzzz.mongodb.net

When you connect to your cluster using mongosh, you can run db.hello() to manually heartbeat with a cluster node. This will return the replica set name as setName.

The replica set name in the TXT record must match the replica set name returned by cluster members. If your connection string says that you’re connecting to replicaSet=atlas-abc123-shard-0 but the nodes say that they’re part of setName:atlas-xyz987-shard-0, then those nodes will be removed from the topology since they belong to the wrong replica set.
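The comparison can be scripted once both values are in hand. A minimal sketch in Python follows, assuming the TXT record value is a query-string style option list such as authSource=admin&replicaSet=atlas-abc123-shard-0 (possibly wrapped in double quotes, as dig prints it). The function names are hypothetical, and the setName has to be copied from the db.hello() output by hand.

```python
from urllib.parse import parse_qs


def replica_set_from_txt(txt_record: str):
    """Return the replicaSet option from a TXT record value, or None if absent."""
    # dig prints TXT values wrapped in double quotes; strip them first.
    options = parse_qs(txt_record.strip().strip('"'))
    values = options.get("replicaSet")
    return values[0] if values else None


def replica_set_matches(txt_record: str, hello_set_name: str) -> bool:
    """True if the TXT record names the same replica set as the node's setName."""
    expected = replica_set_from_txt(txt_record)
    return expected is not None and expected == hello_set_name
```

A False result for a node corresponds to the mismatch described above, in which that node would be dropped from the topology.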

Replica set names are configured automatically in Atlas and are not modifiable by users. They are also not updated after cluster creation. Thus it is unexpected that the replica set name in the TXT record and returned by the cluster members would differ.

There are other edge case scenarios where a misconfigured cluster can result in an empty topology, but these misconfigurations aren’t possible in Atlas. These scenarios would require reusing hostnames for new clusters, which Atlas does not do.

I have attempted to reproduce the reported behaviour but have so far been unsuccessful in observing the problem. If you have a self-contained repro that reproducibly elicits the behaviour, I would be happy to investigate further. Thanks in advance for any additional information that you can provide.

Sincerely,
James

Hi, @James_Kovacs

At least for our application, the error cannot be reproduced with a 100% “success” rate. In our app, the MongoClient is created as a singleton when the user activates a certain UserControl. Usually the creation of the MongoClient is slow (about 30s), but the queries work. If it takes 30s, the connection/client works. If new MongoClient(connectionString) takes less than 1s, it is safe to say that the first query will throw a TimeoutException. However, we can reproduce the error reliably by closing the app, waiting 1 minute, and opening the app again. There is a ~15% chance of getting the error within 10 minutes.

I reproduced the error and followed your instructions for comparing the TXT record with the db.hello() response. The setName from db.hello() and the replicaSet from nslookup are the same.

We also opened a support case, where we could share some more information including an example of the code used. The case number is: 01225910. I also attached a minimum example to reproduce the error in this case.

Best regards,
Hermann