mongodb+srv connection time-out when connecting from AKS (Azure Kubernetes Service)

I have an Atlas cluster and I am connecting to it from my dotnet application using MongoDB C# Driver 2.11.2.
The connection string is the default one: mongodb+srv://<user>:<pass>@cluster0.<cluster>.mongodb.net

When launching the app locally everything works fine.

When I deploy it to AKS, in about 90% of cases it fails to connect with a time-out error:

System.TimeoutException: A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [] }.

If I restart a Pod multiple times it will work eventually. So it looks like some transient error.

When I switch to the mongodb:// scheme everything works fine in AKS. What could be the reason for this behaviour in the cloud?

We have been facing the same issue for the past 2 weeks. Thanks to your comments, we were able to resolve it by replacing the SRV scheme with mongodb://

The issue has been resolved since then. Our specs are as follows:
.NET 5 APIs using the .NET MongoDB driver v2.14.1.

MongoDB Atlas M10.

We are not facing this issue in our QA or UAT environments. The only difference is that in those environments we are using the free and M2 tiers of Atlas.

We would still like to hear from the MongoDB Atlas team what causes this recurring issue.

We have seen the same thing recently. For us it looks like a combination of the C# driver, the SRV scheme and AKS. Thanks for the tip about changing the connection strings to use the mongodb:// format. This gives us a workaround, although clearly it is less flexible should we decide to add or remove members.

The really annoying thing is that it is intermittent, so sometimes it looks like all is OK, then it breaks again.

It feels like a DNS issue in the AKS stack.
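
For readers wondering why the two schemes differ in how much they depend on DNS, here is a rough Python sketch (not driver code). With mongodb+srv://, a driver first queries the SRV record at _mongodb._tcp.<host> to get the member list and a TXT record for extra options, and TLS is switched on implicitly. With mongodb://, the members are written out in the URI, so only ordinary host lookups remain. The function names, host names and option values below are made up for illustration; take the real ones from the Atlas connect UI.

```python
from urllib.parse import quote, urlencode


def srv_query_name(hostname):
    """DNS name that is queried for SRV records for a mongodb+srv:// host."""
    return f"_mongodb._tcp.{hostname}"


def build_legacy_uri(user, password, hosts, txt_options=None):
    """Build a mongodb:// URI from an already known seed list.

    hosts: list of (hostname, port) pairs, i.e. what the SRV answer would return.
    txt_options: options normally delivered by the TXT record
                 (typically authSource and replicaSet on Atlas).
    """
    # mongodb+srv:// enables TLS implicitly, so the legacy URI must say it explicitly.
    options = {"tls": "true"}
    options.update(txt_options or {})
    seeds = ",".join(f"{host}:{port}" for host, port in hosts)
    # Percent-encode credentials so characters such as '@' or ':' do not break the URI.
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"mongodb://{creds}@{seeds}/?{urlencode(options)}"


if __name__ == "__main__":
    # Hypothetical host names and replica set name, for illustration only.
    print(srv_query_name("cluster0.example.mongodb.net"))
    print(build_legacy_uri(
        "appuser",
        "p@ssword",
        [("cluster0-shard-00-00.example.mongodb.net", 27017),
         ("cluster0-shard-00-01.example.mongodb.net", 27017),
         ("cluster0-shard-00-02.example.mongodb.net", 27017)],
        {"authSource": "admin", "replicaSet": "example-shard-0"},
    ))
```

This also shows why the workaround is less flexible: the member list is frozen into the string, so adding or removing members means editing the connection string by hand.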

Folks, can you share which version of AKS you’re experiencing this issue on? I’d like to make sure we share this back with the Microsoft / AKS team.

Thanks
-Andrew

AKS Kubernetes version: 1.21.7
MongoDB C# Driver: 2.11.2
Dotnet SDK: 5.0
Mongo Atlas: M10

There’s a chance the underlying issue is https://jira.mongodb.org/browse/CSHARP-4001

Good catch, Simon. For any readers: in the interim you can always use the non-SRV connection string as a temporary workaround, assuming this is indeed the issue.

Resolved in C# driver 2.15.0.

We are still experiencing this issue on one of our deployments despite upgrading to 2.15.0 (C#). We are using the SRV connection string and are running on AKS (1.22.4).

A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [] }.

We have some identical deployments in other regions that are running just fine.

Any idea what the problem might be?

Interesting. Does using the legacy-style non-SRV connection string (shown under older drivers in the Atlas cluster connect UI) potentially solve the issue? If yes, there has been a long-running Azure DNS resolver issue for SRV addresses that are long. I’d love it if you’d escalate this with your Azure point of contact.

Yes, using the legacy-style non-SRV connection string solved the problem.

Thank you for confirming. If you could let the Azure team know that this SRV limitation got in your way, it would help us get them to prioritize this.