MongoDB.Driver for C# cannot connect to Single Node Replica Sets

Bastian_Topfer · November 20, 2024, 9:57am

I will link the corresponding Jira ticket shortly. I will also open a pull request on GitHub, from my personal account, with changes that get it working again.

Steps to Reproduce:

Create a single-node replica set for local testing (easiest for a repro: Testcontainers).
Try to connect to it from a C# project, e.g. the attached repro project, for some read/write operations without a direct connection. (Our use case rules out a direct connection because we want to use transactions and have integration tests for them.)
In our repro project, you will be prompted for input; enter anything (or nothing) other than ‘q’.
Wait 30 seconds until the connection times out.

I have already spent the debugging time to figure it out, and a PR will follow shortly.

But here is what happened:
Step 1:

The single-node replica set gets registered as a MultiServerCluster in the driver. As a result, the driver tries to connect via the internal IP address (i.e. 127.0.0.1) instead of the host-side port/address.

We can see why by comparing the ClusterFactory in, e.g., 2.26.0 with the one in 3.0.0: the condition that assigned a single-server cluster when Endpoints.Count == 1 is gone.

Step 2:

After we reinsert this condition, it fails with ‘Direct Connection is not supported on SingleServerClusters’. Strange… The assertion here is inverted relative to what its error message says (in 3.0.0 it asserts that DirectConnection should be true). Invert it back to match its description and move on.

Step 3:

We fail on a read. Why, you ask? Our ClusterType == Unknown, so the ReadPreferenceServerSelector always returns an empty server list, and server selection keeps looping until it times out. Why is the type Unknown? In 2.26 there was a statement in the cluster description update handler of the SingleServerCluster that set the ClusterType based on the ServerType of its server. This is gone.

So we reinsert this, and ta-da: now we can interact with a single-node replica set again.
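To make Step 3 easier to follow, here is a minimal, self-contained model of the behaviour described above. It is only a sketch: `ClusterKind`, `ServerKind`, `ServerInfo`, and `SelectionModel` are hypothetical stand-ins invented for illustration, not the driver's real types, and the real selector handles more cases than this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins; these are NOT the driver's real types.
public enum ClusterKind { Unknown, Standalone, ReplicaSet }
public enum ServerKind { Unknown, Standalone, ReplicaSetPrimary }

public sealed class ServerInfo
{
    public ServerInfo(string endPoint, ServerKind kind)
    {
        EndPoint = endPoint;
        Kind = kind;
    }

    public string EndPoint { get; }
    public ServerKind Kind { get; }
}

public static class SelectionModel
{
    // The idea from Step 3: derive the cluster kind from the single server's kind.
    public static ClusterKind InferClusterKind(ServerInfo server)
    {
        switch (server.Kind)
        {
            case ServerKind.ReplicaSetPrimary: return ClusterKind.ReplicaSet;
            case ServerKind.Standalone: return ClusterKind.Standalone;
            default: return ClusterKind.Unknown;
        }
    }

    // A read-preference-style selector for a primary read:
    // while the cluster kind is Unknown, no server is considered eligible.
    public static IReadOnlyList<ServerInfo> SelectForPrimaryRead(
        ClusterKind cluster, IReadOnlyList<ServerInfo> servers)
    {
        switch (cluster)
        {
            case ClusterKind.ReplicaSet:
                return servers.Where(s => s.Kind == ServerKind.ReplicaSetPrimary).ToList();
            case ClusterKind.Standalone:
                return servers;
            default:
                return new List<ServerInfo>();
        }
    }
}
```

In this model, calling `SelectForPrimaryRead(ClusterKind.Unknown, servers)` yields no candidates even when `servers` contains a healthy primary, while passing `InferClusterKind(servers[0])` instead yields that primary. That is the difference between the endless selection loop and a working read.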

As I am completely new to this codebase, I can’t tell whether there are other hidden changes that will bite me/us again.
Still, I am sure many teams tried upgrading to the 3.0.0 driver, saw all their integration tests fail, and reverted the update. So I hope this will be patched in a public release soon.

In the meantime I will create an internal patch version and open a pull request referencing this ticket with my solution.

I will add a link to the Jira ticket.

Bastian_Topfer · November 20, 2024, 12:41pm

Pull request for this topic, created from my personal GitHub account: CSHARP-5419: Fix Single Node Replica Set Handling in the Driver by MrHPotter · Pull Request #1551 · mongodb/mongo-csharp-driver

papafe · November 20, 2024, 12:49pm

Hi @Bastian_Topfer, thanks for your message. I’ll take a look at your PR.

papafe · November 20, 2024, 6:47pm

Hey @Bastian_Topfer.

In version 3.0.0 of the driver we changed the logic a little to be more compliant with the SDAM specification, and this could have caused the change in behaviour you’re experiencing.
I noticed that you’re running an instance of Mongo under Docker. In this case there can be issues with replica sets, because some of the replica set members are hidden behind Docker’s virtual network. This is similar to what was reported in another JIRA ticket.

The solution here is to use DirectConnection and you can verify yourself that adding "?directConnection=true" to your connection string will solve the issue. I can see that you would not like to use DirectConnection. Is there any particular reason for that?
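For readers who would rather not edit the connection string by hand, the same option can be set through `MongoUrlBuilder`. This is a minimal sketch, assuming the `MongoDB.Driver` NuGet package is referenced; the class and method names (`DirectConnectionExample`, `WithDirectConnection`) are hypothetical helpers, not part of the driver.

```csharp
using MongoDB.Driver;

public static class DirectConnectionExample
{
    // Hypothetical helper: parses an existing connection string and
    // returns an equivalent one with the directConnection option enabled.
    public static string WithDirectConnection(string connectionString)
    {
        var builder = new MongoUrlBuilder(connectionString)
        {
            DirectConnection = true
        };

        // The builder renders itself back into a connection string.
        return builder.ToString();
    }
}
```

The result can be passed to `new MongoClient(...)` as usual. Note that a direct connection targets exactly one host, so this only makes sense for a connection string that lists a single server.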

Bastian_Topfer · November 21, 2024, 8:32am

Hi @papafe. Yes: when I tried using direct connections instead, our integration tests that use transactions failed. Transactions were the whole reason we needed single-node replica sets locally.

Now, though, with my patch applied to the driver, we time out when trying to connect to an actual replica set inside our Kubernetes cluster. It looks like the server selection loop again, coming back without results.

Bastian_Topfer · November 21, 2024, 8:57am

OK, I just tested this with MongoSandbox 1.0.1 instead of Testcontainers, together with 3.0.0 of the driver. MongoSandbox also uses direct connections implicitly with replica sets, and even our tests with transactions passed. So I’m really sorry for the false alarm. I think my Jira ticket and PR can be closed.

I do think a bit more clarity in catching configuration errors like this would be nice. And from a code point of view, registering single-node replica sets as MultiServerClusters might be a bit misleading.

One concrete thing I can offer here is to adjust at least the Ensure error message in the SingleServerCluster constructor, as it conflicts with the Ensure check it actually executes.

papafe · November 21, 2024, 9:05am

Hey @Bastian_Topfer, I’m glad you managed to fix the issues you were encountering. And I agree we should probably fix that message, thank you!

Pascal_Bourque · November 28, 2024, 11:16pm

Hello,

We are facing the exact same issue. We use a single-node replicaset for our integration tests so that we can test code that uses Mongo transactions. Since updating to v3 of the C# driver, our code times out trying to connect to the snrs unless we provide the directConnection argument.

Why did you close the PR exactly?

Pascal

papafe · November 29, 2024, 4:55am

Hi @Pascal_Bourque,

Do you also have an instance of Mongo under Docker? We got some issues in the past due to the way docker hides its internal network, so in this case using directConnection=true is recommended.

krasnikov.vlad.v · January 14, 2025, 2:58pm

Hi, I have exactly the same issue.
I was on 2.24.0 and my MongoDB Testcontainers setup worked like a charm with the following:
_mongoDbContainer = new MongoDbBuilder().WithReplicaSet().Build();
and

var mongoClient = new MongoClient(_mongoDbContainer.GetConnectionString());
services.AddSingleton<IMongoClient>(mongoClient);
var mongoDbSettings = _configuration.GetSection("MongoSinkSettings");
services.AddSingleton(provider =>
{
    var client = provider.GetRequiredService<IMongoClient>();
    var database = client.GetDatabase(_mongoDatabaseName);

    return database;
});

But since updating the driver and BSON libraries to 3.1, I have been getting 30-second timeouts.
The test container itself still works like a charm, and I could even connect to its instance via Studio 3T, but from the code it doesn’t work. I tried lots of things, like manually setting TLS settings and the API version, but nothing really worked.

Given that everything works again after a rollback, I believe the issue lies in some breaking change in the driver internals. Could you please take a look at it? It feels like the topic starter had the same issue.

papafe · January 14, 2025, 3:21pm

Hi @krasnikov.vlad.v.

Have you tried using direct connection? You can do that by adding "?directConnection=true" to your connection string.
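A minimal sketch of how that could look in a test setup, assuming the `MongoDB.Driver` NuGet package: it enables the direct connection on the client settings and shortens server selection so that a misconfiguration surfaces in seconds rather than after the default 30. `MongoTestClientFactory`, `BuildSettings`, and `Ping` are hypothetical names, and the 5-second timeout is an arbitrary choice.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

public static class MongoTestClientFactory
{
    // Hypothetical helper: builds client settings for a single-node test instance.
    public static MongoClientSettings BuildSettings(string connectionString)
    {
        var settings = MongoClientSettings.FromConnectionString(connectionString);

        // Talk only to the host in the connection string instead of
        // discovering members from the replica set configuration.
        settings.DirectConnection = true;

        // Fail fast in tests instead of waiting for the default timeout.
        settings.ServerSelectionTimeout = TimeSpan.FromSeconds(5);

        return settings;
    }

    // Hypothetical helper: forces a round trip so connection problems
    // show up during test setup rather than in the first test.
    public static void Ping(IMongoClient client)
    {
        client.GetDatabase("admin").RunCommand<BsonDocument>(new BsonDocument("ping", 1));
    }
}
```

With the setup from the post above, this would be used as `new MongoClient(MongoTestClientFactory.BuildSettings(_mongoDbContainer.GetConnectionString()))`, followed by a call to `Ping` on the resulting client.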