Not able to connect to multiple replicaset members in a shard

  • We are having trouble connecting to the replica set members within a shard. We connect to them directly in order to read their oplogs.
  • This is how we currently do it: we run listShards, note all the replica set members of a particular shard, and then build a connection string (mongodb://mongo0.example.com,mongo1.example.com,mongo2.example.com).
  • Finally, we connect using the code below:
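import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClients;

// READ_PREFERENCE is a ReadPreference constant defined elsewhere in our code.
// applyConnectionString expects a ConnectionString object rather than a plain String.
MongoClientSettings settings = MongoClientSettings.builder()
        .readPreference(READ_PREFERENCE)
        .applyConnectionString(new ConnectionString("mongodb://mongo0.example.com,mongo1.example.com,mongo2.example.com"))
        .build();

com.mongodb.client.MongoClient mongoClient = MongoClients.create(settings);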
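
For illustration, the step of turning a listShards entry into a connection string might look roughly like the sketch below. It assumes the shard's host field has the usual form replicaSetName/host1:port,host2:port, and the class and method names (ShardConnectionString, fromShardHost) are hypothetical. Unlike the connection string in the post, this sketch keeps the ports and passes the replica set name explicitly; that is an assumption about what is wanted, not a confirmed fix for the error.

```java
final class ShardConnectionString {

    /**
     * Builds a replica set URI from a listShards "host" value.
     * Assumed input form: "rsName/host1:port,host2:port,...".
     */
    static String fromShardHost(String shardHost) {
        int slash = shardHost.indexOf('/');
        if (slash <= 0 || slash == shardHost.length() - 1) {
            throw new IllegalArgumentException(
                    "Expected <replicaSetName>/<host:port,...> but got: " + shardHost);
        }
        String replicaSetName = shardHost.substring(0, slash);
        String seedList = shardHost.substring(slash + 1);
        // The seed list is used as-is; the replica set name becomes a URI option.
        return "mongodb://" + seedList + "/?replicaSet=" + replicaSetName;
    }

    public static void main(String[] args) {
        // Example value only; the shard name "shard0" is made up for this sketch.
        String host = "shard0/mongo0.example.com:27010,"
                + "mongo1.example.com:27010,mongo2.example.com:27010";
        System.out.println(fromShardHost(host));
    }
}
```

The resulting string could then be wrapped in a ConnectionString and passed to applyConnectionString as in the snippet above.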

Running a buildInfo command ( mongoClient.getDatabase("test_db").runCommand(new Document("buildinfo", 1)).getString("version") ) then fails with the timeout error below:

Caused by: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, servers=[{address=mongo0.example.com:27010, type=REPLICA_SET_SECONDARY, TagSet{[Tag{name='use', value='prod1'}]}, roundTripTime=1.7 ms, state=CONNECTED}, {address=mongo1.example.com:27010, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketException: mongo1.example.com:27010}, caused by {java.net.UnknownHostException: mongo1.example.com:27010}}]

The stack trace is as follows:

at com.mongodb.internal.connection.BaseCluster.createTimeoutException(BaseCluster.java:408)
        at com.mongodb.internal.connection.BaseCluster.selectServer(BaseCluster.java:123)
        at com.mongodb.internal.connection.AbstractMultiServerCluster.selectServer(AbstractMultiServerCluster.java:54)
        at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:110)
        at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:106)
        at com.mongodb.binding.ClusterBinding.getReadConnectionSource(ClusterBinding.java:93)
        at com.mongodb.client.internal.ClientSessionBinding.getReadConnectionSource(ClientSessionBinding.java:82)
        at com.mongodb.operation.OperationHelper.withReadConnectionSource(OperationHelper.java:461)
        at com.mongodb.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:203)
        at com.mongodb.operation.CommandOperationHelper.executeCommand(CommandOperationHelper.java:198)
        at com.mongodb.operation.CommandReadOperation.execute(CommandReadOperation.java:59)
        at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:196)
        at com.mongodb.client.internal.MongoDatabaseImpl.executeCommand(MongoDatabaseImpl.java:194)
        at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:163)
        at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:158)
        at com.mongodb.client.internal.MongoDatabaseImpl.runCommand(MongoDatabaseImpl.java:148)

When we check the logs, we also see the following:

Caused by: class java.net.UnknownHostException: mongo1.example.com: **Name or service not known**
        java.net.Inet4AddressImpl::lookupAllHostAddr(Inet4AddressImpl.java::-2)
        java.net.InetAddress$PlatformNameService::lookupAllHostAddr(InetAddress.java::929)
        java.net.InetAddress::getAddressesFromNameService(InetAddress.java::1519)
        java.net.InetAddress$NameServiceAddresses::get(InetAddress.java::848)
        java.net.InetAddress::getAllByName0(InetAddress.java::1509)
        java.net.InetAddress::getAllByName(InetAddress.java::1368)
        java.net.InetAddress::getAllByName(InetAddress.java::1302)
        com.mongodb.ServerAddress::getSocketAddresses(ServerAddress.java::203)
        com.mongodb.internal.connection.SocketStream::initializeSocket(SocketStream.java::75)
        com.mongodb.internal.connection.SocketStream::open(SocketStream.java::65)
        com.mongodb.internal.connection.InternalStreamConnection::open(InternalStreamConnection.java::128)
        com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable::run(DefaultServerMonitor.java::117)
        java.lang.Thread::run(Thread.java::834)

We also see this message multiple times:

Canonical address mongo0.example.com:27010 does not match server address. Removing mongo2.example.com:27010 from client view of cluster

This problem doesn’t occur on Atlas sharded clusters. Also, on non-Atlas sharded clusters, connecting to only the primary node succeeds, whereas connecting to multiple nodes using connectionString and connectionTags fails.

Java driver version: 3.12

Hi @Nachiket_G_Kallapur and welcome to the MongoDB community!!

For better understanding of the problem, it would be very helpful if you could provide the following information:

  1. When you state that this works with an Atlas sharded cluster, are you following the exact same steps to get the connection string for both Atlas and the on-prem deployment?

  2. Is there a specific use case that requires connecting this way? Change streams might be an option for you.

  3. Are the hostnames mentioned in the connection string in your post example hostnames or the actual hostnames used in the connection attempts?

  4. MongoDB version being used.

Please help us with the above information for further understanding.

Thanks
Aasawari

Hi @Aasawari , thanks for your reply.
Below are the answers to your follow-up questions:

  1. We use the exact same steps to connect to all types of MongoDB deployments.

  2. We are thinking of doing so, but we wanted to know why we are facing this problem when reading oplogs from a sharded cluster.

  3. In our code we use the exact same hosts returned by the listShards query to build the connection string. The hostnames I provided are not the original ones.

  4. MongoDB version 4.2.14, Java driver version 3.12, PrivateLink connection.

I have read multiple articles stating that PrivateLink is not stable on 3.12. If you feel PrivateLink is the actual cause of the issue, then I would like to know why we succeed in connecting to a single replica set member but fail when connecting to several of them in the shard.

Thank you

Hi @Nachiket_G_Kallapur

Thank you for sharing the responses.
Since the exact same steps and connection string work with Atlas but not with the non-Atlas sharded cluster, it is possible that Atlas has a network setting configured that the latter deployment lacks.

The MongoDB driver specifications require official drivers to connect to all nodes in a replica set for monitoring and high availability (see Server Monitoring). Since you’re using AWS PrivateLink, you would need to set up PrivateLink so that the driver can connect to all members of the replica set. I believe this is the reason behind the java.net.UnknownHostException error.
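
One way to check this from the machine that runs the application is to test whether each member hostname resolves there. The stack trace in the original post shows the lookup failing inside InetAddress.getAllByName, so the rough sketch below calls the same JDK method directly. The class and method names (MemberDnsCheck, hostOf, resolvable) are hypothetical, the hostnames are the placeholders from the post, and the sketch assumes plain hostnames or IPv4 addresses rather than IPv6 literals.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

final class MemberDnsCheck {

    /** Strips an optional ":port" suffix from a "host:port" string. */
    static String hostOf(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        return colon < 0 ? hostAndPort : hostAndPort.substring(0, colon);
    }

    /** Reports, for each member, whether its hostname resolves from this machine. */
    static Map<String, Boolean> resolvable(String... members) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String member : members) {
            try {
                InetAddress.getAllByName(hostOf(member));
                result.put(member, true);
            } catch (UnknownHostException e) {
                // Same exception type the driver reported for mongo1.example.com.
                result.put(member, false);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        resolvable(
                "mongo0.example.com:27010",
                "mongo1.example.com:27010",
                "mongo2.example.com:27010")
            .forEach((member, ok) ->
                System.out.println(member + " -> " + (ok ? "resolves" : "unknown host")));
    }
}
```

A member that shows up as unresolved here would be unreachable for the driver's monitoring as well, whatever the read preference. Successful resolution alone does not prove that the port is reachable through PrivateLink.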

Please have a look at the documentation below to understand how security is configured in Atlas; it may also help in understanding how your deployment has been configured.

AWS PrivateLink is supported in Atlas (see security private endpoints and MongoDB Atlas data plane with AWS PrivateLink), so if you are able to use Atlas, it might simplify this setup and operation for you.

Let us know if you have any further queries.

Regards
Aasawari