Intermittent MongoTimeoutError: Server selection timed out after 30000 ms

We are seeing an intermittent issue where a pod crashes with the error MongoTimeoutError: Server selection timed out after 30000 ms when it tries to connect to MongoDB. After we replace the pod, it connects successfully. If we replace the running pod again, we see the error above again. It is intermittent; could anyone help us here? The Mongoose version used is 5.8.

When you say pod, are you referring to K8s?

yes @Andrew_Davidson

We have whitelisted all IPs via 0.0.0.0, and we use an M30 cluster.

Below is the code snippet we use to connect to MongoDB Atlas.

mongoose version: "mongoose": "5.8.7"
MongoDB version: 4.2.21

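const mongoose = require('mongoose');

// Minimal stand-in for the application's own logger.
const logger = { info: console.log, critical: console.error };

// Options used with mongoose 5.8.7 (MongoDB Node.js driver 3.x).
// The "30000 ms" in MongoTimeoutError is the driver's default
// serverSelectionTimeoutMS; connectTimeoutMS does not change it.
const options = {
  useNewUrlParser: true,
  useUnifiedTopology: true,
  useFindAndModify: false,
  connectTimeoutMS: 300000,
  socketTimeoutMS: 0, // 0 disables the socket inactivity timeout
  keepAlive: true
};

const db = mongoose.connection;

// Attach listeners before connecting. Listeners registered inside
// db.once('open') only exist after a successful open, so they never
// see errors from the initial connection attempt.
db.once('open', function () {
  logger.info('MongoDB event open');
});
db.on('connected', function () {
  logger.info('MongoDB event connected');
});
db.on('error', function (err) {
  logger.info('MongoDB event error.: ' + err);
});

// MONGODB_URI is a placeholder for the Atlas "connection string".
mongoose.connect(process.env.MONGODB_URI, options, function (err) {
  if (err) {
    logger.critical('Error while connecting to mongoDB … ', err);
  } else {
    logger.info('mongodb connected successfully.');
  }
});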

We also tested the code snippet above with mongoose 6.5.x, using these options:

var options = {"ssl":true,"sslValidate":false,"useNewUrlParser":true,"useUnifiedTopology":false,"keepAlive":1,"connectTimeoutMS":300000,"socketTimeoutMS":0}

Error while connecting to mongoDB … may be down Could not connect to any servers in your MongoDB Atlas cluster. One common reason is that you’re trying to access the database from an IP that isn’t whitelisted. Make sure your current IP address is on your Atlas cluster’s IP whitelist: https://docs.atlas.mongodb.com/security-whitelist

MongoDB event error.: MongooseServerSelectionError: Could not connect to any servers in your MongoDB Atlas cluster. One common reason is that you’re trying to access the database from an IP that isn’t whitelisted. Make sure your current IP address is on your Atlas cluster’s IP whitelist: https://docs.atlas.mongodb.com/security-whitelist/

Basically, with both versions 5.8 and 6.5 we face a similar issue; only the error messages differ. Could anyone who has faced this issue help us here?

It’s unlikely to be a configuration problem if it’s intermittent like this.

Are there any other suspicious events in the time before it crashes? Have you checked the logs for the pod?
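If the pod logs only show the final error, one way to get more context before a crash is to log every connection lifecycle event with a timestamp. A minimal sketch: `attachConnectionLogging` is a hypothetical helper, and it assumes it is given `mongoose.connection` (an EventEmitter) before `mongoose.connect()` is called. The event names are a subset of Mongoose's documented connection events.

```javascript
// Subset of the lifecycle events emitted by a Mongoose connection.
const CONNECTION_EVENTS = [
  'connecting', 'connected', 'open', 'disconnected', 'reconnected', 'close', 'error'
];

// Hypothetical helper: write one timestamped log line per connection event.
// `conn` is expected to be mongoose.connection; `log` defaults to console.log.
function attachConnectionLogging(conn, log = console.log) {
  for (const name of CONNECTION_EVENTS) {
    conn.on(name, (arg) => {
      // Only 'error' carries an Error argument; other events log just the name.
      const detail = arg instanceof Error ? `${arg.name}: ${arg.message}` : '';
      log(`[${new Date().toISOString()}] MongoDB event ${name} ${detail}`.trim());
    });
  }
  return conn;
}
```

Usage would be `attachConnectionLogging(mongoose.connection)` placed before the `mongoose.connect(...)` call, so that events from the very first connection attempt are captured.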

@Dan_Mckean we don’t see anything suspicious; we only see the error below in the log:


Error while connecting to mongoDB … may be down Could not connect to any servers in your MongoDB Atlas cluster. One common reason is that you’re trying to access the database from an IP that isn’t whitelisted. Make sure your current IP address is on your Atlas cluster’s IP whitelist: https://docs.atlas.mongodb.com/security-whiteli
14:05:26.446Z] [INFO]: [BPA] [cisco-bpa-platform/mw-backup-restore] [ServerName: bpa-ns] [PodName: backup-restore-service-5fb4d8f4dc-bm9vv] [session-id: ] [c
MongoDB event error.: MongooseServerSelectionError: Could not connect to any servers in your MongoDB Atlas cluster. One common reason

We open multiple concurrent connections; is that a concern here?

I think it depends on how many connections that is! I’m not sure of the limits, but I’m sure it’s fine for sensible numbers.

My suggestion would be to open a support case (top right in Atlas) and see if our support folks can take a look from the Atlas end. Perhaps we have some logs on that end.
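If "multiple concurrent connections" means several parts of the application each calling `mongoose.connect()` at startup, one common pattern is to share a single connection attempt so that every caller reuses the same connection and its pool. A minimal sketch under that assumption: `createSharedConnector` is a hypothetical helper, and `connectFn` would wrap something like `() => mongoose.connect(uri, options)`.

```javascript
// Hypothetical helper: share one in-flight connection attempt among all callers.
// connectFn must return a Promise, e.g. () => mongoose.connect(uri, options).
function createSharedConnector(connectFn) {
  let pending = null;
  return function getConnection() {
    if (!pending) {
      pending = connectFn().catch((err) => {
        pending = null; // forget the failed attempt so a later call can retry
        throw err;
      });
    }
    return pending; // concurrent callers all receive the same Promise
  };
}
```

The design choice here is to cache the Promise rather than the resolved connection, so callers that arrive while the first attempt is still in progress wait on it instead of starting a second one.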

@Dan_Mckean @Andrew_Davidson

On further investigation, here are our observations:

Since this issue is consistently reproducible, we captured packets on a working pod and a non-working pod (both from the same deployment version). In the non-working scenario, after the client establishes the TCP connection it sends a TLS ClientHello, but the pod gets no reply from the MongoDB server, which results in a timeout and repeated retries. In every retry the TLS ClientHello is sent, but the MongoDB server never replies.

We did a deeper analysis of the packet capture and found that, in the ClientHello, working pods use the TLSv1.2 protocol while non-working pods use TLSv1 at the protocol layer.

It is very intermittent: the application fails to connect with TLS 1.0 but connects successfully with TLS 1.2.

In MongoDB Atlas we have allowed TLS 1.0 and above, yet the pod still crashes when TLS 1.0 is used at runtime.

Could someone help us here? We have no idea how to proceed.
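To check from inside a pod which TLS version is actually negotiated, without a full packet capture, one option is a small probe using Node's built-in `tls` module. This is a sketch: `probeTls` is a hypothetical helper, the host name in the usage line is a placeholder for one of the cluster's node host names, and the probe only performs the TLS handshake (no MongoDB commands are sent). `socket.getProtocol()` returns the negotiated version string, such as `'TLSv1.2'`.

```javascript
const net = require('net');
const tls = require('tls');

// Hypothetical probe: complete a TLS handshake with one cluster node and
// report the negotiated protocol and cipher. No MongoDB traffic is sent.
function probeTls(host, port = 27017, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      // Send SNI only for host names; Node warns when it is set to an IP.
      { host, port, servername: net.isIP(host) ? undefined : host },
      () => {
        socket.setTimeout(0); // handshake finished; cancel the timeout
        const result = { protocol: socket.getProtocol(), cipher: socket.getCipher().name };
        socket.end();
        resolve(result);
      }
    );
    // Fail instead of hanging if the server never answers the ClientHello.
    socket.setTimeout(timeoutMs, () => {
      socket.destroy(new Error(`No TLS handshake with ${host}:${port} within ${timeoutMs} ms`));
    });
    socket.on('error', reject);
  });
}

// Usage (placeholder host name):
// probeTls('<cluster-node-host>').then(console.log, console.error);
```

Running this repeatedly from a working pod and from a failing pod would show whether the failing pod's handshakes time out at the TLS layer, independently of mongoose.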

Good find 🙂

Is there any possibility of configuring your application to consistently use TLS 1.2? That would be ideal for security reasons.

And did you enable it by editing the deployment configuration and changing it as follows?

But yes, in theory Atlas can still support TLS 1.0. If that has been enabled in the cluster settings (as above) but it isn’t being accepted for those incoming connections, I’d suggest opening a support case (top right in Atlas) so our support folks can take a look from the Atlas end.


@Dan_Mckean

Yes, we have had this enabled in the deployment configuration since day 1; here is a screenshot of it.

In that case it should work with TLS 1.0, right? We are not setting a TLS version in the application, so it could be the mongoose driver [5.8.7] that picks it at runtime. Is there a way to set the TLS version when we connect via mongoose?

Below is the TCP dump for the working pod:

Below is the TCP dump for the non-working pod:
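One way to make the application consistently require TLS 1.2 without touching the mongoose options is Node's process-wide default, `tls.DEFAULT_MIN_VERSION`, which applies to TLS connections that do not pass their own `minVersion` or `secureProtocol`. This sketch assumes the MongoDB driver bundled with mongoose 5.8.7 does not override it. On recent Node.js versions the default is already `'TLSv1.2'`, so printing the effective range is also a cheap way to check what each pod's runtime allows. Starting Node with the `--tls-min-v1.2` flag has the same effect.

```javascript
const tls = require('tls');

// Raise the process-wide minimum TLS version before any MongoDB connection
// is created. Only affects TLS connections that don't set their own
// minVersion/secureProtocol (assumed to be true for the driver's defaults).
tls.DEFAULT_MIN_VERSION = 'TLSv1.2';

// Log the range this process will offer by default.
console.log(`Default TLS range: ${tls.DEFAULT_MIN_VERSION} to ${tls.DEFAULT_MAX_VERSION}`);

// ...then call mongoose.connect(uri, options) as before.
```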

Sorry - we’ve reached the limits of my ability to help in this area.

Are you able to open a support case so that someone can help diagnose why TLS1.0 isn’t working as it should?

@MaBeuLux88, could you help us here? We are currently clueless.
In addition to the issue mentioned above, we see this in the logs:

mongoose connect(…) failed with err: MongoNetworkError: failed to connect to server [hostname:27017] on first connect [Error: read ETIMEDOUT
    at TLSWrap.onStreamRead (internal/stream_base_commons.js:209:20)
    at TLSWrap.callbackTrampoline (internal/async_hooks.js:130:17) {
  name: 'MongoNetworkError',
  [Symbol(mongoErrorContextSymbol)]: {}
}]
    at Pool.<anonymous> (/home/node/app/node_modules/mongodb/lib/core/topologies/server.js:433:11)
    at Pool.emit (events.js:400:28)
    at /home/node/app/node_modules/mongodb/lib/core/connection/pool.js:577:14
    at /home/node/app/node_modules/mongodb/lib/core/connection/pool.js:1021:9

This clearly points to a TLS issue. Can someone help us here?

Hey all,

I skimmed the thread, but it looks like you two have already gone pretty far in the debugging. I’d try using the latest versions (of both Mongoose and MongoDB) to see if that solves the problem. I’d also try to update and upgrade the OS (Linux?) running the pods and make sure they are using the latest libssl package or equivalent.

Otherwise, open a ticket, because it could also be a weird bug that needs proper investigation from the Atlas team.

Cheers,
Maxime.
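Before and after upgrading, it may help to record what each pod's Node.js runtime actually uses for TLS, so working and non-working pods can be compared side by side. Official Node.js builds typically bundle their own OpenSSL, in which case the version reported in `process.versions.openssl` (rather than the OS `libssl` package) is the one the driver's TLS connections use. A minimal sketch; `tlsRuntimeInfo` is a hypothetical helper name.

```javascript
const tls = require('tls');

// Hypothetical helper: gather the runtime details relevant to TLS so they
// can be logged at startup and compared across pods.
function tlsRuntimeInfo() {
  return {
    node: process.versions.node,
    openssl: process.versions.openssl, // OpenSSL version Node was built with
    minTls: tls.DEFAULT_MIN_VERSION,
    maxTls: tls.DEFAULT_MAX_VERSION,
  };
}

console.log(JSON.stringify(tlsRuntimeInfo()));
```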

Hey @MaBeuLux88

In addition to the issue above, we noticed the errors below; note that the hostnames of the primary and of a secondary are printed with the MongoNetworkError. Could you please look into these logs?

[2022-08-05T15:37:59.091Z] [INFO]: [] [platform/activation] [ServerName: ns] [PodName: 7cbb8b8c69-rp9wn] [session-id: ] [correlation-id: ] MongoDB event error.: MongoNetworkError: failed to connect to server [secondaryHostname:27017] on first connect [MongoNetworkError: connection 5 to <<secondaryHostname>>:27017 timed out
    at TLSSocket.<anonymous> (/home/node/app/node_modules/@platform/common-app/node_modules/mongodb/lib/core/connection/connection.js:355:7)
    at Object.onceWrapper (events.js:519:28)
    at TLSSocket.emit (events.js:400:28)
    at TLSSocket.Socket._onTimeout (net.js:495:8)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7) {
  [Symbol(mongoErrorContextSymbol)]: {}
}]


[2022-08-05T15:37:59.208Z] [INFO]: [] [platform/activation] [ServerName: ns] [PodName: -7cbb8b8c69-rp9wn] [session-id: ] [correlation-id: ] MongoDB event error.: MongoNetworkError: failed to connect to server [primaryHostname:27017] on first connect [MongoNetworkError: connection 5 to primaryHostname:27017 timed out
    at TLSSocket.<anonymous> (/home/node/app/node_modules/@-platform/-common-app/node_modules/mongodb/lib/core/connection/connection.js:355:7)
    at Object.onceWrapper (events.js:519:28)
    at TLSSocket.emit (events.js:400:28)
    at TLSSocket.Socket._onTimeout (net.js:495:8)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7) {
  [Symbol(mongoErrorContextSymbol)]: {}
}]

We have also raised a case; it’s been 48 hours with no help from the support team.
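While the root cause is investigated, one mitigation for pods crashing on the first failed attempt is to retry the initial connection. Mongoose's documentation notes that if the initial connection fails, Mongoose emits an `'error'` event and the `connect()` promise rejects, but it does not automatically retry. A minimal sketch with exponential backoff; `connectWithRetry` and its option names are hypothetical, and `connectFn` would wrap something like `() => mongoose.connect(uri, options)` (called without a callback so it returns a Promise).

```javascript
// Hypothetical helper: retry the initial connection with exponential backoff
// instead of letting the process exit on the first failure.
// connectFn must return a Promise, e.g. () => mongoose.connect(uri, options).
async function connectWithRetry(connectFn, { retries = 5, baseDelayMs = 1000, log = console.warn } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await connectFn();
    } catch (err) {
      if (attempt > retries) throw err; // out of retries: surface the last error
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 1x, 2x, 4x, ...
      log(`Connection attempt ${attempt} failed (${err.name}: ${err.message}); retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

This does not fix whatever causes the TLS handshake to stall, but it turns a single timed-out attempt into a logged retry rather than a pod restart.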

Is there any fix for this issue yet?