- We hit an issue in our production environment where one primary VM stopped accepting any connections, and all production APIs failed for the duration.
- Attempts to log in to MongoDB manually on the primary member failed, and manually starting MongoDB on the VM also failed.
- Because the problem VM was still reported as "primary" by rs.status(), no election took place among the available secondaries.
- Has anyone faced this issue?
- It would be a great help if you could suggest what might cause this behavior.
We checked the mongos logs: connections were being accepted and no errors were present until 07:11 UTC, but the errors started abruptly at 07:12 UTC.
{"t":{"$date":"2023-03-29T07:11:58.639+00:00"},"s":"I", "c":"CONNPOOL", "id":22576, "ctx":"establishCursors cleanup","msg":"Connecting","attr":{"hostAndPort":"hostname:port"}}
{"t":{"$date":"2023-03-29T07:12:51.493+00:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn29626580","msg":"Host failed in replica set","attr":{"replicaSet":"rs1","host":"hostname:port","error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit","errmsg":"Couldn't get a connection within the time limit of 8ms"},"action":{"dropConnections":false,"requestImmediateCheck":false,"outcome":{"host":"hostname:port","success":false,"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn't get a connection within the time limit of 8ms"}}}}
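To pin down exactly when the failures began, the structured (JSON) mongos log can be scanned for entries like the one above. The following is only a minimal sketch: the helper names `parse_log_line` and `host_failures` are hypothetical, and the sample lines are abbreviated copies of the two log entries quoted in this post.

```python
import json
from datetime import datetime

# Abbreviated copies of the two log lines quoted above (sample data only).
SAMPLE_LINES = [
    '{"t":{"$date":"2023-03-29T07:11:58.639+00:00"},"s":"I","c":"CONNPOOL",'
    '"id":22576,"ctx":"establishCursors cleanup","msg":"Connecting",'
    '"attr":{"hostAndPort":"hostname:port"}}',
    '{"t":{"$date":"2023-03-29T07:12:51.493+00:00"},"s":"I","c":"NETWORK",'
    '"id":4712102,"ctx":"conn29626580","msg":"Host failed in replica set",'
    '"attr":{"replicaSet":"rs1","host":"hostname:port",'
    '"error":{"code":202,"codeName":"NetworkInterfaceExceededTimeLimit"}}}',
]


def parse_log_line(line):
    """Return the log entry as a dict, or None if the line is not a JSON object."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None
    return entry if isinstance(entry, dict) else None


def host_failures(lines):
    """Yield (timestamp, host, error code name) for each host-failure entry."""
    for line in lines:
        entry = parse_log_line(line)
        if entry is None or entry.get("msg") != "Host failed in replica set":
            continue
        attr = entry.get("attr", {})
        timestamp = datetime.fromisoformat(entry["t"]["$date"])
        yield timestamp, attr.get("host"), attr.get("error", {}).get("codeName")


if __name__ == "__main__":
    for timestamp, host, code_name in host_failures(SAMPLE_LINES):
        print(timestamp.isoformat(), host, code_name)
```

Running the same functions over the full log file (one line per entry) would list every host-failure event in order, so the first one marks the start of the incident.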
Additional details:
We are using MongoDB 4.4.7 Community Edition with the following configuration:
config replica set - 1 primary, 2 secondaries
shard1 - 1 primary, 2 secondaries
shard2 - 1 primary, 2 secondaries
shard3 - 1 primary, 2 secondaries
2 query routers (mongos)
We also checked the connection counts on both query routers during the incident:
QR1
"current" : 7654,
"available" : 43546,
"totalCreated" : 134309236,
"active" : 2890,
"exhaustIsMaster" : 487,
"exhaustHello" : 229,
"awaitingTopologyChanges" : 716
QR2
"current" : 7746,
"available" : 43454,
"totalCreated" : 134299931,
"active" : 2997,
"exhaustIsMaster" : 487,
"exhaustHello" : 229,
"awaitingTopologyChanges" : 716
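To put these numbers in context, here is a rough calculation of how much of each router's incoming connection limit was in use. It is a sketch under one assumption: that "current" plus "available" equals the router's effective incoming connection limit. The function name `connection_utilization` is hypothetical.

```python
def connection_utilization(current, available):
    """Fraction of the assumed connection limit (current + available) in use."""
    limit = current + available
    return current / limit if limit else 0.0


# Values copied from the QR1 and QR2 output above.
ROUTER_STATS = {
    "QR1": {"current": 7654, "available": 43546},
    "QR2": {"current": 7746, "available": 43454},
}

if __name__ == "__main__":
    for name, stats in ROUTER_STATS.items():
        limit = stats["current"] + stats["available"]
        pct = 100 * connection_utilization(stats["current"], stats["available"])
        print(f"{name}: {stats['current']} of {limit} connections in use ({pct:.1f}%)")
```

Under that assumption both routers sit at roughly 15% of a 51200-connection limit, which suggests that incoming connections on the query routers were not exhausted. These counters cover client connections into mongos only, not the outbound connections from mongos to the shards.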