We faced an issue in our production environment where the VM hosting one primary was accessible, but it was not accepting any connections. All applications that tried to connect to MongoDB were failing. Attempts to log in to the primary member with the mongo shell were unsuccessful, and a manual attempt to start MongoDB on the VM also failed.
Since MongoDB did not go down completely, no election happened. The affected VM was still showing as “primary” according to rs.status. We had to restart the server, after which the issue was resolved. We need to find the root cause (RCA) for this.
We are using MongoDB 4.4.7 Community Edition.
Our configuration is as follows:
config replicaset - 1 Primary, 2 Secondary
shard1 - 1 Primary, 2 Secondary
shard2 - 1 Primary, 2 Secondary
shard3 - 1 Primary, 2 Secondary
2 query routers
The error we saw was:
"errorMessage":"NetworkInterfaceExceededTimeLimit: Couldn’t get a connection within the time limit"
We checked the available connections on our query routers:
QR1
“current” : 7654,
“available” : 43546,
“totalCreated” : 134309236,
“active” : 2890,
“exhaustIsMaster” : 487,
“exhaustHello” : 229,
“awaitingTopologyChanges” : 716
QR2
“current” : 7746,
“available” : 43454,
“totalCreated” : 134299931,
“active” : 2997,
“exhaustIsMaster” : 487,
“exhaustHello” : 229,
“awaitingTopologyChanges” : 716
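As a rough sanity check on the numbers above, the sketch below computes how much of the incoming connection limit each query router was using. It assumes that "current" plus "available" equals the configured incoming connection limit, and the helper names (`connection_usage`, `summarize`) are only illustrative. These counters cover incoming client connections on the routers, so they may not reflect the outgoing connection pool from the routers to the shard members.

```python
# Connection counters copied from the two query routers above.
ROUTERS = {
    "QR1": {"current": 7654, "available": 43546, "active": 2890},
    "QR2": {"current": 7746, "available": 43454, "active": 2997},
}


def connection_usage(stats):
    """Return (limit, used_fraction, active_fraction) for one connections block."""
    # Assumption: current + available adds up to the incoming connection limit.
    limit = stats["current"] + stats["available"]
    used_fraction = stats["current"] / limit
    # Share of the open connections that had an operation in progress.
    active_fraction = stats["active"] / stats["current"]
    return limit, used_fraction, active_fraction


def summarize(routers):
    """Apply connection_usage to every router in the mapping."""
    return {name: connection_usage(stats) for name, stats in routers.items()}


if __name__ == "__main__":
    for name, (limit, used, active) in summarize(ROUTERS).items():
        print(f"{name}: limit={limit} used={used:.1%} active={active:.1%}")
```

Under that assumption both routers come out at a limit of 51200 with roughly 15% of it in use, so the routers themselves do not appear to have run out of incoming connections.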
We also checked the logs thoroughly. Connections were being accepted until 07:09 UTC, and the error “NetworkInterfaceExceededTimeLimit” was not present at 07:11 UTC.
But the error suddenly started at exactly 2023-03-29T07:12 UTC.