Mongo On-Prem Replica set - Binding issue

We have a P-S-S MongoDB replica set running on Linux. Replication has stopped working among the 3 servers, and we are now seeing a port-binding issue when the mongod service starts.
Mongo is running, but netstat does not show anything listening on port 27017, so the replica set members cannot find each other. We tried starting mongod with the config file location, restarting, and renaming the lock file before starting the service. Nothing helped.

We are guessing it might be doing some preparatory synchronization before joining the network, but we are not sure. This happened suddenly over the weekend, and 2 nodes were unable to join the replica set because of it.

Any suggestions or thoughts would be much appreciated.

Hi pruthvi_reddy,

It would be helpful if you could show any error messages in the logs or when starting up. Without any error messages or anything to go off of we can’t provide any substantial help.

Hi, sure:

Here are the various errors we received at different times. After a lot of troubleshooting, what we observed is that the 27017 socket is not open, and that is where we are stuck.

Error: couldn’t connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused.

mongod.service: Failed with result ‘exit-code’

Error: couldn’t connect to server 127.0.0.1:27017 src/mongo/shell/mongo.js:91
exception: connect failed

Thank you. Can you also post the config file of the mongod you are trying to start?

Also, if you run the following Unix command, does it show any mongod running?

ps -ax | grep mongo

Hi, sure, here are the results for the Mongo server:

image

Here is mongo config:

What do you mean by “stopped”? Did you detach members from the replica set? If not, how many of the members have this problem?

Also, where are these members hosted? Virtual machines on a single host PC, or each on its own bare-metal host?

Most importantly, when did this start? Has it been like this from the beginning, while you are still in the configuration/development phase, or was it working fine and only started recently?

This error is due either to incorrect firewall settings or simply to no server listening on that port, in which case the OS refuses the connection.
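
To tell those two cases apart, it helps to confirm on the host itself whether anything is listening on the port. A minimal sketch, assuming `ss` or `netstat` is available; `has_listener` is a hypothetical helper name, and it only reads the fourth column (the local address) of lines marked LISTEN:

```shell
# Succeeds if the `ss -ltn` or `netstat -ltn` listing on stdin
# contains a LISTEN socket bound to the given port.
has_listener() {
    port="$1"
    awk -v port="$port" '
        /LISTEN/ {
            # Column 4 is the local address, e.g. 127.0.0.1:27017 or [::]:27017;
            # the port is whatever follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the affected host (27017 is the port from the error message):
#   ss -ltn | has_listener 27017 && echo "listening" || echo "nothing listening"
```

If nothing is listening, the firewall is not the first thing to look at: mongod itself never got as far as opening the port.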

We didn’t detach any server from the replica set. It looks like a reboot happened on 2 of the servers (P and S).

After the servers came back online, 2 of them were stuck refusing connections. On startup, our mongo.service starts mongod with our config, but for some reason port 27017 does not show up on the network, so the replica set cannot sync.

The servers are hosted on AWS Linux.

This is a running system. These servers had been running for at least 2.5 years with no issues; this started suddenly on Jan 28th.

Connection refused: this is because the mongo service is running but the replica set members can’t reach each other over the network, since port 27017 is not opened on start-up. We tried rebooting and many other things, but for some reason the port is not being opened.

Can you check the AWS settings (and logs) of the hosts for those members? Maybe there was a system update (hence the reboot) that broke some settings, or maybe someone tried to update MongoDB but did not follow clean steps.

The log file set in the config should show what errors mongod encounters when it tries to start. If it hits an error, it will just exit to keep the data safe, meaning there won’t be a service listening on port 27017. This also makes the log file easy to search, because the error won’t be far from the last line logged.
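
A rough way to get at those last lines, assuming a YAML-format config with `path:` nested under a top-level `systemLog:` key. The helper name `mongod_log_path` and the config location `/etc/mongod.conf` are assumptions, not something taken from this thread:

```shell
# Print the systemLog.path value from a YAML mongod config file.
# Assumes "path:" is indented under a top-level "systemLog:" key
# and that the value is unquoted.
mongod_log_path() {
    awk '
        # Any unindented, non-comment line starts a new top-level section.
        /^[^ #]/ { in_syslog = ($1 == "systemLog:") }
        in_syslog && $1 == "path:" { print $2; exit }
    ' "$1"
}

# Usage (use whichever config file your service passes to --config):
#   tail -n 50 "$(mongod_log_path /etc/mongod.conf)"
```

The last few dozen lines before mongod exits are usually the ones worth posting.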

Thank you, that makes sense, but I keep seeing the oplog command over and over. The current log file is actually 3 GB because of that repeated logging {verbose:0}.

It seems 2 servers have been out since Saturday. From what we are seeing in the logs, our servers had an outage, and we didn’t have monitoring on the Mongo replica set since the other members can normally handle the downtime. In this case more than one member was down, so the replica set could not catch up and ended up in a state where it exceeded the log time limit. That must be why those servers were not able to open the port: they were trying to recover.

Can we delete the data from one of the servers and add it back into the replica set? Can the replica set handle resyncing a server? We have almost 1M objects.

1M does not seem big, but before tinkering with the data, let’s try cleaning up the log files first to check whether that heals it.

If you think the current log file might be needed for older events, stop the server first, move the file to a safe location, then restart the server. If not, just edit the config file so it reads "logRotate: rename" and try restarting the server.
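
The "move it to a safe location" step could look something like this. It is only a sketch: `archive_log` is a hypothetical helper, and the systemd unit name and log path in the usage lines are assumptions to replace with your own:

```shell
# Move a log file aside under a timestamped name and print the new name.
# The file stays in the same directory; copy it elsewhere afterwards if needed.
archive_log() {
    log="$1"
    dest="$log.$(date +%Y%m%dT%H%M%S)"
    mv -- "$log" "$dest" && printf '%s\n' "$dest"
}

# Usage, with mongod stopped first so nothing is writing to the file:
#   sudo systemctl stop mongod
#   archive_log /var/log/mongodb/mongod.log   # run as a user allowed to write there
#   sudo systemctl start mongod
```

After the restart, the new log file starts out small, so whatever mongod reports before shutting down is easy to find.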

Got it. I did the same and am still seeing the same issue: the service starts, but mongod shuts itself down.

Hi @pruthvi_reddy

Sorry you’re facing this issue. In many cases, a 3-node replica set can tolerate 1 node being down, but with 2 nodes down the remaining node becomes read-only, so at least you know that your data is accessible. You just can’t add new data.

Having said that, we’ve been getting the information in piecemeal fashion so far. Could you post the relevant logs from all 3 nodes? When a node shuts down, we need to see what’s been written in the log. Please provide all the information that you think can help.

The output of rs.status() and rs.conf() from the remaining node would help complete the picture. Also, please post your MongoDB version and your OS version.
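
If it is easier, something along these lines gathers all of that in one go on the node that is still up. Treat it as a sketch: `collect_diagnostics` is a hypothetical helper, it assumes the shell can connect to the default localhost:27017 without extra options, and it tries `mongosh` first and then the older `mongo` shell:

```shell
# Print rs.status(), rs.conf(), the mongod version and the OS version,
# skipping any tool that is not installed.
collect_diagnostics() {
    shell_bin=""
    for candidate in mongosh mongo; do
        if command -v "$candidate" >/dev/null 2>&1; then
            shell_bin="$candidate"
            break
        fi
    done

    for expr in 'rs.status()' 'rs.conf()'; do
        echo "== $expr =="
        if [ -n "$shell_bin" ]; then
            "$shell_bin" --quiet --eval "printjson($expr)" || echo "(query failed)"
        else
            echo "(no mongo shell found)"
        fi
    done

    echo "== mongod version =="
    if command -v mongod >/dev/null 2>&1; then
        mongod --version
    else
        echo "(mongod not on PATH)"
    fi

    echo "== OS version =="
    cat /etc/os-release 2>/dev/null || uname -a
}

# Usage:
#   collect_diagnostics > diagnostics.txt 2>&1
```

Add your usual host, port and authentication options to the shell call if the defaults do not apply.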

Best regards
Kevin