Unable to Add Node to Replica Set Cluster

Hi,
I am setting up a MongoDB replica set cluster with 3 nodes, with MongoDB v4.4.17 installed on all three. The replica set initializes without errors, but when I add the secondary node using rs.add(“SECONDNODE:27017”) it shows the following error:

{
        "operationTime" : Timestamp(1667896491, 1),
        "ok" : 0,
        "errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: FIRSTNODE:27017; the following nodes did not respond affirmatively: SECONDNODE:27017 failed with Error connecting to SECONDNODE:27017 :: caused by :: Could not find address for SECONDNODE:27017: SocketException: Host not found (authoritative)",
        "code" : 74,
        "codeName" : "NodeNotFound",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1667896491, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

I can telnet to and ping each server from the other. When I add the second node by IP address instead, the second node doesn’t acknowledge replication.

Share your /etc/hosts and rs.conf() output.
Sometimes the quotes around host:port also cause issues.
Use straight double quotes.
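The command in the original post was typed with typographic quotes (“ and ”) around the host argument, and the mongo shell does not accept those as JavaScript string delimiters. As a minimal sketch, assuming you want to check a command copied from a chat or document before pasting it into the shell, the hypothetical helpers `findSmartQuotes` and `straightenQuotes` below (plain Node.js, not part of MongoDB) locate those characters and replace them with straight ASCII quotes:

```javascript
// Hypothetical helpers: typographic quotes often sneak in when commands are
// copied from chat, e-mail, or word processors.
const SMART_QUOTES = {
  "\u2018": "'", // left single quotation mark
  "\u2019": "'", // right single quotation mark
  "\u201C": '"', // left double quotation mark
  "\u201D": '"', // right double quotation mark
};

// Returns the index and character of every typographic quote in the command.
function findSmartQuotes(command) {
  const hits = [];
  for (let i = 0; i < command.length; i++) {
    if (Object.prototype.hasOwnProperty.call(SMART_QUOTES, command[i])) {
      hits.push({ index: i, char: command[i] });
    }
  }
  return hits;
}

// Replaces typographic quotes with their straight ASCII equivalents.
function straightenQuotes(command) {
  return command.replace(/[\u2018\u2019\u201C\u201D]/g, (c) => SMART_QUOTES[c]);
}
```

For example, `straightenQuotes('rs.add(“SECONDNODE:27017”)')` returns `rs.add("SECONDNODE:27017")`.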

The config file is:

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
#  engine:
#  wiredTiger:

# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo

# network interfaces
net:
  port: 27017
  bindIp: 0.0.0.0  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.

#security:

#operationProfiling:

replication:
  replSetName: "TESTREPLICATION"

#sharding:

## Enterprise-Only Options

#auditLog:

#snmp:

I wanted the rs.conf() output from the primary, i.e. the node where you ran rs.initiate() (which would have become primary) and where you are trying to add the other nodes.
Did you try to add the 3rd node?
What about the quotes issue? Is that ruled out?
And also /etc/hosts.

Following is the result of rs.conf(). The first node is primary, but when I try to add the second node it throws the error. Currently I am trying with only 2 nodes; if adding the second node succeeds, I will add the third node, which is currently standalone.

TESTREPLICATION:PRIMARY> rs.conf()
{
        "_id" : "TESTREPLICATION",
        "version" : 1,
        "term" : 1,
        "protocolVersion" : NumberLong(1),
        "writeConcernMajorityJournalDefault" : true,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "FIRSTNODE:27017",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        "priority" : 1,
                        "tags" : {

                        },
                        "slaveDelay" : NumberLong(0),
                        "votes" : 1
                }
        ],
        "settings" : {
                "chainingAllowed" : true,
                "heartbeatIntervalMillis" : 2000,
                "heartbeatTimeoutSecs" : 10,
                "electionTimeoutMillis" : 10000,
                "catchUpTimeoutMillis" : -1,
                "catchUpTakeoverDelayMillis" : 30000,
                "getLastErrorModes" : {

                },
                "getLastErrorDefaults" : {
                        "w" : 1,
                        "wtimeout" : 0
                },
                "replicaSetId" : ObjectId("636bb9d25ee322803e1ddfff")
        }
}

I am able to telnet to and ping SECONDNODE.

TESTCLUSTER:PRIMARY> rs.add("SECONDNODE:27017")
{
        "operationTime" : Timestamp(1668004686, 1),
        "ok" : 0,
        "errmsg" : "Quorum check failed because not enough voting nodes responded; required 2 but only the following 1 voting nodes responded: FIRSTNODE:27017; the following nodes did not respond affirmatively: SECONDNODE:27017 failed with Error connecting to SECONDNODE:27017 :: caused by :: Could not find address for SECONDNODE:27017: SocketException: Host not found (authoritative)",
        "code" : 74,
        "codeName" : "NodeNotFound",
        "$clusterTime" : {
                "clusterTime" : Timestamp(1668004686, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}

Is mongod up and running on nodes 2 and 3?
Does mongod.log show more details?
It could be a DNS or firewall issue.
Can you connect to each of your nodes, and can they ping each other?
Did you try to add the 3rd node?
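`Host not found (authoritative)` means the name `SECONDNODE` could not be resolved to an address on the machine running the primary. If ping and telnet were run against IP addresses, they can succeed even while that name lookup fails. A minimal sketch, assuming Node.js is available on the nodes and that mongod relies on the operating system resolver, checks each member name with `dns.promises.lookup`, which goes through getaddrinfo and therefore consults /etc/hosts as well as DNS. The helper names `splitHostPort` and `checkMemberHosts` are hypothetical:

```javascript
const dns = require("dns");

// Splits a member string such as "SECONDNODE:27017" into host and port.
// Assumes a plain hostname or IPv4 address (no bracketed IPv6 literal).
function splitHostPort(member) {
  const idx = member.lastIndexOf(":");
  if (idx === -1) return { host: member, port: 27017 }; // 27017 is mongod's default port
  return { host: member.slice(0, idx), port: Number(member.slice(idx + 1)) };
}

// Resolves each member's hostname through the system resolver.
// `lookup` is injectable so the check can be exercised without real DNS.
async function checkMemberHosts(members, lookup = dns.promises.lookup) {
  const results = [];
  for (const member of members) {
    const { host, port } = splitHostPort(member);
    try {
      const { address } = await lookup(host);
      results.push({ member, host, port, resolved: true, address });
    } catch (err) {
      // e.g. ENOTFOUND when neither /etc/hosts nor DNS knows the name
      results.push({ member, host, port, resolved: false, error: err.code });
    }
  }
  return results;
}
```

Run it on every member, not just the primary, for example with `checkMemberHosts(["FIRSTNODE:27017", "SECONDNODE:27017"]).then(console.log)`: each member must be able to resolve the host names of all the others exactly as they appear in rs.conf(). Any entry with `resolved: false` points to a missing or misspelled name.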

Yes, mongod is running on all nodes, and I can ping and connect to all of them.
I have not initialized the third node because it is currently being used by developers for testing; as soon as the replication succeeds I will add it to the cluster. I have stopped the firewall on all nodes, so I don't think it is a firewall issue.
The log file shows:

{"t":{"$date":"2022-11-09T23:35:50.237+08:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"SECONDNODE:27017"}}

I started mongod with the configuration, i.e. mongod -f /(path…to…conf), and initialized the replication on the primary server. Replication initializes, but when I add the second node the screen freezes and the log file repeats the line above in a loop. When I stop the service it shows:

"_id" : 1,
                        "name" : "SEONDNODE:27017",
                        "health" : 1,
                        "state" : 0,
                        "stateStr" : "STARTUP",
                        "uptime" : 1887,
                        "optime" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDurable" : {
                                "ts" : Timestamp(0, 0),
                                "t" : NumberLong(-1)
                        },
                        "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                        "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                        "lastAppliedWallTime" : ISODate("1970-01-01T00:00:00Z"),
                        "lastDurableWallTime" : ISODate("1970-01-01T00:00:00Z"),
                        "lastHeartbeat" : ISODate("2022-11-10T05:55:51.305Z"),
                        "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
                        "pingMs" : NumberLong(0),
                        "lastHeartbeatMessage" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "configVersion" : -2,
                        "configTerm" : -1

The message in the log file is:

{"t":{"$date":"2022-11-10T13:52:19.031+08:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"192.168.56.102:59644","connectionId":3349,"connectionCount":4}}

STARTUP is not the correct status; it should change to SECONDARY.
STARTUP status means the member is not yet an active member of any replica set.
Does the replSetName parameter in your config files match the primary's?
Check the mongod.log on the secondary.
Did you run any other command on the secondary?
What exactly do you mean by not initialized 3rd node?
You should run rs.initiate() only once, on the primary, not on all nodes.
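A member reported in state 0 (`STARTUP`), like the second node in the rs.status() output above, has not yet become an active member of the set. As a small sketch, the hypothetical `findStuckMembers` below takes the `members` array from rs.status() and lists every member that has not settled into PRIMARY, SECONDARY or ARBITER; the numeric codes follow the replica set member states documented for rs.status():

```javascript
// Numeric replica set member states as reported in rs.status().members[].state.
const MEMBER_STATES = {
  0: "STARTUP",
  1: "PRIMARY",
  2: "SECONDARY",
  3: "RECOVERING",
  5: "STARTUP2",
  6: "UNKNOWN",
  7: "ARBITER",
  8: "DOWN",
  9: "ROLLBACK",
  10: "REMOVED",
};

// States a healthy member is normally expected to settle into.
const STEADY = new Set(["PRIMARY", "SECONDARY", "ARBITER"]);

// Maps a numeric state to its name, keeping unexpected codes visible.
function stateName(code) {
  return Object.prototype.hasOwnProperty.call(MEMBER_STATES, code)
    ? MEMBER_STATES[code]
    : "UNKNOWN(" + code + ")";
}

// Returns the members (name and state name) that are not in a steady state.
function findStuckMembers(members) {
  return members
    .map((m) => ({ name: m.name, state: stateName(m.state) }))
    .filter((m) => !STEADY.has(m.state));
}
```

Since this is plain JavaScript without newer syntax, it should also work when pasted into the mongo shell and called as `findStuckMembers(rs.status().members)`.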

Yes, the config file parameters match the primary's, and I ran rs.initiate() on the primary only. The secondary does not acknowledge the replication, but I can see that the primary establishes a connection to the second node. I haven’t run any other command on the secondary; I just start the mongod instance with the configuration using mongod -f /path_to_conf. When I repeat the same process in my local environment there are no errors and replication works just fine, but when I try the same steps in the preprod environment I am stuck.

This appears to be a hostname-to-IP resolution issue.
Compare your working environment with preprod.
Is your /etc/hosts set up properly?
Can you connect from one node to the other using --host?
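One quick way to follow up on the /etc/hosts question is to check whether each member name actually appears in that file on every node. The sketch below uses hypothetical helpers (`parseHosts`, `lookupInHosts`) and is a simplified parser, not a full resolver: it handles the usual `IP name [aliases...]` layout and `#` comments, matches names case-insensitively, and keeps only the first address listed for a name.

```javascript
// Parses text in /etc/hosts format into a Map of lowercase name -> IP.
function parseHosts(text) {
  const map = new Map();
  for (const rawLine of text.split("\n")) {
    const line = rawLine.replace(/#.*/, "").trim(); // drop comments and whitespace
    if (!line) continue;
    const [ip, ...names] = line.split(/\s+/);
    for (const name of names) {
      const key = name.toLowerCase();
      // Simplification: keep the first address seen for each name.
      if (!map.has(key)) map.set(key, ip);
    }
  }
  return map;
}

// Returns the IP that a "host:port" member maps to in the hosts text, or null.
function lookupInHosts(text, member) {
  const host = member.split(":")[0].toLowerCase();
  return parseHosts(text).get(host) || null;
}
```

In Node.js you could feed it the real file with `lookupInHosts(require("fs").readFileSync("/etc/hosts", "utf8"), "SECONDNODE:27017")`. A `null` result means the name is not in the file, so (assuming the usual `files dns` order in /etc/nsswitch.conf) resolution would have to come from DNS.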


Thank you!
My issue is now resolved. It was a simple spelling mistake in the hostname.
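Since the root cause turned out to be a misspelled hostname, a near-miss check can catch this kind of typo quickly. As a sketch (the helper names `editDistance` and `suggestHostnames` are hypothetical), the code below computes the Levenshtein edit distance between the name used in rs.add() and the names a node knows about, for example those from /etc/hosts, and suggests names that are only one or two characters off:

```javascript
// Levenshtein edit distance (insertions, deletions, substitutions),
// computed with a single rolling row of the dynamic-programming table.
function editDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // distance for (i-1, j-1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i-1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Suggests known names that differ from `wanted` by 1..maxDistance edits,
// closest first. Comparison is case-insensitive, like hostname matching.
function suggestHostnames(wanted, knownNames, maxDistance = 2) {
  const w = wanted.toLowerCase();
  return knownNames
    .map((name) => ({ name, distance: editDistance(w, name.toLowerCase()) }))
    .filter((c) => c.distance > 0 && c.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map((c) => c.name);
}
```

For instance, `suggestHostnames("SECONDNODE", ["FIRSTNODE", "SEONDNODE"])` returns `["SEONDNODE"]`, flagging the dropped `C`.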