MongoDB 7 Replication Setup

Hi all,

I’m trying to set up a replica set with MongoDB 7, following the instructions on mongodb.com. Everything was going well: the replica set started up correctly with 1 primary and 1 secondary on the same machine, using different ports and data directories.
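
For reference, I initiated the set from mongosh following the tutorial, with something like this (the set name matches replSetName in my config; hosts and ports are from my setup):

rs.initiate({
  _id: "test-dev",                            // must match replSetName in mongod.conf
  members: [
    { _id: 0, host: "172.16.1.210:27017" },   // first instance (becomes primary)
    { _id: 1, host: "172.16.1.210:27018" }    // second instance
  ]
})
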
However, after about 10 minutes, I started seeing the following logs. The cluster is not doing anything yet: there are no client connections and no data operations, and everything is at the defaults.

{"t":{"$date":"2024-03-17T22:41:10.571+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4AC624CE6","b":"55C4A484B000","o":"7DD9CE6","s":"_ZN5mongo13error_details16ExceptionForImplILNS_10ErrorCodes5ErrorE262EJNS_15ExceptionForCatILNS_13ErrorCategoryE2EEENS4_ILS5_10EEENS4_ILS5_14EEEEEC1ERKNS_6StatusE","C":"mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)262, mongo::ExceptionForCat<(mongo::ErrorCategory)2>, mongo::ExceptionForCat<(mongo::ErrorCategory)10>, mongo::ExceptionForCat<(mongo::ErrorCategory)14> >::ExceptionForImpl(mongo::Status const&)","s+":"16"}}}
{"t":{"$date":"2024-03-17T22:41:10.571+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4AC60AF8C","b":"55C4A484B000","o":"7DBFF8C","s":"_ZN5mongo13error_details23throwExceptionForStatusERKNS_6StatusE","C":"mongo::error_details::throwExceptionForStatus(mongo::Status const&)","s+":"50EA"}}}
{"t":{"$date":"2024-03-17T22:41:20.575+00:00"},"s":"W",  "c":"ASSERT",   "id":23075,   "ctx":"conn9","msg":"DBException thrown","attr":{"error":{"code":262,"codeName":"ExceededTimeLimit","errmsg":"operation exceeded time limit"}}}
{"t":{"$date":"2024-03-17T22:44:03.772+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4AC6AFB78","b":"55C4A484B000","o":"7E64B78","s":"_ZN5mongo15printStackTraceEv","C":"mongo::printStackTrace()","s+":"38"}}}
{"t":{"$date":"2024-03-17T22:44:03.772+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4A94EBF04","b":"55C4A484B000","o":"4CA0F04","s":"_ZN5mongo11DBExceptionC1ERKNS_6StatusE","C":"mongo::DBException::DBException(mongo::Status const&)","s+":"34"}}}
{"t":{"$date":"2024-03-17T22:44:03.772+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4AC624CE6","b":"55C4A484B000","o":"7DD9CE6","s":"_ZN5mongo13error_details16ExceptionForImplILNS_10ErrorCodes5ErrorE262EJNS_15ExceptionForCatILNS_13ErrorCategoryE2EEENS4_ILS5_10EEENS4_ILS5_14EEEEEC1ERKNS_6StatusE","C":"mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)262, mongo::ExceptionForCat<(mongo::ErrorCategory)2>, mongo::ExceptionForCat<(mongo::ErrorCategory)10>, mongo::ExceptionForCat<(mongo::ErrorCategory)14> >::ExceptionForImpl(mongo::Status const&)","s+":"16"}}}
{"t":{"$date":"2024-03-17T22:44:03.772+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4AC60AF8C","b":"55C4A484B000","o":"7DBFF8C","s":"_ZN5mongo13error_details23throwExceptionForStatusERKNS_6StatusE","C":"mongo::error_details::throwExceptionForStatus(mongo::Status const&)","s+":"50EA"}}}
{"t":{"$date":"2024-03-17T22:44:03.772+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn9","msg":"Frame","attr":{"frame":{"a":"55C4AC69CD44","b":"55C4A484B000","o":"7E51D44","s":"_ZN5mongo13iassertFailedERKNS_6StatusENS_14SourceLocationE","C":"mongo::iassertFailed(mongo::Status const&, mongo::SourceLocation)","s+":"19A"}}}

The config file is as follows:

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# Where and how to store data.
storage:
  dbPath: /var/lib/mongodb
  directoryPerDB: true
  wiredTiger:
    engineConfig:
      directoryForIndexes: true

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  logRotate: reopen
  traceAllExceptions: true
  path: /var/log/mongodb/mongod.log

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,172.16.1.210


# how the process runs
processManagement:
  timeZoneInfo: /usr/share/zoneinfo

security:
  #authorization: enabled
  keyFile: /etc/mongodb-keyfile

#operationProfiling:

replication:
  replSetName: test-dev

Here is the debug log for the above issue. It looks like the server is waiting for a hello response but never receives one.

"NETWORK",  "id":22934,   "ctx":"conn4","msg":"Starting server-side compression negotiation"}
{"t":{"$date":"2024-03-19T20:33:10.807+00:00"},"s":"D3", "c":"FTDC",     "id":23904,   "ctx":"conn4","msg":"Using maxAwaitTimeMS for awaitable hello protocol","attr":{"maxAwaitTimeMS":10000}}
{"t":{"$date":"2024-03-19T20:33:10.810+00:00"},"s":"D1", "c":"REPL",     "id":21342,   "ctx":"conn4","msg":"Waiting for a hello response from a topology change or until deadline","attr":{"deadline":{"$date":"2024-03-19T20:33:20.807Z"},"currentTopologyVersionCounter":4}}
{"t":{"$date":"2024-03-19T20:33:20.980+00:00"},"s":"D3", "c":"ASSERT",   "id":4892201, "ctx":"conn4","msg":"Internal assertion","attr":{"error":{"code":262,"codeName":"ExceededTimeLimit","errmsg":"operation exceeded time limit"},"location":"{fileName:\"src/mongo/util/interruptible.h\", line:290, functionName:\"operator()\"}"}}
{"t":{"$date":"2024-03-19T20:33:20.980+00:00"},"s":"W",  "c":"ASSERT",   "id":23075,   "ctx":"conn4","msg":"DBException thrown","attr":{"error":{"code":262,"codeName":"ExceededTimeLimit","errmsg":"operation exceeded time limit"}}}


{"t":{"$date":"2024-03-19T20:33:21.147+00:00"},"s":"D1", "c":"REPL",     "id":6208204, "ctx":"conn4","msg":"Error while waiting for hello response","attr":{"status":{"code":262,"codeName":"ExceededTimeLimit","errmsg":"operation exceeded time limit"}}}
{"t":{"$date":"2024-03-19T20:33:21.147+00:00"},"s":"D3", "c":"FTDC",     "id":23905,   "ctx":"conn4","msg":"Using exhaust for isMaster or hello protocol"}

Any thoughts on this issue?

How do you start the instances?

You should be running 3 instances, not 2.

You mentioned different ports and data directories, but you shared only 1 configuration file.

The link you shared does not seem to point to a specific set of instructions. Can you provide a link that points directly to the instructions you followed?

Please also provide the output of rs.status().

Hi, thanks for your time. I started the 2 instances by cloning the config and systemd files:
mongod.service starts mongod with mongod.conf, and mongod2.service starts it with mongod2.conf, as you can see in the service files below.
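
mongod2.conf is the same file with only the port and the data/log paths changed, roughly like this (the exact paths below are illustrative, not copied from my VM):

# mongod2.conf (only the lines that differ from mongod.conf)
storage:
  dbPath: /var/lib/mongodb2            # separate data directory (illustrative path)
systemLog:
  path: /var/log/mongodb/mongod2.log   # separate log file (illustrative path)
net:
  port: 27018                          # second instance's port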

I’ve set up another VM to test with the same setup, and there is no such error there: the replica set runs with 2 instances without any issue. The only problem is that if 1 instance goes down, the whole set goes down.

The link I followed: https://www.mongodb.com/docs/manual/tutorial/deploy-replica-set/

mongod.service is as follows:

[Unit]
Description=MongoDB Database Server
Documentation=https://docs.mongodb.org/manual
After=network-online.target
Wants=network-online.target

[Service]
User=mongodb
Group=mongodb
EnvironmentFile=-/etc/default/mongod
Environment="MONGODB_CONFIG_OVERRIDE_NOFORK=1"
ExecStart=/usr/bin/mongod --config /etc/mongod.conf
RuntimeDirectory=mongodb
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=64000
# processes/threads
LimitNPROC=64000
# locked memory
LimitMEMLOCK=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false

# Recommended limits for mongod as specified in
# https://docs.mongodb.com/manual/reference/ulimit/#recommended-ulimit-settings

[Install]
WantedBy=multi-user.target

mongod2.service is as follows:

[Unit]
Description=MongoDB Database Server
Documentation=https://docs.mongodb.org/manual
After=network-online.target
Wants=network-online.target

[Service]
User=mongodb
Group=mongodb
EnvironmentFile=-/etc/default/mongod
Environment="MONGODB_CONFIG_OVERRIDE_NOFORK=1"
ExecStart=/usr/bin/mongod --config /etc/mongod2.conf
RuntimeDirectory=mongodb
# file size
LimitFSIZE=infinity
# cpu time
LimitCPU=infinity
# virtual memory size
LimitAS=infinity
# open files
LimitNOFILE=64000
# processes/threads
LimitNPROC=64000
# locked memory
LimitMEMLOCK=infinity
# total threads (user+kernel)
TasksMax=infinity
TasksAccounting=false
# Cuong: added so the service restarts automatically
Restart=always

# Recommended limits for mongod as specified in
# https://docs.mongodb.com/manual/reference/ulimit/#recommended-ulimit-settings

[Install]
WantedBy=multi-user.target
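
For completeness, after creating the second unit file I loaded and started it with the standard systemd steps:

# pick up the new mongod2.service unit file
systemctl daemon-reload
# start mongod2 now and enable it at boot
systemctl enable --now mongod2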

The rs.status() output is as follows:

{
  set: 'firefly-dev',
  date: ISODate('2024-03-20T21:28:19.579Z'),
  myState: 1,
  term: Long('46'),
  syncSourceHost: '',
  syncSourceId: -1,
  heartbeatIntervalMillis: Long('2000'),
  majorityVoteCount: 2,
  writeMajorityCount: 2,
  votingMembersCount: 2,
  writableVotingMembersCount: 2,
  optimes: {
    lastCommittedOpTime: { ts: Timestamp({ t: 1710970097, i: 10 }), t: Long('46') },
    lastCommittedWallTime: ISODate('2024-03-20T21:28:17.717Z'),
    readConcernMajorityOpTime: { ts: Timestamp({ t: 1710970097, i: 10 }), t: Long('46') },
    appliedOpTime: { ts: Timestamp({ t: 1710970097, i: 10 }), t: Long('46') },
    durableOpTime: { ts: Timestamp({ t: 1710970097, i: 10 }), t: Long('46') },
    lastAppliedWallTime: ISODate('2024-03-20T21:28:17.717Z'),
    lastDurableWallTime: ISODate('2024-03-20T21:28:17.717Z')
  },
  lastStableRecoveryTimestamp: Timestamp({ t: 1710970057, i: 30 }),
  electionCandidateMetrics: {
    lastElectionReason: 'electionTimeout',
    lastElectionDate: ISODate('2024-03-20T02:05:39.256Z'),
    electionTerm: Long('46'),
    lastCommittedOpTimeAtElection: { ts: Timestamp({ t: 0, i: 0 }), t: Long('-1') },
    lastSeenOpTimeAtElection: { ts: Timestamp({ t: 1710885131, i: 1 }), t: Long('45') },
    numVotesNeeded: 2,
    priorityAtElection: 1,
    electionTimeoutMillis: Long('10000'),
    numCatchUpOps: Long('0'),
    newTermStartDate: ISODate('2024-03-20T02:05:39.272Z'),
    wMajorityWriteAvailabilityDate: ISODate('2024-03-20T02:05:39.299Z')
  },
  members: [
    {
      _id: 0,
      name: '172.16.1.210:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      uptime: 69773,
      optime: { ts: Timestamp({ t: 1710970097, i: 10 }), t: Long('46') },
      optimeDate: ISODate('2024-03-20T21:28:17.000Z'),
      lastAppliedWallTime: ISODate('2024-03-20T21:28:17.717Z'),
      lastDurableWallTime: ISODate('2024-03-20T21:28:17.717Z'),
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      electionTime: Timestamp({ t: 1710900339, i: 1 }),
      electionDate: ISODate('2024-03-20T02:05:39.000Z'),
      configVersion: 12,
      configTerm: 46,
      self: true,
      lastHeartbeatMessage: ''
    },
    {
      _id: 1,
      name: '172.16.1.210:27018',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 69761,
      optime: { ts: Timestamp({ t: 1710970092, i: 10 }), t: Long('46') },
      optimeDurable: { ts: Timestamp({ t: 1710970092, i: 10 }), t: Long('46') },
      optimeDate: ISODate('2024-03-20T21:28:12.000Z'),
      optimeDurableDate: ISODate('2024-03-20T21:28:12.000Z'),
      lastAppliedWallTime: ISODate('2024-03-20T21:28:17.717Z'),
      lastDurableWallTime: ISODate('2024-03-20T21:28:17.717Z'),
      lastHeartbeat: ISODate('2024-03-20T21:28:17.615Z'),
      lastHeartbeatRecv: ISODate('2024-03-20T21:28:19.504Z'),
      pingMs: Long('0'),
      lastHeartbeatMessage: '',
      syncSourceHost: '172.16.1.210:27017',
      syncSourceId: 0,
      infoMessage: '',
      configVersion: 12,
      configTerm: 46
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1710970097, i: 10 }),
    signature: {
      hash: Binary.createFromBase64('OFp4tPbtU86R1dSZS6h+vJvCgmA=', 0),
      keyId: Long('7329995365803884549')
    }
  },
  operationTime: Timestamp({ t: 1710970097, i: 10 })
}

That is what you get when you run only 2 instances rather than the recommended 3. As mentioned in the 3rd sentence of the 1st paragraph of the tutorial you linked, replica sets should always have an odd number of members.

With a replica set of 2 nodes, you cannot have a majority when 1 node is down; see the worked numbers below.
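
A majority is floor(n/2) + 1 voting members. With your 2 voters that is floor(2/2) + 1 = 2, which is why your rs.status() above reports majorityVoteCount: 2 and writeMajorityCount: 2: both nodes must be up to elect a primary or commit majority writes. With 3 voters the majority is still 2, so the set survives one failure. Adding a third member from mongosh would be something like this (the host and port are placeholders for your third instance):

rs.add("172.16.1.210:27019")   // placeholder address for the third mongod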

Hi Steeve,

Yes, you are right about the majority needed for primary election, but that shouldn’t cause the issue of not receiving any response to the hello messages.
I’ve set up the third instance, but it still has the same issue. Another replica set with 3 instances has no such issue at all, even when I purposely shut down one of the secondaries, or even the primary, for testing.

Please share the output of

ss -tlnp
ps -aef | grep [m]ongod

Also share the output of getLog with startupWarnings (see https://www.mongodb.com/docs/manual/reference/command/getLog/#command-fields).
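
That is, from mongosh:

db.adminCommand( { getLog: "startupWarnings" } )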

Could your firewall be blocking the traffic?

Hi,

There is no firewall installed on the VM. Also, communication between the instances is normal, except for the hello messages.

Output of ss -tlnp

root@dev-infa:/home/ubuntu# ss -tlnp
State              Recv-Q             Send-Q                                 Local Address:Port                           Peer Address:Port             Process                                                                                                                                                 
LISTEN             0                  4096                                         0.0.0.0:4369                                0.0.0.0:*                 users:(("epmd",pid=1161,fd=3))                                                                                                                         
LISTEN             0                  511                                     172.16.1.210:5601                                0.0.0.0:*                 users:(("node",pid=625,fd=18))                                                                                                                         
LISTEN             0                  1024                                         0.0.0.0:15672                               0.0.0.0:*                 users:(("beam.smp",pid=632,fd=49))                                                                                                                     
LISTEN             0                  511                                          0.0.0.0:6379                                0.0.0.0:*                 users:(("redis-server",pid=633,fd=6))                                                                                                                  
LISTEN             0                  128                                          0.0.0.0:25672                               0.0.0.0:*                 users:(("beam.smp",pid=632,fd=19))                                                                                                                     
LISTEN             0                  4096                                         0.0.0.0:111                                 0.0.0.0:*                 users:(("rpcbind",pid=530,fd=4),("systemd",pid=1,fd=165))                                                                                              
LISTEN             0                  511                                          0.0.0.0:80                                  0.0.0.0:*                 users:(("nginx",pid=154539,fd=23),("nginx",pid=154538,fd=23),("nginx",pid=154537,fd=23),("nginx",pid=154536,fd=23),("nginx",pid=832,fd=23))            
LISTEN             0                  128                                          0.0.0.0:22                                  0.0.0.0:*                 users:(("sshd",pid=694,fd=3))                                                                                                                          
LISTEN             0                  511                                          0.0.0.0:443                                 0.0.0.0:*                 users:(("nginx",pid=154539,fd=25),("nginx",pid=154538,fd=25),("nginx",pid=154537,fd=25),("nginx",pid=154536,fd=25),("nginx",pid=832,fd=25))            
LISTEN             0                  4096                                   127.0.0.53%lo:53                                  0.0.0.0:*                 users:(("systemd-resolve",pid=579,fd=14))                                                                                                              
LISTEN             0                  4096                                         0.0.0.0:27017                               0.0.0.0:*                 users:(("mongod",pid=193921,fd=14))                                                                                                                    
LISTEN             0                  4096                                         0.0.0.0:27018                               0.0.0.0:*                 users:(("mongod",pid=194008,fd=14))                                                                                                                    
LISTEN             0                  128                                                *:5672                                      *:*                 users:(("beam.smp",pid=632,fd=50))                                                                                                                     
LISTEN             0                  4096                                               *:5044                                      *:*                 users:(("java",pid=647,fd=122))                                                                                                                        
LISTEN             0                  4096                                               *:5045                                      *:*                 users:(("java",pid=647,fd=121))                                                                                                                        
LISTEN             0                  4096                                            [::]:4369                                   [::]:*                 users:(("epmd",pid=1161,fd=4))                                                                                                                         
LISTEN             0                  511                                             [::]:6379                                   [::]:*                 users:(("redis-server",pid=633,fd=7))                                                                                                                  
LISTEN             0                  4096                                               *:9200                                      *:*                 users:(("java",pid=1370,fd=449))                                                                                                                       
LISTEN             0                  4096                                            [::]:111                                    [::]:*                 users:(("rpcbind",pid=530,fd=6),("systemd",pid=1,fd=167))                                                                                              
LISTEN             0                  511                                             [::]:80                                     [::]:*                 users:(("nginx",pid=154539,fd=24),("nginx",pid=154538,fd=24),("nginx",pid=154537,fd=24),("nginx",pid=154536,fd=24),("nginx",pid=832,fd=24))            
LISTEN             0                  128                                             [::]:22                                     [::]:*                 users:(("sshd",pid=694,fd=4))                                                                                                                          
LISTEN             0                  50                             [::ffff:172.16.1.210]:9600                                      *:*                 users:(("java",pid=647,fd=74))                                                                                                                         
LISTEN             0                  4096                           [::ffff:172.16.1.210]:9300                                      *:*                 users:(("java",pid=1370,fd=444))                                                                                                     

Output of ps -aef | grep [m]ongod

root@dev-infa:/home/ubuntu# ps -aef | grep [m]ongod
mongodb   193921       1  9 19:40 ?        00:00:11 /usr/bin/mongod --config /etc/mongod.conf
mongodb   194008       1 10 19:40 ?        00:00:11 /usr/bin/mongod --config /etc/mongod2.conf

startupWarnings shows only the following, since this is a dev server and is not using XFS:

{
  totalLinesWritten: 1,
  log: [
    '{"t":{"$date":"2024-03-21T19:40:10.218+00:00"},"s":"I",  "c":"STORAGE",  "id":22297,   "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]}\n'
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1711050224, i: 1 }),
    signature: {
      hash: Binary.createFromBase64('Kd9y/AxILMoRykrfJ6ag8I7H+XY=', 0),
      keyId: Long('7329995365803884549')
    }
  },
  operationTime: Timestamp({ t: 1711050224, i: 1 })
}