Zombie nodes are not being reaped from replica set topology

I am currently running a two-node replica set with a systemd configuration resembling the following:

[Service]
User=mongod
Group=mongod
Environment="OPTIONS=-f /etc/mongod.conf"
Environment="MONGODB_CONFIG_OVERRIDE_NOFORK=1"
Environment="LD_PRELOAD=/usr/lib64/libjemalloc.so.2"

...

MemoryHigh=501M
MemoryMax=499M
Restart=always

For our external users, this replica set serves queries with the secondaryPreferred read preference. Both nodes occasionally collapse due to OOM kills, and sometimes the replica set will detect this, mark the crashed node '(not reachable/healthy)', and reroute queries to the surviving node.
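For reference, clients reach the set with a connection string roughly like the one below (the replica-set name `rs0` is a placeholder here, not our actual configuration):

```
mongodb://venus.swiftinit.org:27017,juno.swiftinit.org:27017/?replicaSet=rs0&readPreference=secondaryPreferred
```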

However, we have recently begun noticing instances in which one of the nodes slows down dramatically (most likely due to memory pressure) while the replica set continues to believe the throttled node is healthy and keeps routing queries to it. For example, earlier today the SECONDARY node was clearly hung (all user queries were timing out), yet it still remained a healthy member of the replica set when we inspected the two nodes from mongosh.
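The member arrays below were captured on each node in mongosh, with something along the lines of:

```javascript
// run on each member; output excerpted to the `members` array below
rs.status().members
```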

From the perspective of the SECONDARY:

  members: [
    {
      _id: 0,
      name: 'venus.swiftinit.org:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 87173,
      optime: { ts: Timestamp({ t: 1705975729, i: 1 }), t: Long('82') },
      optimeDate: ISODate('2024-01-23T02:08:49.000Z'),
      lastAppliedWallTime: ISODate('2024-01-23T02:08:49.318Z'),
      lastDurableWallTime: ISODate('2024-01-23T02:08:49.318Z'),
      syncSourceHost: 'juno.swiftinit.org:27017',
      syncSourceId: 1,
      infoMessage: '',
      configVersion: 101786,
      configTerm: 82,
      self: true,
      lastHeartbeatMessage: ''
    },
    {
      _id: 1,
      name: 'juno.swiftinit.org:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      uptime: 249,
      optime: { ts: Timestamp({ t: 1705975729, i: 2 }), t: Long('82') },
      optimeDurable: { ts: Timestamp({ t: 1705975729, i: 2 }), t: Long('82') },
      optimeDate: ISODate('2024-01-23T02:08:49.000Z'),
      optimeDurableDate: ISODate('2024-01-23T02:08:49.000Z'),
      lastAppliedWallTime: ISODate('2024-01-23T02:08:49.974Z'),
      lastDurableWallTime: ISODate('2024-01-23T02:08:49.974Z'),
      lastHeartbeat: ISODate('2024-01-23T02:10:58.732Z'),
      lastHeartbeatRecv: ISODate('2024-01-23T02:10:56.245Z'),
      pingMs: Long('249'),
      lastHeartbeatMessage: '',
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      electionTime: Timestamp({ t: 1705975643, i: 2 }),
      electionDate: ISODate('2024-01-23T02:07:23.000Z'),
      configVersion: 101786,
      configTerm: 82
    }
  ],

From the perspective of the PRIMARY:

  members: [
    {
      _id: 0,
      name: 'venus.swiftinit.org:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 347,
      optime: { ts: Timestamp({ t: 1705975787, i: 653 }), t: Long('82') },
      optimeDurable: { ts: Timestamp({ t: 1705975787, i: 653 }), t: Long('82') },
      optimeDate: ISODate('2024-01-23T02:09:47.000Z'),
      optimeDurableDate: ISODate('2024-01-23T02:09:47.000Z'),
      lastAppliedWallTime: ISODate('2024-01-23T02:09:47.190Z'),
      lastDurableWallTime: ISODate('2024-01-23T02:09:47.190Z'),
      lastHeartbeat: ISODate('2024-01-23T02:10:45.922Z'),
      lastHeartbeatRecv: ISODate('2024-01-23T02:10:45.335Z'),
      pingMs: Long('1154'),
      lastHeartbeatMessage: '',
      syncSourceHost: 'juno.swiftinit.org:27017',
      syncSourceId: 1,
      infoMessage: '',
      configVersion: 101786,
      configTerm: 82
    },
    {
      _id: 1,
      name: 'juno.swiftinit.org:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      uptime: 350,
      optime: { ts: Timestamp({ t: 1705975839, i: 1 }), t: Long('82') },
      optimeDate: ISODate('2024-01-23T02:10:39.000Z'),
      lastAppliedWallTime: ISODate('2024-01-23T02:10:39.320Z'),
      lastDurableWallTime: ISODate('2024-01-23T02:10:39.320Z'),
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      electionTime: Timestamp({ t: 1705975643, i: 2 }),
      electionDate: ISODate('2024-01-23T02:07:23.000Z'),
      configVersion: 101786,
      configTerm: 82,
      self: true,
      lastHeartbeatMessage: ''
    }
  ],

What is going on here?

I’m not sure exactly how a replica set detects node issues, but the communication path for health checks may differ from the one user queries take. Even when user-level queries time out, the health check may still succeed, so the peer keeps reporting the node as healthy. Someone from the MongoDB team would need to confirm this.
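To illustrate the idea (this is a toy sketch of the general failure mode, not MongoDB's actual internals): if heartbeats are small, fixed-cost messages handled on a path separate from the worker pool that serves user queries, a node whose query workers are wedged under memory pressure can still answer heartbeats, so its peer keeps reporting `health: 1` even though every user query times out.

```python
import concurrent.futures as cf
import threading

# Toy model: user queries share a small worker pool that a simulated
# "memory-pressure" stall has saturated, while heartbeats run on a
# dedicated executor. The heartbeat path stays responsive even though
# user queries time out -- consistent with rs.status() still showing
# health: 1 for a node that users experience as hung.

query_pool = cf.ThreadPoolExecutor(max_workers=2)      # shared query path
heartbeat_pool = cf.ThreadPoolExecutor(max_workers=1)  # dedicated health path

stall = threading.Event()

def run_query():
    stall.wait()          # workers wedge here under "memory pressure"
    return "results"

def heartbeat():
    return {"ok": 1}      # cheap, fixed-cost response

# Saturate the query pool: both workers are now wedged.
wedged = [query_pool.submit(run_query) for _ in range(2)]

# A user query now times out...
user_query = query_pool.submit(run_query)
try:
    user_query.result(timeout=0.2)
    print("query answered")
except cf.TimeoutError:
    print("query timed out")

# ...while a heartbeat on the dedicated path still succeeds.
hb = heartbeat_pool.submit(heartbeat)
print(hb.result(timeout=0.2))

stall.set()  # release the wedged workers so the process can exit cleanly
```

Under this (assumed) model, a liveness check that only exercises the heartbeat path cannot distinguish a healthy node from one that is throttled to uselessness; only a probe that traverses the actual query path would catch it.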