Replica set doesn't survive a pod restart

I was able to stand up a working 3-node replica set (MongoDB v6) on DigitalOcean Kubernetes with persistent storage and run my app against the DB without any problem.

When I restart the mongod pods, it seems the RS has trouble initializing itself correctly. See the errors below.

Primary pod error:

{"t":{"$date":"2022-08-25T14:45:06.604+00:00"},"s":"I", "c":"NETWORK", "id":4333208, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM host selection timeout","attr":{"replicaSet":"rs0","error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"primary\" } for set rs0"}}
{"t":{"$date":"2022-08-25T14:45:06.604+00:00"},"s":"I", "c":"CONTROL", "id":20714, "ctx":"LogicalSessionCacheRefresh","msg":"Failed to refresh session cache, will try again at the next refresh interval","attr":{"error":"FailedToSatisfyReadPreference: Could not find host matching read preference { mode: \"primary\" } for set rs0"}}

mongosh session:

admin> rs.status()
MongoServerError: Our replica set config is invalid or we are not a member of it

The replica set config is still there:

admin> rs.conf()
{
  _id: 'rs0',
  version: 6,
  term: 1,
  members: [
    {
      _id: 0,
      host: 'mongod-rs-0.mongodb-service-rs:27017',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: {},
      secondaryDelaySecs: Long("0"),
      votes: 1
    },
    {
      _id: 1,
      host: 'mongod-rs-1.mongodb-service-rs:27017',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: {},
      secondaryDelaySecs: Long("0"),
      votes: 1
    },
    {
      _id: 2,
      host: 'mongod-rs-2.mongodb-service-rs:27017',
      arbiterOnly: false,
      buildIndexes: true,
      hidden: false,
      priority: 1,
      tags: {},
      secondaryDelaySecs: Long("0"),
      votes: 1
    }
  ],
  protocolVersion: Long("1"),
  writeConcernMajorityJournalDefault: true,
  settings: {
    chainingAllowed: true,
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: -1,
    catchUpTakeoverDelayMillis: 30000,
    getLastErrorModes: {},
    getLastErrorDefaults: { w: 1, wtimeout: 0 },
    replicaSetId: ObjectId("6307630d55b01fb4471c7b7e")
  }
}

Notes:

  • I configure the RS manually (not using the operator, due to an unresolvable issue)
  • RS init steps:

On the primary:

rs.initiate()
var cfg = rs.conf()
// replace the auto-detected host with the stable headless-service DNS name
cfg.members[0].host = "mongod-rs-0.mongodb-service-rs:27017"
rs.reconfig(cfg)
rs.add("mongod-rs-1.mongodb-service-rs:27017")
rs.add("mongod-rs-2.mongodb-service-rs:27017")

To simulate an outage, I shut down all 3 mongod instances gracefully and restarted them:

kubectl scale sts mongod-rs --replicas 0
kubectl scale sts mongod-rs --replicas 3

Answering my own question: include the following attribute in the spec of the headless Service YAML:

   publishNotReadyAddresses: true

Without it, the headless Service only publishes DNS records for pods that pass their readiness probe, so after a full restart the replica set members cannot resolve each other's hostnames and the set never elects a primary again.
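
For reference, a minimal sketch of a headless Service with that attribute in place. The selector label app: mongod-rs is an assumption here, since my actual manifest isn't shown; everything else (Service name, port) matches the hostnames above:

apiVersion: v1
kind: Service
metadata:
  name: mongodb-service-rs
spec:
  clusterIP: None                 # headless: gives each StatefulSet pod a stable DNS name
  publishNotReadyAddresses: true  # publish DNS records even for pods that aren't Ready yet
  selector:
    app: mongod-rs                # assumption -- must match the StatefulSet's pod template labels
  ports:
    - port: 27017
      targetPort: 27017

With this in place, mongod-rs-0.mongodb-service-rs and the other members resolve as soon as the StatefulSet scales back up, so the replica set can reconnect and elect a primary.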

See also https://jira.mongodb.org/browse/SERVER-24778

