I don’t have many details about the problem because we had to revert our production upgrade pretty quickly. We have a sharded cluster with 3 shards (mongocluster1, mongocluster2, mongocluster3). Each shard is a 3-node replica set, and the config servers are a 3-node replica set as well (configReplSet). The configReplSet and 2 of the shards (mongocluster1 and mongocluster3) seemed to upgrade fine. However, in our second shard (mongocluster2, which consists of vindb2, pndb2 and daldb2), daldb2 showed the following status at the mongosh prompt. Notice that the prompt says [direct: other], but rs.status() shows SECONDARY for the node:
mongocluster2 [direct: other] test> rs.status()
{
  set: 'mongocluster2',
  date: ISODate("2022-11-20T07:30:30.003Z"),
  myState: 2,
  term: Long("66"),
  syncSourceHost: 'pndb2:27017',
  syncSourceId: 7,
  heartbeatIntervalMillis: Long("2000"),
  majorityVoteCount: 2,
  writeMajorityCount: 2,
  votingMembersCount: 2,
  writableVotingMembersCount: 2,
  optimes: {
    lastCommittedOpTime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
    lastCommittedWallTime: ISODate("2022-11-20T07:30:29.566Z"),
    readConcernMajorityOpTime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
    appliedOpTime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
    durableOpTime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
    lastAppliedWallTime: ISODate("2022-11-20T07:30:29.566Z"),
    lastDurableWallTime: ISODate("2022-11-20T07:30:29.566Z")
  },
  lastStableRecoveryTimestamp: Timestamp({ t: 1668929239, i: 1 }),
  members: [
    {
      _id: 6,
      name: 'vindb2:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      uptime: 37,
      optime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
      optimeDurable: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
      optimeDate: ISODate("2022-11-20T07:30:29.000Z"),
      optimeDurableDate: ISODate("2022-11-20T07:30:29.000Z"),
      lastAppliedWallTime: ISODate("2022-11-20T07:30:29.566Z"),
      lastDurableWallTime: ISODate("2022-11-20T07:30:29.566Z"),
      lastHeartbeat: ISODate("2022-11-20T07:30:29.881Z"),
      lastHeartbeatRecv: ISODate("2022-11-20T07:30:28.785Z"),
      pingMs: Long("29"),
      lastHeartbeatMessage: '',
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      electionTime: Timestamp({ t: 1668928629, i: 1 }),
      electionDate: ISODate("2022-11-20T07:17:09.000Z"),
      configVersion: 27,
      configTerm: 66
    },
    {
      _id: 7,
      name: 'pndb2:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 37,
      optime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
      optimeDurable: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
      optimeDate: ISODate("2022-11-20T07:30:29.000Z"),
      optimeDurableDate: ISODate("2022-11-20T07:30:29.000Z"),
      lastAppliedWallTime: ISODate("2022-11-20T07:30:29.566Z"),
      lastDurableWallTime: ISODate("2022-11-20T07:30:29.566Z"),
      lastHeartbeat: ISODate("2022-11-20T07:30:29.733Z"),
      lastHeartbeatRecv: ISODate("2022-11-20T07:30:29.465Z"),
      pingMs: Long("22"),
      lastHeartbeatMessage: '',
      syncSourceHost: 'vindb2:27017',
      syncSourceId: 6,
      infoMessage: '',
      configVersion: 27,
      configTerm: 66
    },
    {
      _id: 11,
      name: 'daldb2:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 39,
      optime: { ts: Timestamp({ t: 1668929429, i: 1 }), t: Long("66") },
      optimeDate: ISODate("2022-11-20T07:30:29.000Z"),
      lastAppliedWallTime: ISODate("2022-11-20T07:30:29.566Z"),
      lastDurableWallTime: ISODate("2022-11-20T07:30:29.566Z"),
      syncSourceHost: 'pndb2:27017',
      syncSourceId: 7,
      infoMessage: '',
      configVersion: 27,
      configTerm: 66,
      self: true,
      lastHeartbeatMessage: ''
    }
  ],
  ok: 1,
  '$gleStats': {
    lastOpTime: Timestamp({ t: 0, i: 0 }),
    electionId: ObjectId("000000000000000000000000")
  },
  lastCommittedOpTime: Timestamp({ t: 1668929429, i: 1 }),
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1668929429, i: 1 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1668929429, i: 1 })
}
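To make the comparison easier to repeat, here is a rough sketch of how the reporting node's own state could be pulled out of an rs.status() document. The helper name selfMemberSummary and the trimmed sample object are mine, made up for illustration; the sample only copies the handful of fields the helper reads from the output above.

```javascript
// Hypothetical helper (not part of mongosh): summarize the member that
// answered rs.status(), i.e. the entry flagged with `self: true`.
function selfMemberSummary(status) {
  const self = status.members.find((m) => m.self === true);
  if (!self) {
    return null; // no member marked as self in this document
  }
  return {
    name: self.name,
    stateStr: self.stateStr,
    // myState (top level) and the self member's state should agree.
    consistent: self.state === status.myState,
  };
}

// Trimmed copy of the rs.status() output above; all other fields omitted.
const sample = {
  set: 'mongocluster2',
  myState: 2,
  members: [
    { _id: 6, name: 'vindb2:27017', state: 1, stateStr: 'PRIMARY' },
    { _id: 7, name: 'pndb2:27017', state: 2, stateStr: 'SECONDARY' },
    { _id: 11, name: 'daldb2:27017', state: 2, stateStr: 'SECONDARY', self: true },
  ],
};

console.log(selfMemberSummary(sample));
```

In mongosh the same function could be called as selfMemberSummary(rs.status()). It may also be worth looking at db.hello() on the same connection (the isWritablePrimary and secondary fields), since I am assuming, without having confirmed it, that the prompt label comes from that reply rather than from rs.status().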
I don’t have much to go on. The only other clue I saw was an error in some app logs like this:
"errmsg" : "Encountered non-retryable error during query :: caused by :: BSON field 'DatabaseVersion.timestamp' is missing but a required field"
Does anyone know why the prompt would show [direct: other] while rs.status() shows everything is fine?