HMAC keys in system.keys collection & HMAC key monitoring thread

Hi All,

In PSSSA (1 primary, 3 secondaries, 1 arbiter) deployments, when two secondaries go down and only PSA remains, and the primary mongod is then restarted a few times, all database queries start to fail and the following errors are observed in the mongod logs:

"ctx":"monitoring-keys-for-HMAC","msg":"setting timestamp read source","attr":{"readSource":"kMajorityCommitted","provided":"none"

"ctx":"monitoring-keys-for-HMAC","msg":"setting timestamp read source","attr":{"readSource":"kNoTimestamp","provided":"none"

"ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.

All other application queries also fail with a “Failed to refresh key cache” error. The cluster-wide read concern is set to local, yet the logs above indicate the key refresh expects majority read concern. mongod remains in this state until one secondary comes back up; when PSSA is present and a majority read is possible, the server self-heals and the issue does not occur. There are no logs showing the exact query being run against the system.keys collection, so we cannot determine the read concern used by that query. Is this expected behaviour, or are we missing something?
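For anyone trying to reproduce the diagnosis, the cluster-wide defaults and the key collection can be inspected from mongosh. This is only a sketch, assuming direct access to the primary; the HMAC signing keys are stored in the admin.system.keys collection:

```javascript
// Show the cluster-wide default read/write concern (MongoDB 4.4+)
db.adminCommand({ getDefaultRWConcern: 1 })

// Read the HMAC signing keys with an explicit local read concern,
// to compare against the failing internal refresh, which the logs
// suggest is using majority
db.getSiblingDB("admin").system.keys.find().readConcern("local")
```

If the manual local read succeeds while the internal refresh keeps failing, that supports the suspicion that the key-monitoring thread ignores the cluster-wide default and reads with majority.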


MongoDB version used is 8.0.6, in a Kubernetes environment.

MongoDB Java driver (mongodb-driver-sync) version used is 5.3.1.

Did you ever solve this issue? I am having the same problem, where internal system processes require majority write concern even though w:1 is set as the defaultWriteConcern. Apparently the default only applies to application writes, and a node can even be elected primary while it effectively cannot be written to, because the internal processes can’t reach a majority.
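For reference, setting the cluster-wide default looks like the sketch below (setDefaultRWConcern is the real admin command, available since MongoDB 4.4); the point above is that it does not help here, because internal system writes ignore it:

```javascript
// Set w:1 as the cluster-wide default write concern.
// Internal system writes (e.g. to admin.system.keys) are
// not governed by this default and still require majority.
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: 1 }
})
```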

This used to be possible before MongoDB 5.0 introduced a global write concern for internal system writes. Unfortunately, from what I have seen, there is no way to run a replica set without this, which is a problem for those who self-host and don’t have three physical locations for failover.