Hello Aasawari, thank you for your reply.
I’ve made some progress since that post and I think I have found the cause of the issues. Let me describe it first and then I will answer your questions.
So here’s my situation:
The client was connected to a PSSSA replicaset, let's call it replicaset-a.
One of the members of the replicaset (let's call it replicaset-a-member-1) runs on a physical machine that also hosts members of other replicasets. Among them is a member of replicaset-b, let's call it replicaset-b-member-1.
The physical server has been rebooted.
In the logs on the client side, we see information about the member being disconnected:
'The server is in quiesce mode and will shut down' on server replicaset-a-member-1
After a short time, the pods began to start. The first one to come up was replicaset-b-member-1, and it was assigned the IP address that had belonged to replicaset-a-member-1 a moment earlier.
In the client's logs we see two entries. The first one says that the driver has reconnected to replicaset-a-member-1, but the response shows that the data came from replicaset-b:
Log-entry-1:
Monitor thread successfully connected to server with description ServerDescription{address=replicaset-a-member-1, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, minWireVersion=0, maxWireVersion=17, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=622150, setName='replicaset-b', canonicalAddress=replicaset-b-address, hosts=[replicaset-b-member-1, replicaset-b-member-2, replicaset-b-member-3, replicaset-b-member-4], passives=[], arbiters=[arbiter], primary='replicaset-b-member-2',
You can see above that, on the one hand, the log says the client has connected to a member of replicaset-a, but the topology itself comes from replicaset-b.
A second entry immediately follows:
Log-entry-2:
Expecting replica set member from set 'replicaset-a', but found one from set 'replicaset-b'. Removing replicaset-a-member-1 from client view of cluster
So it looks like the driver figures out that it has connected to the wrong replicaset and removes the member from the list of available ones.
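To make the mismatch in log-entry-2 concrete, here is a minimal plain-Java sketch of that kind of set-name validation. This is not the driver's actual code; the class, variable names, and the single-entry cluster view are illustrative stand-ins:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetNameCheck {
    public static void main(String[] args) {
        // The set this client expects, and the set name reported in the
        // member's hello response (stand-ins for the anonymized names above).
        String expectedSetName = "replicaset-a";
        String reportedSetName = "replicaset-b";
        Set<String> clusterView = new HashSet<>(List.of("replicaset-a-member-1"));

        if (!expectedSetName.equals(reportedSetName)) {
            // Mirrors log-entry-2: "Expecting replica set member from set
            // 'replicaset-a', but found one from set 'replicaset-b'.
            // Removing replicaset-a-member-1 from client view of cluster"
            clusterView.remove("replicaset-a-member-1");
            System.out.println("removed; cluster view=" + clusterView);
        }
    }
}
```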
These two messages repeat for some time (until the real replicaset-a-member-1 starts), and then the following entry appears:
Adding discovered server replicaset-a-member-1 to client view of cluster
So the replicaset-a node starts up, gets a different IP address, the driver connects to it, and everything looks correct.
Unfortunately, from that point on, every query to that node fails with an HMAC error, which says that the client tried to reach replicaset-a using the keyId from replicaset-b:
Command failed with error 211 (KeyNotFound): 'No keys found for HMAC that is valid for time: { ts: Timestamp(XXX, 6130) } with id: replicaset-b-keyId' on server replicaset-a-member-1. The full response is {"ok": 0.0, "errmsg": "No keys found for HMAC that is valid for time: { ts: Timestamp(XXX, 6130) } with id: replicaset-b-keyId", "code": 211, "codeName": "KeyNotFound", "$clusterTime": {"clusterTime": {"$timestamp": {"t": YYY, "i": 2}}, "signature": {"hash": {"$binary": {"base64": "XYZ", "subType": "00"}}, "keyId": replicaset-a-keyId}}, "operationTime": {"$timestamp": {"t": ZZZ, "i": 2}}}
Apparently, during the connection-attempt loop on the old IP address, the driver somehow manages to retrieve the keyId from replicaset-b and saves it in the session.
My suspicion is that the health check (which results in retrieving the keyId) is executed by a separate thread that is not synchronized in any way with the thread responsible for log-entry-1 and log-entry-2, and that it manages to slip in between these two operations. That is:
- The driver connects to replicaset-b-member-1; log-entry-1 appears
- Another thread (running periodically) also connects to replicaset-b-member-1 and retrieves the wrong keyId
- The driver detects that it has connected to another replicaset; log-entry-2 appears, but the previously loaded keyId is not cleared
When the correct node comes back, the driver still has the keyId from the other replicaset in the session and tries to use it, which ends in an error. Interestingly, sometimes this lasts only a moment (a single error), sometimes minutes or hours, and in one case we observed it for many days; it stopped only after a client application restart. I can't explain this part yet.
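The suspected interleaving can be sketched deterministically in plain Java (no driver dependency). ClusterClockRace, advanceClusterTime, and the keyId-only "cluster time" are simplified illustrative stand-ins, not the driver's actual internals; the point is only that the clock advance happens between log-entry-1 and log-entry-2 and is never rolled back:

```java
import java.util.concurrent.atomic.AtomicReference;

public class ClusterClockRace {
    // Simplified shared "cluster time": just the signing keyId
    // from the last response seen.
    static final AtomicReference<String> currentKeyId = new AtomicReference<>();

    public static void main(String[] args) {
        // The hello response actually comes from replicaset-b,
        // because its member now owns the old IP address.
        String reportedSetName = "replicaset-b";
        String reportedKeyId = "replicaset-b-keyId";

        // Monitor thread connects and sees the server (log-entry-1).
        System.out.println("connected, setName=" + reportedSetName);

        // Meanwhile, the health-check path advances the shared clock from
        // the same wrong-set response BEFORE the set-name check runs.
        currentKeyId.set(reportedKeyId);

        // Monitor thread now validates the set name and removes the server
        // (log-entry-2), but the clock advance is never rolled back.
        if (!"replicaset-a".equals(reportedSetName)) {
            System.out.println("removing server from cluster view");
        }

        // Later, commands to the real replicaset-a member gossip the stale
        // keyId, which would produce KeyNotFound (code 211) on the server.
        System.out.println("sending $clusterTime with keyId=" + currentKeyId.get());
    }
}
```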
I can reproduce the situation described above:
- set the parameter heartbeatFrequencyMS=10; this increases the probability that the thread retrieving the keyId will hit the window between connecting to the wrong member and removing it from the cluster view
- in the /etc/hosts file, add an entry for the replicaset-a-member-1 hostname and point it at the IP address of replicaset-b-member-1
- set a breakpoint here: https://github.com/mongodb/mongo-java-driver/blob/master/driver-core/src/main/com/mongodb/internal/connection/ClusterClock.java#L54 (I use a conditional breakpoint so that it triggers only when the keyId differs from the one belonging to replicaset-a)
- start the application and wait
- sometimes it happens after just 1-3 minutes and sometimes I have to repeat the whole procedure once or twice, but as a rule it is fairly easy to reproduce
In my experience it succeeds at least once in every five attempts, but I would appreciate it if someone else could try to reproduce it as well.
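For reference, the redirect and heartbeat settings from the steps above can be written out as follows (the IP address and hostnames are placeholders for the anonymized names in this post; values as low as heartbeatFrequencyMS=10 are only useful for forcing the race in a test setup):

```
# /etc/hosts on the client machine: resolve replicaset-a-member-1's
# hostname to replicaset-b-member-1's IP address (10.0.0.42 is a placeholder)
10.0.0.42   replicaset-a-member-1

# Connection string with the shortened heartbeat interval
mongodb://replicaset-a-member-1:27017/?replicaSet=replicaset-a&heartbeatFrequencyMS=10
```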
And going back to your questions:
- we use a DNS service, but caching can also happen on the client side, and we have no control over that
- actually, it now seems to me that the swapping of IPs does not matter here; it is enough that a member of another replicaset starts on the IP previously used by my member
- for the case described above: MongoDB 6.0.8 + mongodb-driver-sync 4.8.2, but we have also seen this on other driver versions