Unexpected primary election when shutting down the primary node during a k8s update

Related Issue: [bitnami/mongodb] Replicaset Graceful shutdown · Issue #17432 · bitnami/charts · GitHub
Related Discussion: Operator: Detected unclean shutdown - mongod seems never to shutdown gracefully - #2 by Yilmaz_Durmaz

TL;DR: When restarting the primary node in a k8s cluster, the replica set loses its primary for a short period of time because of an election. The same thing does not happen when using the db.shutdownServer() command. Based on the documentation, sending SIGTERM should have the same effect as a command shutdown (step down to a secondary and skip the election), but that is not the case in practice. The instance has plenty of time to wrap up in our test environment (the graceful shutdown period was extended).
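For illustration, here is a minimal sketch of how such a grace period can be extended on a StatefulSet, assuming the standard Kubernetes field terminationGracePeriodSeconds (the chart may expose this under a different value name; the manifest below is not the actual chart output):

```yaml
# Minimal sketch (not the real chart manifest): give mongod more time
# between SIGTERM and the force kill by raising terminationGracePeriodSeconds.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb-headless
  template:
    spec:
      # Default is 30 seconds; raise it so mongod can step down and flush data.
      terminationGracePeriodSeconds: 120
      containers:
        - name: mongodb
          image: docker.io/bitnami/mongodb:4.4
```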

We would like to know whether this behavior is intended. What is the reason behind the difference?

Thanks!

I tried to check your GitHub issue; I could read your logs but could not go through all of them, as they are too long for my current time slot.

In short, my wild guess here is that this comes down to how k8s shuts down containers.

As you already noted, SIGTERM causes a “PRIMARY not found” error, while db.shutdownServer() shuts down gracefully.

You may immediately notice this line at the start of the secondary’s log for the command shutdown:

... "ctx":"conn220026","msg":"Received replSetStepUp request"}

This clearly indicates continuous communication between nodes.

On the other hand, when SIGTERM is used, the primary says it is about to step down and notify the cluster, but the secondary starts failing to get heartbeats around the same time:

... "ctx":"SignalHandler","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":10000}}
... "ctx":"ReplCoord-21","msg":"Heartbeat failed after max retries","attr":{"target":"mongodb-0.mongodb-headless.default.svc.cluster.local:27017","maxHeartbeatRetries":2,"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Error connecting to mongodb-0.mongodb-headless.default.svc.cluster.local:27017 (100.96.5.193:27017) :: caused by :: Connection refused"}}}

Your logs have a 5-second gap between these two lines; I wonder at exactly what point the secondary lost the heartbeat.

Anyway, this leads me to think that the primary’s network connection is closed before it can send that step-down message to the cluster, hence no heartbeat to the others. And closing the network is the job of k8s.

As you know, SIGTERM is one of the forced-shutdown signals, though it gives the program a chance to do cleanup. Yet this says nothing about the rest of the system, especially the network.

Unfortunately I don’t have the setup to test this, so I hope you and others find a better explanation.

PS: You mention a v4.4 server. Have you also tried with 5.0 or 6.0?


Thanks for the update! What I can do right now is check whether the preStop hook behaves differently from a normal shutdown.

At the same time, I don’t have a 5.x/6.x version available, since I bumped into this problem during my upgrade process. But since the bitnami team could reproduce it, I believe they were using 5.x or 6.x, as those are the supported versions.


In Docker containers, once the process that is set to run at startup shuts itself down, the container is also terminated.

The preStop hook helps with this: it tells mongod to shut itself down, mongod does so gracefully (unless it times out), and once mongod has stopped, the container also shuts down.
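As a rough sketch (the field layout is standard Kubernetes, but the shell command, timeout, and omitted credentials are illustrative, not the bitnami chart’s own hook), such a preStop hook could look like this:

```yaml
# Sketch only: container fragment with a preStop hook that asks mongod to
# shut itself down (stepping down as primary first) before the pod is killed.
# Authentication options are omitted; add them as your deployment requires.
# On a 4.4 image the legacy "mongo" shell would be used instead of "mongosh".
containers:
  - name: mongodb
    lifecycle:
      preStop:
        exec:
          command:
            - /bin/sh
            - -c
            - mongosh admin --eval 'db.shutdownServer({timeoutSecs: 60})' || true
```

Kubernetes runs the preStop hook before sending SIGTERM to the container, and the hook’s runtime counts against the termination grace period, so if mongod has already exited inside the hook the signal that follows has nothing left to interrupt.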

Though I hoped to be wrong, as you also mentioned in your GitHub follow-up post, it seems the container’s network is cut off early. This might be a bug in k8s, or there might be a setting somewhere else that keeps pod resources up until the main container process exits/dies.

Please keep us updated.