Connecting via DNS seed list in a Kubernetes cluster

The documentation and the original spec imply to me that the result of the SRV lookup ends up being equivalent to having the member hosts supplied up front.

For example, if you specify the connection string mongodb+srv://mongo.foo.cluster.local, and the result of an SRV lookup against _mongodb._tcp.mongo.foo.cluster.local is:

mongo-0.mongo.foo.cluster.local
mongo-1.mongo.foo.cluster.local
mongo-2.mongo.foo.cluster.local

… then client libraries would end up treating this as equivalent to having originally supplied mongodb://mongo-0.mongo.foo.cluster.local,mongo-1.mongo.foo.cluster.local,mongo-2.mongo.foo.cluster.local when establishing the initial connection.
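For illustration, this is roughly the equivalence I have in mind, as a sketch using the dnspython package (the hostnames are just the example above, and I’m assuming the SRV records point at the default port 27017):

```python
# Sketch only: resolve the SRV record by hand and build the seed list a driver
# would effectively start from. Assumes the dnspython package is installed.
import dns.resolver

answers = dns.resolver.resolve("_mongodb._tcp.mongo.foo.cluster.local", "SRV")

# Each SRV answer carries a target host and a port,
# e.g. mongo-0.mongo.foo.cluster.local:27017
seeds = sorted(f"{a.target.to_text().rstrip('.')}:{a.port}" for a in answers)

# The initial topology should then be equivalent to supplying these hosts directly:
print("mongodb://" + ",".join(seeds))
```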

However, I’ve noticed that client libraries also poll the SRV records, and the consequences of this are somewhat unclear to me. When querying SRV records in a Kubernetes cluster, you’re likely to find that absent pods are no longer returned… so, for example, if you rollout restart a StatefulSet, you’ll find the returned records change in relatively rapid succession.
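To make that concrete, this is the sort of quick check I’ve been running while a rollout restart is in progress (again assuming dnspython, and the same example service name as above):

```python
# Sketch: poll the SRV record every few seconds during a `kubectl rollout restart`
# and watch the returned hosts change as pods terminate and come back.
import time
import dns.resolver

while True:
    try:
        answers = dns.resolver.resolve("_mongodb._tcp.mongo.foo.cluster.local", "SRV")
        hosts = sorted(a.target.to_text().rstrip(".") for a in answers)
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        hosts = []
    print(time.strftime("%H:%M:%S"), hosts)
    time.sleep(5)
```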

My concern is that I don’t understand the implications of this polling behaviour, and of the way the record will change over the space of a minute or two. If client libraries see members absent from an SRV lookup, do they then treat them as if they’ve been removed from the replica set, and drop any connections they have to them?

My main worry is circumstances like:

  • mongo-2 is terminating
  • client shuts down any connections to mongo-2
  • DNS polling in the client performs a DNS lookup, sees mongo-2 is absent, and removes any notion of it
  • DNS polling sleeps for 60 seconds
  • mongo-2 comes back up
  • mongo-1 is terminating
  • client shuts down any connections to mongo-1

… is the client now in a state where it only has a connection to mongo-0? Or will it have re-established a connection via other channels once mongo-2 has come back up?
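For what it’s worth, the way I’ve been trying to observe this is with the driver’s monitoring hooks; here’s a rough sketch using PyMongo’s topology listener (the URI is just the example from above, and the options are placeholders rather than a recommendation):

```python
# Sketch: log which members the driver currently considers part of the topology,
# to make the effect of SRV polling during a rollout restart visible.
# Assumes pymongo with SRV support (pip install "pymongo[srv]").
from pymongo import MongoClient, monitoring


class TopologyLogger(monitoring.TopologyListener):
    def opened(self, event):
        print("topology opened:", event.topology_id)

    def description_changed(self, event):
        members = sorted("%s:%d" % addr for addr in event.new_description.server_descriptions())
        print("known members:", members)

    def closed(self, event):
        print("topology closed:", event.topology_id)


client = MongoClient(
    "mongodb+srv://mongo.foo.cluster.local/?tls=false",  # example URI from above
    event_listeners=[TopologyLogger()],
)
```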

I think the part that is particularly confusing me is this: if the result of an SRV lookup acts as an initial seed list, I don’t really understand why this subsequent polling occurs at all.

Hello @N_J

Just to recap the basics behind all of this, for yourself and for others who may be somewhat new to Kubernetes, Docker, etc.

Kubernetes is an orchestration tool that helps manage the runtimes of multiple containers; it is akin to Docker Swarm (I love both Kubernetes and Docker Swarm, probably the most amazing inventions ever thought up, to be honest).

Docker’s intent and design is to run an individual runtime of an application and its dependencies in each container.

Docker Swarm is Docker’s home-grown tool that performs functions like Kubernetes does (mentioned for reference purposes in understanding Kubernetes).

Now, N_J, to your question: this will depend upon Kubernetes networking and how traffic is routed to the services. Kubernetes can funnel traffic to the pods via:

  • IP Addresses
  • DNS Records
  • Ports

And it can establish load balancing via the load balancer; the components work as follows:

  • L4 balancing will distribute traffic between nodes via IP address and port.
  • L7 balancing will distribute traffic across your nodes via HTTP headers, HTML form data, URI, SSL and session ID, etc.
  • Geographic location / fewest hops, if you have globally distributed or redundantly built infrastructure (such as a regional cloud, say AWS us-east-1, plus an on-premises server located in California, etc.).

With this information now disclosed for conceptual purposes, the answer to your question comes down to how your runtimes are being orchestrated and connected to, and how the balancers within Kubernetes are designed and told to work.

If you’ve configured and built your nodes so that clients connect to the primary node, whichever it is, then that is what they’ll connect to. If you have a 3-node cluster as you describe, then the expected behaviour, given that configuration, is that Kubernetes routes the sessions to whatever the primary node is, or reroutes connections to the new primary after the old primary has failed.

So when you do an SRV lookup, you should always see 3 nodes/microservices for the three containers running the MongoDB service, and each node in the cluster should know what place it holds, with your Kubernetes knowing what to do and where to route the connection and session.
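For reference, the per-pod DNS/SRV records you are relying on come from pairing the StatefulSet with a headless Service; here is a rough sketch of such a Service using the official Kubernetes Python client (the names, namespace, and labels are just examples matching the hostnames above, not your actual setup):

```python
# Sketch: create a headless Service (clusterIP: None) so each mongo pod gets its
# own DNS record and the _mongodb._tcp SRV record enumerates the pods.
# Assumes the official `kubernetes` Python client and an existing kubeconfig.
from kubernetes import client, config

config.load_kube_config()

svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="mongo", labels={"app": "mongo"}),
    spec=client.V1ServiceSpec(
        cluster_ip="None",            # headless: no virtual IP, per-pod records instead
        selector={"app": "mongo"},
        ports=[client.V1ServicePort(name="mongodb", port=27017)],
        # publish_not_ready_addresses=True would keep restarting pods in DNS,
        # which affects how quickly the SRV results change during a restart.
    ),
)

client.CoreV1Api().create_namespaced_service(namespace="foo", body=svc)
```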

Does this make sense?

Hi,

Thanks for the reply.

After further investigation I’ve found specifications/source/polling-srv-records-for-mongos-discovery/polling-srv-records-for-mongos-discovery.rst at master · mongodb/specifications · GitHub, which explains the reasoning behind SRV polling.

Although it is still unclear to me whether the fairly dynamic nature of SRV lookups in a Kubernetes cluster could cause issues here. If you have, say, a 3-replica StatefulSet and one of the pods is restarting, then should an SRV lookup take place whilst that is occurring, you may find the host relating to that restarting pod absent from the result. And the implication for clients, according to the spec, is that they:

  • MUST remove all hosts that are part of the topology, but are no longer in the returned set of valid hosts

Which seems fine, but what happens if that restarting pod comes back up, and then another pod restarts (which is typical during a statefulset restart) before the next SRV lookup takes place?

By and large, this isn’t something to care all that much about while a node is being restarted, and here’s why:

  • You’re going to have Kubernetes reroute all connections to the newly designated primary node prior to restarting the current, problematic node.
  • You will set up the node to return to the DNS record in a role downgraded from primary, and await its turn to be primary again.
  • The above can also additionally be achieved through redundant routing via the balancer and IP, in addition to the DNS record. You can also do this via HTTP headers or IDs.

This is why that in particular doesn’t really matter in the grand scheme of your environment: when configured properly, Kubernetes should automatically reroute the connections, restart the node, and then, if necessary, even rebuild the node, restore data from backups, and so on, and have it back on the DNS record when possible. All of this should occur without connections to your overall services being lost or end users even noticing that something happened.

In layman’s terms, you shouldn’t need to care much about any of that, because the clients or services/connections will already have been connected/rerouted to the working node before anything else is restarted or an SRV lookup even happens again.

You can also configure how often Kubernetes performs an SRV lookup, such as refreshing every second, five seconds, 10 seconds, and so on. But what I’d encourage you to do is configure Kubernetes routing and balancers to automatically transfer users to the new primary node, as well as predesignate a new primary node after a 10-second response timeout.

This makes it a relatively seamless event that end users don’t even notice; they’ll typically just blame whatever it is on the browser or their computer being slow. Does that make sense?