External Access to Kubernetes Replica Set Fails with MongoNetworkError: getaddrinfo ENOTFOUND

Unfortunately, replica set horizons are pretty confusing when it comes to exposing MongoDB nodes externally.

Essentially, a driver connecting to the replica set needs only one seed node to learn the topology (the addresses of the other nodes), which it gets from db.hello()/isMaster(). When connecting from inside the k8s cluster, all nodes are reachable under their .svc.cluster.local addresses, so there is nothing to be done.

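But when the driver connects to a node that is exposed externally under a different hostname than the one it is deployed with internally, db.hello() still returns the replica set nodes under their internal addresses. The driver then tries to reach those names from outside the cluster, where they cannot be resolved, which is where the getaddrinfo ENOTFOUND error comes from.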
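To make the failure concrete, here is a minimal Python sketch of that resolution step. It is only an illustration under assumptions: the names `resolves`, `unresolvable_hosts`, `outside_cluster` and the `resolver` parameter are hypothetical, and the stand-in resolver simply pretends that nothing under .svc.cluster.local exists outside the cluster.

```python
import socket
from typing import Callable, Iterable, List


def resolves(host: str) -> bool:
    """Return True if the local resolver can look up `host`."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        # Roughly the failure the Node.js driver reports as "getaddrinfo ENOTFOUND".
        return False


def unresolvable_hosts(
    hello_hosts: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
) -> List[str]:
    """Return the advertised members whose hostname cannot be resolved."""
    failed = []
    for member in hello_hosts:
        host = member.rsplit(":", 1)[0]  # drop the ":27017" port suffix
        if not resolver(host):
            failed.append(member)
    return failed


# What a member advertises when no external horizon is selected.
ADVERTISED = [
    "mongodb-prod-0.mongodb-prod-svc.mongodb.svc.cluster.local:27017",
    "mongodb-prod-1.mongodb-prod-svc.mongodb.svc.cluster.local:27017",
    "mongodb-prod-2.mongodb-prod-svc.mongodb.svc.cluster.local:27017",
]


def outside_cluster(host: str) -> bool:
    """Stand-in resolver (assumption): cluster-local names do not exist outside k8s."""
    return not host.endswith(".svc.cluster.local")
```

Calling `unresolvable_hosts(ADVERTISED, outside_cluster)` returns all three members, which matches what an external client sees. Passing no resolver uses the real DNS lookup on the machine running the check.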

That is what the replica set horizons mechanism was introduced for. It configures a consistent set of hostnames for each networking path the replica set nodes are exposed on, so that the client gets a consistent view (a horizon!) of all nodes in the replica set from its own perspective. I hope that makes sense.

The underlying mechanism that lets mongod see which hostname the client is connecting to is SNI from TLS. This makes TLS mandatory in mongod for horizons to work (which is good to have anyway when exposing the nodes externally).

This diagram shows how mongod handles connections from different horizons. I hope it makes things clearer.

              +----------------+
              |     CLIENT     |
              | (Connects with |
              |   TLS + SNI)   |
              +----------------+
                      |
                      +-----> SNI: m3.infra-dashboards.mobicycle.pt
                      |
                      V
              +----------------+
              | MONGOD SERVER  |
              | (Receives TLS  |
              |  conn with SNI)|
              +----------------+
                      |
                      +-----> Looks up hostname from SNI and checks configured horizons
                      |       (the internal hostnames are always part of the __default horizon)
                      |       replicaSetHorizons:
                      |       - __default: mongodb-prod-0.mongodb-prod-svc.mongodb.svc.cluster.local:27017
                      |         external:  m1.infra-dashboards.mobicycle.pt:27017
                      |       - __default: mongodb-prod-1.mongodb-prod-svc.mongodb.svc.cluster.local:27017
                      |         external:  m2.infra-dashboards.mobicycle.pt:27017
                      |       - __default: mongodb-prod-2.mongodb-prod-svc.mongodb.svc.cluster.local:27017
                      |         external:  m3.infra-dashboards.mobicycle.pt:27017
                      V
        +-------------------------------------------+
        |         replicaSetHorizons                |
        |            Configuration                  |
        | (Finds that                               |
        |   m3.infra-dashboards.mobicycle.pt:27017  |
        |   is part of "external" horizon)          |
        +-------------------------------------------+
                      |
                      +-----> db.hello() returns all hosts from "external" horizon
                      |       [
                      |         m1.infra-dashboards.mobicycle.pt:27017,
                      |         m2.infra-dashboards.mobicycle.pt:27017,
                      |         m3.infra-dashboards.mobicycle.pt:27017
                      |       ]
                      V
              +----------------+
              |     CLIENT     |
              | (Connects to   |
              |  all members)  |
              +----------------+
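
The lookup in the diagram can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, not mongod's actual code; the names `HORIZONS`, `horizon_for_sni` and `hello_hosts` are hypothetical, and the hostnames are the ones from the diagram.

```python
from typing import Dict, List, Optional

# One entry per replica set member, in member order.
HORIZONS: List[Dict[str, str]] = [
    {
        "__default": "mongodb-prod-0.mongodb-prod-svc.mongodb.svc.cluster.local:27017",
        "external": "m1.infra-dashboards.mobicycle.pt:27017",
    },
    {
        "__default": "mongodb-prod-1.mongodb-prod-svc.mongodb.svc.cluster.local:27017",
        "external": "m2.infra-dashboards.mobicycle.pt:27017",
    },
    {
        "__default": "mongodb-prod-2.mongodb-prod-svc.mongodb.svc.cluster.local:27017",
        "external": "m3.infra-dashboards.mobicycle.pt:27017",
    },
]


def horizon_for_sni(sni: Optional[str], own_entry: Dict[str, str]) -> str:
    """Pick the horizon whose hostname, for the receiving member, matches the SNI."""
    if sni is not None:
        for horizon, address in own_entry.items():
            if address.rsplit(":", 1)[0] == sni:
                return horizon
    # No SNI (for example, TLS was terminated upstream) or no match.
    return "__default"


def hello_hosts(sni: Optional[str], member_index: int,
                members: List[Dict[str, str]]) -> List[str]:
    """Return the host list the receiving member would advertise for this SNI."""
    horizon = horizon_for_sni(sni, members[member_index])
    return [m.get(horizon, m["__default"]) for m in members]
```

With `sni="m3.infra-dashboards.mobicycle.pt"` and `member_index=2`, the sketch returns the three external names. With `sni=None` it falls back to the internal addresses, which is the situation an external client ends up in when the SNI never reaches mongod.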

In your case, terminating TLS in Traefik was a mistake, because it erased the essential information mongod needs to get the hostname. Try configuring Traefik in TLS passthrough mode, without termination. You can probably use the SNI field directly in Traefik’s configuration to dispatch each connection to the appropriate pod.

I would also suggest a naming convention for external exposure, so that the pod name is part of the external domain. You can then create a semi-automatic dispatching rule in your reverse proxy of choice: receive a connection for …example.com:27017 and dispatch it to the service FQDN built from that same pod name.
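
As a rough illustration of such a rule, the Python sketch below derives the in-cluster target from the SNI hostname. The external suffix `.mongo.example.com` and the function name `backend_for_sni` are hypothetical placeholders; the service name and namespace are taken from the internal addresses in the diagram. A real reverse proxy would express the same mapping in its own configuration language.

```python
from typing import Optional

EXTERNAL_SUFFIX = ".mongo.example.com"  # hypothetical external domain
SERVICE = "mongodb-prod-svc"            # headless service from the diagram
NAMESPACE = "mongodb"                   # namespace from the diagram
PORT = 27017


def backend_for_sni(sni: str) -> Optional[str]:
    """Map '<pod-name><EXTERNAL_SUFFIX>' to that pod's in-cluster address."""
    if not sni.endswith(EXTERNAL_SUFFIX):
        return None  # not one of our hostnames, refuse to route
    pod = sni[: -len(EXTERNAL_SUFFIX)]
    if not pod or "." in pod:
        return None  # expect exactly one label in front of the suffix
    return f"{pod}.{SERVICE}.{NAMESPACE}.svc.cluster.local:{PORT}"
```

Under this convention, `backend_for_sni("mongodb-prod-1.mongo.example.com")` yields the internal address of pod mongodb-prod-1, so adding a member does not need a new hand-written rule. The external horizon entries would have to use the same `<pod-name>` based hostnames for this to line up.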

I hope that helps!