How does a mongo client route to Secondary during downtime?

Hugh_Ferguson · October 19, 2021, 7:54pm

There is one aspect of redundancy I’m having a bit of trouble with. Assuming a 3-server replica set, I understand that if the primary goes down, one of the secondaries becomes the new primary. But a client probably has a single address for accessing the database, so how does it know to go to the new primary? Do the replica set members reset DNS records or something (if that is even possible)? Example: the replica set contains mongodb-prod, mongodb-psec1, and mongodb-psec2; mongodb-prod goes down catastrophically due to a hard drive failure, and mongodb-psec1 becomes the new primary. How do client apps know to contact mongodb-psec1 instead of mongodb-prod? I guess I’m asking what the mechanics of the failover are. Or is this something built into the MongoDB driver that you’re using for your app (i.e. the app somehow queries what all the servers in the replica set are ahead of time)?
Thanks
-Hugh Ferguson

chris · October 19, 2021, 8:36pm

Hi @Hugh_Ferguson

The first part of connecting to a replica set is understanding the connection string. Both a mongodb+srv URI (via a DNS SRV lookup) and a classic connection string give the driver a seed list. The driver connects to a host in that list and retrieves the topology, at which point it knows all the hosts in the replica set.
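To make the seed list idea concrete, here is a minimal Python sketch that pulls the seeds out of a classic connection string. The `seed_list` helper and the set name `rs0` are hypothetical, the hosts are the ones from the question above, and this is not how any particular driver implements its parsing (it ignores IPv6 literals and escaping, for instance).

```python
from typing import List, Tuple

DEFAULT_PORT = 27017  # MongoDB's default port


def seed_list(uri: str) -> List[Tuple[str, int]]:
    """Extract (host, port) seeds from a classic mongodb:// connection string."""
    prefix = "mongodb://"
    if not uri.startswith(prefix):
        raise ValueError("this sketch only handles classic mongodb:// URIs")
    rest = uri[len(prefix):]
    # Everything before the first "/" is credentials plus the host list.
    hosts_part = rest.split("/", 1)[0]
    # Drop optional "user:password@" credentials.
    hosts_part = hosts_part.rsplit("@", 1)[-1]
    seeds = []
    for entry in hosts_part.split(","):
        host, _, port = entry.partition(":")
        seeds.append((host, int(port) if port else DEFAULT_PORT))
    return seeds


# Hypothetical URI built from the hosts in the question.
uri = "mongodb://mongodb-prod,mongodb-psec1:27017,mongodb-psec2/?replicaSet=rs0"
seeds = seed_list(uri)
print(seeds)
```

The point is that the client is not limited to a single address: listing several members means the driver still has somewhere to start if one of them is down.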

I don’t know how a driver detects or is informed about a topology change; perhaps we’ll find out.

Using the covid-19 dataset as an example:

mongodb+srv://readonly:readonly@covid-19.hip2i.mongodb.net/covid19

# get the seeds
dig +short SRV _mongodb._tcp.covid-19.hip2i.mongodb.net
0 0 27017 covid-19-shard-00-00.hip2i.mongodb.net.
0 0 27017 covid-19-shard-00-01.hip2i.mongodb.net.
0 0 27017 covid-19-shard-00-02.hip2i.mongodb.net.

# connect to one in direct mode
# mongodb+srv connections are TLS-enabled by default
mongosh --quiet --tls covid-19-shard-00-02.hip2i.mongodb.net
Atlas covid-19-shard-0 [direct: secondary] test>  db.hello()
{
  topologyVersion: {
    processId: ObjectId("616edf5c90a24a36b1cd41c8"),
    counter: Long("3")
  },
  hosts: [
    'covid-19-shard-00-00.hip2i.mongodb.net:27017',
    'covid-19-shard-00-01.hip2i.mongodb.net:27017',
    'covid-19-shard-00-02.hip2i.mongodb.net:27017'
  ],
  setName: 'covid-19-shard-0',
  setVersion: 42,
  isWritablePrimary: false,
  secondary: true,
  primary: 'covid-19-shard-00-01.hip2i.mongodb.net:27017',
  tags: {
    region: 'EU_WEST_1',
    nodeType: 'ELECTABLE',
    provider: 'AWS',
    workloadType: 'OPERATIONAL'
  },
  me: 'covid-19-shard-00-02.hip2i.mongodb.net:27017',
  lastWrite: {
    opTime: { ts: Timestamp({ t: 1634675405, i: 500 }), t: Long("87") },
    lastWriteDate: ISODate("2021-10-19T20:30:05.000Z"),
    majorityOpTime: { ts: Timestamp({ t: 1634675405, i: 500 }), t: Long("87") },
    majorityWriteDate: ISODate("2021-10-19T20:30:05.000Z")
  },
  maxBsonObjectSize: 16777216,
  maxMessageSizeBytes: 48000000,
  maxWriteBatchSize: 100000,
  localTime: ISODate("2021-10-19T20:30:05.095Z"),
  logicalSessionTimeoutMinutes: 30,
  connectionId: 20309,
  minWireVersion: 0,
  maxWireVersion: 9,
  readOnly: false,
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1634675405, i: 1000 }),
    signature: {
      hash: Binary(Buffer.from("3f067304cc392fbfeca89933059e72182ebfed70", "hex"), 0),
      keyId: Long("6982690098202542081")
    }
  },
  operationTime: Timestamp({ t: 1634675405, i: 500 })
}
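
As a rough illustration of what a client can learn from that single reply, the Python sketch below reads the member list and the current primary out of a trimmed copy of the output above. The `discover` function is a hypothetical helper, not a driver API; only the field names (`hosts`, `setName`, `isWritablePrimary`, `primary`, `me`) come from the reply itself.

```python
def discover(hello_reply: dict) -> dict:
    """Summarize what one hello reply says about the replica set."""
    if hello_reply.get("isWritablePrimary"):
        # The node we asked is itself the primary.
        primary = hello_reply["me"]
    else:
        # A secondary reports which member it believes is the primary.
        primary = hello_reply.get("primary")
    return {
        "set_name": hello_reply["setName"],
        "members": list(hello_reply["hosts"]),
        "primary": primary,
    }


# Trimmed copy of the db.hello() output shown above.
reply = {
    "hosts": [
        "covid-19-shard-00-00.hip2i.mongodb.net:27017",
        "covid-19-shard-00-01.hip2i.mongodb.net:27017",
        "covid-19-shard-00-02.hip2i.mongodb.net:27017",
    ],
    "setName": "covid-19-shard-0",
    "isWritablePrimary": False,
    "secondary": True,
    "primary": "covid-19-shard-00-01.hip2i.mongodb.net:27017",
    "me": "covid-19-shard-00-02.hip2i.mongodb.net:27017",
}

topology = discover(reply)
print(topology["primary"])  # covid-19-shard-00-01.hip2i.mongodb.net:27017
```

So even though the connection went to a secondary, one reply is enough to name all three members and point at the primary.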

kevinadi · October 19, 2021, 11:23pm

Hi @Hugh_Ferguson

Regarding SRV, it’s what @chris said.

Regarding topology changes, the driver connects to each individual node in the replica set (initially based on the SRV records) and monitors its status. This allows the driver to react as expected when a new secondary joins the replica set, or when there’s a change in any node’s availability.
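
The monitoring idea can be sketched as a toy simulation in Python. Everything here is hypothetical (the `TopologyView` class, the `check` callback, the in-memory `state` dict standing in for a cluster); real drivers implement this with background monitors and many more details. It uses the host names from the original question and starts from a single seed.

```python
class TopologyView:
    """Toy client-side view of a replica set, rebuilt from hello-style replies."""

    def __init__(self, seeds):
        self.members = set(seeds)
        self.primary = None

    def refresh(self, check):
        """Poll every known member; check(host) returns a reply dict, or None if unreachable."""
        self.primary = None
        pending = sorted(self.members)
        seen = set()
        while pending:
            host = pending.pop()
            if host in seen:
                continue
            seen.add(host)
            reply = check(host)
            if reply is None:
                continue  # unreachable member: nothing to learn from it
            # Learn about members we did not know yet and poll them too.
            self.members.update(reply["hosts"])
            pending.extend(h for h in reply["hosts"] if h not in seen)
            if reply["isWritablePrimary"]:
                self.primary = host
        return self.primary


# Simulated cluster state (stands in for real servers).
all_hosts = ["mongodb-prod:27017", "mongodb-psec1:27017", "mongodb-psec2:27017"]
state = {"primary": "mongodb-prod:27017", "down": set()}


def check(host):
    if host in state["down"]:
        return None
    return {"hosts": all_hosts, "isWritablePrimary": host == state["primary"]}


view = TopologyView(["mongodb-prod:27017"])  # a single seed is enough while it is up
before = view.refresh(check)

# Failover: mongodb-prod dies and mongodb-psec1 is elected primary.
state["down"].add("mongodb-prod:27017")
state["primary"] = "mongodb-psec1:27017"
after = view.refresh(check)

print(before, "->", after)
```

Because the client already learned the other members from the first reply, it can still find the new primary after the original seed is gone. No DNS change is involved.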

Here’s a blog post explaining how it works with some examples of how different drivers handle them: Server Discovery and Monitoring In Next Generation MongoDB Drivers, and here’s the spec itself.

*Note that the blog post dates back to 2015, so what was “next generation” then is the norm now.

Best regards,
Kevin

Hugh_Ferguson · October 20, 2021, 10:11pm

That helps a lot. So in layman’s terms, ultimately it is the driver that figures out the new arrangement.
-Hugh

chris · October 20, 2021, 10:13pm

In layman’s terms, it’s magic.