Routing in geographic sharding

Let’s say you’ve set up a sharded cluster with data in two zones, like in this example:

And you have an application that also runs in two separate regions (one in the US and one in Europe). How do the reads and writes from the application running in the European data center get routed to the European shard, and vice versa? Do the two applications use different connection strings, so that the routing is handled at the DNS level? Or do both applications connect using the same connection string? If so, won’t that mean that one of the application instances has to make a cross-region connection, even if subsequent connections go to the closest shard?

Thanks

Hi John,

I am going to answer your question assuming you’re using MongoDB Atlas’s implementation of this concept, which Atlas calls “Global Clusters” (read more here: https://docs.atlas.mongodb.com/global-clusters/).

What happens is that Atlas places a query router node (called a mongos in backend nomenclature) in every zone. The concise SRV connection string, which is shared globally, lets the application-tier drivers discover those query routers, and each driver automatically connects to the nearest one. This means that if you have an app in Europe, that app will reach the data shards behind the scenes via the query router co-located with the European zone.

To explain in more depth: the SRV connection string essentially enables the driver to look up a comma-delimited list of mongos hostnames, which are located globally wherever there is a zone. The driver uses this list to find the nearest one to use for queries.
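To make that lookup concrete, here is a minimal sketch using Python and the dnspython library (which PyMongo itself uses for SRV resolution); the cluster hostname is a placeholder, not a real deployment:

```python
import dns.resolver

# An SRV-style Atlas hostname (placeholder, not a real cluster).
cluster_host = "cluster0.example.mongodb.net"

# mongodb+srv:// tells the driver to resolve SRV records under
# _mongodb._tcp.<hostname> to discover the full list of mongos hosts.
answers = dns.resolver.resolve(f"_mongodb._tcp.{cluster_host}", "SRV")

for record in answers:
    # Each SRV record names one query router (mongos) and its port.
    print(f"{record.target.to_text().rstrip('.')}:{record.port}")
```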

Cheers
-Andrew


Hi Andrew and thanks for the quick response!

So it’s the SRV connection string that does the magic? That’s some kind of DNS feature, right? Is that the topic I should read up on if I want to understand how each app instance is guaranteed to connect to the nearest shard?

Hi John,

Honestly, even if you used the legacy connection string (a long, comma-delimited list of mongos hostnames), it’s the same thing: all SRV does is collapse that list into a single hostname so that it’s more concise, and it makes the full list resolvable directly from DNS. There’s no real magic; in both cases the driver gets the full list and then finds the closest one.
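To illustrate the equivalence, here is a hedged sketch with PyMongo; all hostnames are placeholders:

```python
from pymongo import MongoClient

# Legacy form: every mongos hostname listed explicitly.
legacy_client = MongoClient(
    "mongodb://mongos-us.example.net:27016,"
    "mongos-eu.example.net:27016/?ssl=true"
)

# SRV form: a single hostname; the driver resolves the same
# host list via a DNS SRV lookup before connecting.
srv_client = MongoClient("mongodb+srv://cluster0.example.mongodb.net/")
```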

Cheers
-Andrew

Oh, I see. Yeah, that makes sense.

Do you know how the driver figures out which shard is the closest? Is that something that is implemented in the driver itself somehow, or is that handled automatically by some internet protocol?

Thanks

Also, what’s the difference between Global Clusters and “Multi-Region Deployments” mentioned on this page?

Thanks again!

The driver just sees which of the mongos instances it gets a ping response from fastest (so no real magic here).
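As a rough sketch of that selection logic in Python (not the actual driver implementation, which tracks a rolling average of heartbeat round-trip times and a configurable localThresholdMS window):

```python
import socket
import time

def rtt_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Measure one TCP connect round trip to a mongos, in milliseconds."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000

# Placeholder mongos hosts, one per zone.
mongos_hosts = [("mongos-us.example.net", 27016),
                ("mongos-eu.example.net", 27016)]

# Pick the query router with the lowest measured round-trip time.
nearest = min(mongos_hosts, key=lambda hp: rtt_ms(*hp))
print("nearest mongos:", nearest)
```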

Regarding your second question: Multi-Region refers to replication, where there is a preferred region that writes go to (unless that region is lost and there’s a failover to another region), as opposed to Atlas Global Clusters, which have different write regions in different parts of the world!
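For context on how a Global Cluster ends up with different write regions: Atlas shards global collections on a compound key whose first field is a location code, so each document’s location value determines which zone owns it. A minimal sketch, assuming a hypothetical customers collection:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://cluster0.example.mongodb.net/")
customers = client["app"]["customers"]

# In an Atlas Global Cluster the shard key starts with a location
# field (an ISO 3166 code), so this write lands in the European zone,
# while a location of "US" would land in the US zone.
customers.insert_one({
    "location": "DE",        # zone-routing field
    "customerId": 12345,     # rest of the shard key (illustrative)
    "name": "Example GmbH",
})
```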

I am happy you’re asking because these concepts are so nuanced and we need to figure out how to position them better.

Cheers
-Andrew


Ok, so if we want geographic sharding on Atlas, then Global Clusters is the way to go? Do you know if Global Clusters will eventually come as serverless?

That’s a great question. Serverless is a long and major journey for us, starting with the basics and building up, so not in the near term, but long-term, yes. I also believe it’s important to maintain an open mind and continue to evaluate the best strategy and user experience for delivering a global, in-region-latency application experience. We like our model of embedding the location in the schema, but I wonder whether over time we’ll need to invest in capabilities, or see others democratize this further at the app tier, since the database doesn’t live in isolation. Feedback appreciated!


Hi Andrew,
I have a shard with its primary node in region A and a secondary in region B, and an application in region B. The documentation seems to indicate that all traffic goes from the application to the primary node in region A instead of going to the secondary node in the same region.
Is that correct?
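For reference, the default read preference in MongoDB drivers is primary, which would indeed send reads cross-region in this scenario; drivers expose read preference settings such as nearest or secondaryPreferred to route reads to a nearby secondary instead. A minimal PyMongo sketch, with a placeholder hostname:

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient("mongodb+srv://cluster0.example.mongodb.net/")

# By default, reads go to the primary (region A in this scenario).
# NEAREST lets the driver pick the lowest-latency member instead,
# e.g. the secondary co-located with the app in region B.
coll = client["app"].get_collection(
    "customers", read_preference=ReadPreference.NEAREST
)
doc = coll.find_one({"customerId": 12345})
```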