Sharding and Replica Set


Say we have 3 data centers and MongoDB shards are spread across these Datacenters(DC). For e.g Shard-A is in DC-1, Shard-B is in DC-2 and Shard-C is in DC-3.

Now each DC’s contains the replica set of other DC’s i.e. DC-1 contains a replica set of Shard-B & Shard-C, DC-2 contains a replica set of Shard-A & Shard-C and DC-3 contains a replica set of Shard-A and Shard-B. These replica sets allow the application to read.

Say there is a query that does not contain the Shard key which in turn requires the router(Mongos) to scatter and gather the data from all Shards. In MongoDB Doc, it is mentioned that setting “nearest” as read preference in replica set will read the data from nearest secondary/primary.

So I am assuming if set the readPerference as nearest then the router to do scatter and gather data within the DC itself. Please let me know if my understanding is wrong.

Thanks in advance.

Hi @Allwyn_Jesu and welcome in the MongoDB Community :muscle: !

Yes, your logic is correct. In theory, your DC local network is faster than the connection between 2 DCs so as long as that stays true, you should only scatter and gather from the DC that is the closest to the back-end node.

Couple of things I’d like to mention though.

Firstly, a small vocabulary update to avoid confusion: a Replica Set (RS) is a shard. In your case here, you are running 3 RS + 1 config RS (which needs to be distributed as well). A member of a RS (==a node) is either a Primary or a Secondary. So your distribution is that each DC contains one member of each RS.

Secondly, using readPreferrence = nearest will most probably get you to read from one Primary and 2 Secondaries which might return stale data (but it might also be completely OK for you). The good thing is that you can actually tune the “quality” of the answer that you will get by using the readConcern setting. If you want to stay on the safe side, I would recommend to use readConcern == “majority” but if you know what you are doing and you prefer speed over consistency, then you can use “available” for example.

I hope this helps.

Thanks for the response. Could you please point me to a document that explains how readPreferrence = nearest works internally? I am mainly interested in who(I am assuming router) is responsible for picking the nearest node for reading and how it is done? Say, finding the nearest node is implemented using ping or some other approach then how frequently it is carried out? is there a property to control the frequency etc?

1 Like

The specs are the best place at this point.

Search for “nearest” in this file. You will find a lot of things :nerd_face:.

I think all your questions are covered in there.


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.