Zone based sharding data balance issue

Question regarding Zone sharding with Range key, referred doc:

Estimated data size: 1500 GB

I have a collection with the shard key { country: 1, cityId: 1, userId: 1 }.
country has only 4 possible values (low cardinality): US, UK, IND, UE.
cityId and userId are UUIDs.

I am planning to use zone sharding with 1 zone (4 shards), with this zone range:
sh.addTagRange("db.collection", { "country" : "US", "cityId" : MinKey, "userId": MinKey }, { "country" : "US", "cityId" : MaxKey, "userId": MaxKey }, "USA");

I want to understand whether this shard key will distribute data evenly across the shards in the zone, and how, given the low cardinality of the leading field.

Hey Sudarshan, good question!

MongoDB will direct all documents in the “db.collection” where the “country” value is “US”, regardless of the “cityId” and “userId” values (since MinKey and MaxKey represent the lower and upper bounds of possible values, respectively), to the shards associated with the “USA” tag.

Documents with the same cityId are more likely to be on the same shard, but that’s not guaranteed. For example, it’s possible (although unlikely since nothing is hashed) that a document with country:US, cityId:1234, userId:0001 lives on shard0 and a document with country:US, cityId:1234, userId:0002 lives on shard1. But all the documents with “US” will live on one of the 4 shards tagged to the “USA” zone.
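
To make the routing above concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB needed). It is not how mongos is implemented; it only mimics the comparison rule: compound keys are compared field by field, and a range covers keys from its lower bound (inclusive) up to its upper bound (exclusive). The `MIN`/`MAX` sentinels, the chunk boundaries, and the shard names are assumptions made up for illustration.

```javascript
// Sentinels standing in for BSON MinKey / MaxKey (hypothetical names for this sketch).
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two field values: MinKey sorts below everything, MaxKey above everything.
function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : 1;
}

// Compare two compound keys [country, cityId, userId] field by field, left to right.
function compareKey(a, b) {
  for (let i = 0; i < a.length; i++) {
    const c = compareValue(a[i], b[i]);
    if (c !== 0) return c;
  }
  return 0;
}

// A range covers keys with min <= key < max (the upper bound is exclusive).
function findRange(ranges, key) {
  return ranges.find(r => compareKey(r.min, key) <= 0 && compareKey(key, r.max) < 0);
}

// The zone range from the question.
const zoneRanges = [
  { min: ["US", MIN, MIN], max: ["US", MAX, MAX], zone: "USA" },
];

// Assumed chunk layout inside the USA zone: one boundary falls inside cityId "1234".
// Only the US chunks are modeled here; a real cluster covers the whole key space.
const chunks = [
  { min: ["US", MIN, MIN], max: ["US", "1234", "0002"], shard: "shard0" },
  { min: ["US", "1234", "0002"], max: ["US", MAX, MAX], shard: "shard1" },
];

function route(doc) {
  const key = [doc.country, doc.cityId, doc.userId];
  const zone = findRange(zoneRanges, key);
  const chunk = findRange(chunks, key);
  return { zone: zone ? zone.zone : null, shard: chunk ? chunk.shard : null };
}

const a = route({ country: "US", cityId: "1234", userId: "0001" });
const b = route({ country: "US", cityId: "1234", userId: "0002" });
console.log(a, b);
```

Both documents fall in the "USA" zone, but because the assumed chunk boundary sits inside cityId 1234, they are routed to different shards. Chunk boundaries are always expressed on the full compound key, even when the country value is the same for every document in the zone.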

  • What is the desired distribution of data across your shards?
  • Are there specific shards you want to associate with the other countries in your dataset?
  • Do you also plan to set tag ranges for ‘UK’, ‘IND’, and ‘UE’?

  1. The desired distribution is a balanced distribution within a particular zone cluster.
     I am not sure that range sharding will achieve this within the zone cluster; I suspect it might end up with a hot shard.
     Is there any reference doc on achieving balanced distribution in the above scenario?
     Should we look at hashed sharding with the zone cluster?

  2. I am planning separate zone clusters for 2 of them; the remaining 2 will be in a single cluster.

  3. Yes.

Hashed sharding will help ensure inserts go to all shards equally. But if userId is already randomly generated, then your write distribution will be random and good enough. Is userId random, or is it monotonically increasing? If it’s monotonic, then hashing just the userId piece should be enough to avoid hot shards.
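
As a rough illustration of why monotonic values create a hot shard and hashing avoids it, here is a small Node.js simulation. Everything in it is an assumption for the sketch: a 32-bit key space, four equal-width chunks (one per shard), and a truncated MD5 as a stand-in hash. MongoDB computes its own 64-bit hash for hashed fields, so this only shows the effect, not the actual mechanism.

```javascript
const crypto = require("crypto");

// Four equal-width chunks over a 32-bit key space, one per shard (hypothetical layout).
const SHARDS = 4;
const SPACE = 2 ** 32;

// Map a numeric key to the index of the chunk (and shard) whose range contains it.
function chunkFor(value) {
  return Math.min(SHARDS - 1, Math.floor(value / (SPACE / SHARDS)));
}

// Stand-in hash: the first 4 bytes of MD5 read as an unsigned 32-bit integer.
function hash32(value) {
  return crypto.createHash("md5").update(String(value)).digest().readUInt32BE(0);
}

// Count how many inserts land on each shard, with or without hashing the id.
function simulate(ids, useHash) {
  const counts = new Array(SHARDS).fill(0);
  for (const id of ids) counts[chunkFor(useHash ? hash32(id) : id)]++;
  return counts;
}

// Monotonically increasing ids that all sit at the top of the current key range.
const start = SPACE - 10000;
const ids = Array.from({ length: 10000 }, (_, i) => start + i);

const ranged = simulate(ids, false);
const hashed = simulate(ids, true);
console.log("ranged:", ranged);
console.log("hashed:", hashed);
```

With ranged placement every insert lands in the last chunk, while the hashed ids are spread across all four. Randomly generated ids would spread out under ranged placement too, which is why hashing only matters when the field is monotonic.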

So a shard key of country:1,cityId:1,userId:"hashed" should be good enough to ensure good write distribution if userId isn’t random. I would guess that there could be too many documents with the same cityId for some popular cities.
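
For reference, a minimal mongosh sketch of sharding with that key. It assumes MongoDB 4.4 or later (compound shard keys with a single hashed field are not available before that), that sharding is already enabled for the database, and that the namespace is literally "db.collection" as in the question.

```javascript
// Assumes MongoDB 4.4+: a compound shard key may include one hashed field.
sh.shardCollection("db.collection", { country: 1, cityId: 1, userId: "hashed" });

// After loading data, check how documents and chunks are spread across the shards.
db.getSiblingDB("db").getCollection("collection").getShardDistribution();
```

The shard key of an existing collection cannot simply be swapped, so it is worth trying this on a test cluster with a representative data sample first.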

Sorry for the late response… was away

I have another question regarding zone sharding.
Suppose I have 3 shards (s1, s2, s3) and 3 zones (z1, z2, z3).
Every zone contains all the shards: z1 (s1, s2, s3), z2 (s1, s2, s3), z3 (s1, s2, s3).
Shard key: { country: 1, cityId: 1, userId: 1 }

Now if I run:

sh.addTagRange("db.collection", { "country" : "US", "cityId" : MinKey, "userId": MinKey }, { "country" : "US", "cityId" : MaxKey, "userId": MaxKey }, "z1");
sh.addTagRange("db.collection", { "country" : "IND", "cityId" : MinKey, "userId": MinKey }, { "country" : "IND", "cityId" : MaxKey, "userId": MaxKey }, "z2");
sh.addTagRange("db.collection", { "country" : "UE", "cityId" : MinKey, "userId": MinKey }, { "country" : "UE", "cityId" : MaxKey, "userId": MaxKey }, "z3");

Then which key will data distribution within a zone use, 1 or 2?

  1. The full key (country: 1, cityId: 1, userId: 1), or
  2. (cityId: 1, userId: 1), because the cardinality of country within a zone is now 1 and it has already been used to find the zone?