Hey folks,
I store entities for countries all over the world, with around 15 popular countries amongst them. I have recently attempted to shard this collection as it grew larger. Each entity is indexed by a unique hashed value, and the shard key combines this value with the entity's country code.
I have performed sharding on this existing collection (not in a global cluster) with 3 shards, one being the config server. I used the `sh.shardCollection()` command with a compound index `{ countryCode, entityId }`. I only ran the necessary commands in the process (create index, enable sharding).
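For reference, this is roughly the sequence I ran (the database and collection names below are placeholders, not my real ones):

```js
// Placeholder names: "mydb" / "entities" stand in for my actual database and collection
use mydb

// Create the index backing the compound shard key
db.entities.createIndex({ countryCode: 1, entityId: 1 })

// Enable sharding on the database, then shard the collection on the compound key
sh.enableSharding("mydb")
sh.shardCollection("mydb.entities", { countryCode: 1, entityId: 1 })
```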
After a few days, the balancing completed. My shards look like the following:
shard0 (primary shard): 1 chunk from the sharded collection + rest of the unsharded collections, 500 GB
shard1: 2000 chunks from the sharded collection, 250 GB
shard2 (config): 2000 chunks from the sharded collection, 250 GB
shard0 is the primary. Based on data volume, it seems to hold around the same amount (250 GB) of data for the sharded collection as the others - but all in one chunk! The chunk is not a jumbo chunk, and there is no apparent pattern in the shard keys. shard1 and shard2 seem to contain entities for countries A through SX, while the entities for the remaining countries, SY through Z, all end up in shard0's single chunk.
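In case I'm misreading the distribution, this is roughly how I've been inspecting it (placeholder namespace again; on versions before 5.0 the `config.chunks` query uses `ns` instead of `uuid`):

```js
use mydb
// Per-shard data and document counts for the collection
db.entities.getShardDistribution()

// Chunk ranges from the config database (MongoDB 5.0+ keys chunks by collection UUID)
var coll = db.getSiblingDB("config").collections.findOne({ _id: "mydb.entities" })
db.getSiblingDB("config").chunks.find({ uuid: coll.uuid }).sort({ min: 1 })
```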
The balancer is enabled, and all settings are at their defaults. Daily data updates read from and write to the database, creating or updating entities for various countries, including countries in the SY-Z range.
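For completeness, I checked the balancer along these lines:

```js
sh.getBalancerState()      // true -> balancer is enabled
sh.isBalancerRunning()     // whether a balancing round is currently in progress

// No custom chunk size or balancer window should show up here if everything is on defaults
db.getSiblingDB("config").settings.find()
```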
I have also tried to split the chunk manually according to this page: https://www.mongodb.com/docs/manual/tutorial/split-chunks-in-sharded-cluster/
What I experienced is that the chunk does get split after I run the command; however, after around 60 minutes shard0 goes back to having only 1 chunk.
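This is roughly what I ran for the manual split, following that page (the namespace and split values are illustrative, not my real shard key values):

```js
// Split at an explicit shard key value
sh.splitAt("mydb.entities", { countryCode: "SY", entityId: MinKey })

// Or let MongoDB split the chunk containing a matching document at its median point
sh.splitFind("mydb.entities", { countryCode: "TR", entityId: "someHashValue" })
```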
Has anyone experienced a similar issue?