Sharding an existing collection: chunk not splitting after balancing is done

Hey folks,

I store entities for countries all over the world, with around 15 popular countries among them. I recently attempted to shard this collection as it grew larger. Each entity is indexed by a unique hashed value, which together with the corresponding country code forms the shard key.

I sharded this existing collection (not in a global cluster) across 3 shards, one of which also acts as the config server. I used the sh.shardCollection command with a compound shard key on { countryCode, entityId }. I only ran the necessary commands in the process (create the index, enable sharding, shard the collection), roughly as sketched below.
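
The exact commands looked something like this (the mydb / entities names are placeholders and I'm assuming range-based keys on both fields; only the two shard key fields are the real ones):

```js
// "mydb" and "entities" are placeholder names; the shard key fields are the real ones.
const coll = db.getSiblingDB("mydb").entities;

// 1. Create the compound index that backs the shard key.
coll.createIndex({ countryCode: 1, entityId: 1 });

// 2. Enable sharding on the database (optional on recent server versions).
sh.enableSharding("mydb");

// 3. Shard the collection on the compound key.
sh.shardCollection("mydb.entities", { countryCode: 1, entityId: 1 });
```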

After a few days, the balancing completed. My shards now look like the following:
shard0 (primary shard): 1 chunk from the sharded collection + rest of the unsharded collections, 500 GB
shard1: 2000 chunks from the sharded collection, 250 GB
shard2 (config): 2000 chunks from the sharded collection, 250 GB

shard0 is the primary shard. Judging by data volume, it seems to hold roughly the same amount of the sharded collection's data as the others (about 250 GB), but all of it sits in a single chunk! The chunk is not a jumbo chunk, and there is no obvious pattern in the shard keys. shard1 and shard2 seem to contain the entities for countries A through SX, while the entities for the remaining countries (SY through Z) all end up in shard0's single chunk.

The balancer is enabled, and all settings are at their defaults. Daily data updates read from and write to the database, creating or updating entities for various countries, including countries between SY and Z.

I have also tried to split the chunk manually according to this page: https://www.mongodb.com/docs/manual/tutorial/split-chunks-in-sharded-cluster/
What I experienced is that the chunk is indeed split after I run the command; however, after around 60 minutes shard0 goes back to having only 1 chunk.
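
Concretely, the manual split attempt looked roughly like this (the namespace and the key values are placeholders, not my real data):

```js
// Split the chunk containing this shard key value at its median point
// ("mydb.entities" and the key values are placeholders).
sh.splitFind("mydb.entities", { countryCode: "SY", entityId: "someHashedId" });

// Alternatively, split at an explicit boundary:
sh.splitAt("mydb.entities", { countryCode: "TH", entityId: MinKey() });
```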

Has anyone experienced a similar issue?

I found out that this might be intended behaviour in MongoDB v6+:

Chunks are not subject to auto-splitting. Instead, chunks are split only when moved across shards.

Does anyone know if there are any downsides to a chunk growing too large? And does anyone have an explanation for why chunk splits done with sh.splitFind are not persisted?

Here is a screenshot of the output of getShardDistribution. It shows that the data is distributed evenly by disk usage, but the number of chunks is highly unbalanced.
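
In case it helps, this is roughly how I gathered those numbers (the namespace is a placeholder again):

```js
// Per-shard data and document counts for the sharded collection
// ("mydb.entities" is a placeholder namespace).
db.getSiblingDB("mydb").entities.getShardDistribution();

// Chunk counts per shard from the config database. Since 5.0, config.chunks
// references collections by UUID, so look that up first.
const meta = db.getSiblingDB("config").collections.findOne({ _id: "mydb.entities" });
db.getSiblingDB("config").chunks.aggregate([
  { $match: { uuid: meta.uuid } },
  { $group: { _id: "$shard", nChunks: { $sum: 1 } } }
]);
```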

Hi Vince

You are correct — starting with MongoDB 6.0, we introduced a new sharding balancer policy based on data size rather than number of chunks. This means that chunks will only be split when they need to be moved. As a result, it’s completely expected to see only one chunk on a specific shard in certain cases.
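
If you want to double-check this, you can ask the balancer whether it considers the collection balanced, and optionally tune the per-collection chunk size it uses for its data-size calculations. A quick sketch, where the namespace and the 256 MB value are just examples:

```js
// Ask the balancer whether it currently considers this collection balanced
// ("mydb.entities" is a placeholder namespace).
db.adminCommand({ balancerCollectionStatus: "mydb.entities" });

// Optionally set the per-collection chunk size (in MB) that the data-size
// balancing policy works with; 256 here is only an example value.
db.adminCommand({ configureCollectionBalancing: "mydb.entities", chunkSize: 256 });
```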

Additionally, in MongoDB 7.0, we introduced the automerge feature, which is enabled by default. With automerge, the server can automatically merge adjacent chunks on the same shard, provided they meet the mergeability requirements. This explains why your previously split chunks may seem to “disappear” — they’re being merged automatically.
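
If you ever need manual splits to stick around for a while (for example while debugging), you can temporarily pause the AutoMerger and re-enable it when you're done. A minimal sketch using the shell helpers as I recall them; please verify their availability for your exact server version:

```js
// Temporarily pause automatic chunk merging across the cluster (7.0+).
sh.stopAutoMerger();

// ...run sh.splitFind() / sh.splitAt() and inspect the chunks here...

// Restore the default behaviour afterwards.
sh.startAutoMerger();
```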

To summarize:

  • When enabled, the balancer uses the data-size policy to split and move chunks as needed to keep data evenly distributed.
  • Once split chunks are moved and become eligible for merging, the server can automatically combine adjacent chunks on the same shard.

If you have any further questions or need additional clarification on this topic, feel free to ask—I’d be happy to help!