Confused with "Chunks"

I don’t think I really understand the concept of “chunks”. Why isn’t there only one chunk per shard node? How can a single shard node have multiple chunks, and what is the advantage of that?

The most important factor I’ve seen so far is size. A chunk has a maximum size of 64MB. So if you have two shards, but a dataset of 500MB, then you’ll need multiple chunks per shard.

Thank you for the reply. I just have one more question. Are chunks used so that they can be easily moved to different shards in case one shard node has too many chunks? Like moving a chunk would be easier than moving documents one by one?

Are chunks used so that they can be easily moved to different shards in case one shard node has too many chunks?

The term you’re looking for is “chunk migration” which is done by the “balancer”.

Imagine you have 3 buckets full of legos. Now imagine two scenarios.

Scenario 1:
Bucket 1 only has red legos
Bucket 2 only has blue legos
Bucket 3 only has green legos

Scenario 2:
Bucket 1 has an even mix of all three colors
Bucket 2 has an even mix of all three colors
Bucket 3 has an even mix of all three colors

Now, let’s play a game. A horde of young children need to go fetch legos. They don’t know which color they need ahead of time, it’s called out randomly. Which scenario do you think will handle the children fetching colored legos at random?

We want chunks on different servers so that no single server is too overloaded with requests while others sit idle. Imagine 30 children being told to grab a blue lego. In on scenario, each bucket can serve 10 children. In another, you have fights and tears.

2 Likes

We want chunks on different servers so that no single server is too overloaded with requests while others sit idle.

Nice analogy, but that doesn’t go into @samym0 question “ Why isn’t there only one chunk per shard node? ”. In your analogy each bucket would contain a nice mix of red/green/blue. One bucket in each zone of the playroom. Or one chunk per shard node.

But in a sharded MongoDB cluster you’d eventually get multiple buckets per zone, which is what OP was wondering about. Why not just have one bigger bucket for each zone, instead of multiple buckets?

My answer does say why. If you just had a bigger bucket you’d still end up wasting resources by having your other buckets be more idle.

Remember, the idea for sharding is efficiency and scaling. Eventually you can’t put more memory on a server or you have so much data that you may have terabytes on disk. This could seriously impact recovery time in the event of a catastrophic hardware failure.

Okay, before I put my foot in my mouth any further In your “bucket” analogy, is each bucket one shard? Or is a bucket one chunk? Because OP did not understand why one shard node (replica set) would have more than one chunk. In a cluster with three shards OP was expecting 3 chunks (one per shard) as opposed to 3, 4, 6, 9 or more.

No worries, apologies if it wasn’t clear.

In the bucket example each bucket is a shard, and the legos themselves are chunks. Having good distribution ensures you don’t overload one the buckets under normal operations.

As I mentioned earlier, imagine if you only had one chunk per shard. You have to support infinite growth, potentially. At a certain point you’d run out of memory and disk space. How would you continue to grow?

The MongoDB answer is chunks of fixed, limited size. Then if you need more capacity, you can add more shards.

3 Likes