- Sharding >
- Sharding Concepts >
- Sharding Mechanics >
- Sharded Collection Balancing
Sharded Collection Balancing¶
On this page
Balancing is the process MongoDB uses to distribute data of a sharded collection evenly across a sharded cluster. When a shard has too many of a sharded collection’s chunks compared to other shards, MongoDB automatically balances the chunks across the shards. The balancing procedure for sharded clusters is entirely transparent to the user and application layer.
Cluster Balancer¶
The balancer process is responsible for redistributing the chunks of a sharded collection evenly among the shards for every sharded collection. By default, the balancer process is always enabled.
Any mongos
instance in the cluster can start a balancing
round. When a balancer process is active, the responsible
mongos
acquires a “lock” by modifying a document in the
lock
collection in the Config Database.
Note
Changed in version 2.0: Before MongoDB version 2.0, large differences in timekeeping
(i.e. clock skew) between mongos
instances could lead
to failed distributed locks. This carries the possibility of
data loss, particularly with skews larger than 5 minutes.
Always use the network time protocol (NTP) by running ntpd
on your servers to minimize clock skew.
To address uneven chunk distribution for a sharded collection, the balancer migrates chunks from shards with more chunks to shards with a fewer number of chunks. The balancer migrates the chunks, one at a time, until there is an even distribution of chunks for the collection across the shards. For details about chunk migration, see Chunk Migration Procedure.
Changed in version 2.6: Chunk migrations can have an impact on disk space. Starting in MongoDB 2.6, the source shard automatically archives the migrated documents by default. For details, see moveChunk directory.
Chunk migrations carry some overhead in terms of bandwidth and workload, both of which can impact database performance. The balancer attempts to minimize the impact by:
- Moving only one chunk at a time. See also Chunk Migration Queuing.
- Starting a balancing round only when the difference in the number of chunks between the shard with the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold.
You may disable the balancer temporarily for maintenance. See Disable the Balancer for details.
You can also limit the window during which the balancer runs to prevent it from impacting production traffic. See Schedule the Balancing Window for details.
Note
The specification of the balancing window is relative to the local
time zone of all individual mongos
instances in the
cluster.
See also
Migration Thresholds¶
To minimize the impact of balancing on the cluster, the balancer will not begin balancing until the distribution of chunks for a sharded collection has reached certain thresholds. The thresholds apply to the difference in number of chunks between the shard with the most chunks for the collection and the shard with the fewest chunks for that collection. The balancer has the following thresholds:
Changed in version 2.2: The following thresholds appear first in 2.2. Prior to this release, a balancing round would only start if the shard with the most chunks had 8 more chunks than the shard with the least number of chunks.
Number of Chunks | Migration Threshold |
Fewer than 20 | 2 |
20-79 | 4 |
80 and greater | 8 |
Once a balancing round starts, the balancer will not stop until, for the collection, the difference between the number of chunks on any two shards for that collection is less than two or a chunk migration fails.
Shard Size¶
By default, MongoDB will attempt to fill all available disk space with data on every shard as the data set grows. To ensure that the cluster always has the capacity to handle data growth, monitor disk usage as well as other performance metrics.
When adding a shard, you may set a “maximum size” for that shard.
This prevents the balancer from migrating chunks to the shard
when the value of mapped
exceeds the
“maximum size”. Use the maxSize
parameter of the
addShard
command to set the “maximum size” for the shard.