Fragmentation occurs when a sharded collection's data is broken up into an unnecessarily large number of small chunks. This can increase the time that CRUD operations on that collection take. Defragmentation reduces the number of chunks by merging smaller chunks into larger ones, which lowers CRUD operation times.
If CRUD operation times are acceptable, you don't need to defragment collections.
The following table summarizes defragmentation information for various MongoDB versions.
MongoDB 7.0 and later
Chunks are automatically merged. Defragmenting a collection yields smaller performance improvements in MongoDB 7.0 than in MongoDB 6.0. Typically, you don't need to defragment collections starting in MongoDB 7.0.
MongoDB 6.0 and later, but earlier than 7.0
Defragment collections only if you experience CRUD operation delays when the balancer migrates chunks or a node starts.
Starting in MongoDB 6.0, high write traffic should not cause fragmentation. Instead, chunk migrations cause fragmentation.
Earlier than MongoDB 6.0
Defragment collections only if you experience longer CRUD operation times during metadata updates. For MongoDB versions earlier than 6.0, a sharded collection becomes fragmented when the collection size grows significantly because of many insert or update operations.
To defragment a sharded collection, use the defragmentCollection option. The option is available starting in
Consider these issues before you defragment collections:
Defragmentation might cause many metadata updates on the shards. If your CRUD operations are already taking longer than usual during migrations, you should only run defragmentation during a shard balancing window to reduce the system workload.
If defragmentation is impacting workload and CRUD latency on the cluster, you can reduce the impact using the
Merged chunks lose their placement history. Placement history records the shards that a chunk was stored on.
This means that while defragmentation is running, snapshot reads and, indirectly, transactions could fail with stale chunk history errors.
Because defragmentation erases the placement history, some operations could fail, but these failures typically resolve after around five minutes.
Defragmentation affects the locality of the documents in a collection by moving data between shards. If a collection has ranges of data that are frequently accessed, the frequently accessed data might end up on one shard after you defragment the collection. This might decrease the performance of CRUD operations by concentrating the workload on one shard instead of spreading it across multiple shards.
Typically, you should use a shard balancing window to specify when the balancer runs instead of manually starting and stopping defragmentation.
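If you do start or stop defragmentation manually, the request is a configureCollectionBalancing command with the defragmentCollection option set to true or false. The following is a minimal sketch in Python that only builds the command documents. The helper name `build_defragment_command` and the namespace `test.orders` are hypothetical, and sending the command through a driver such as PyMongo is an assumption about your setup.

```python
def build_defragment_command(namespace: str, enabled: bool) -> dict:
    """Return a configureCollectionBalancing command document.

    The command name must be the first key. Python dicts preserve
    insertion order, so the literal below keeps it first.
    """
    if "." not in namespace:
        raise ValueError("namespace must look like '<database>.<collection>'")
    return {
        "configureCollectionBalancing": namespace,
        "defragmentCollection": enabled,
    }


# Hypothetical namespace, used only for illustration.
start_cmd = build_defragment_command("test.orders", True)
stop_cmd = build_defragment_command("test.orders", False)
```

With PyMongo connected to a mongos, a document like `start_cmd` could be passed to `client.admin.command(start_cmd)`.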
This section describes additional details related to defragmenting sharded collections.
defragmentCollection field returned by the
configureCollectionBalancing command is only
defragmentation is running.
After defragmentation automatically ends or you manually stop
defragmentCollection field is removed from the
Secondary node reads are permitted during defragmentation, but might take longer to complete until metadata updates on the primary node are replicated to the secondary nodes.
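To check whether defragmentation is still running, you can inspect the reply of the balancerCollectionStatus command. The sketch below interprets such a reply in Python. The field names (`firstComplianceViolation`, `details.progress.remainingChunksToProcess`) and the value `defragmentingChunks` reflect the reply format as understood for MongoDB 6.0 and should be verified against your version. The helper name and the sample reply are hypothetical.

```python
def defragmentation_progress(status: dict):
    """Return (running, remaining_chunks) from a balancerCollectionStatus reply.

    remaining_chunks is None when defragmentation is not running or when
    the reply does not include a progress counter.
    """
    running = status.get("firstComplianceViolation") == "defragmentingChunks"
    if not running:
        return False, None
    progress = status.get("details", {}).get("progress", {})
    return True, progress.get("remainingChunksToProcess")


# Hypothetical reply, shaped like the assumed command output.
sample_reply = {
    "balancerCompliant": False,
    "firstComplianceViolation": "defragmentingChunks",
    "details": {
        "currentPhase": "moveAndMergeChunks",
        "progress": {"remainingChunksToProcess": 12},
    },
    "ok": 1,
}

running, remaining = defragmentation_progress(sample_reply)
```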
For details about the MongoDB balancer, see Sharded Cluster Balancer.
For an introduction to
Modify Range Size in a Sharded Cluster.
The following table describes how chunkSize affects defragmentation and the balancer operations in different MongoDB versions.
MongoDB 6.0 and later
When the collection data shared between two shards differs by
three or more times the configured
For example, if
Earlier than MongoDB 6.0
When a chunk grows larger than chunkSize, the chunk is split.
When chunks are moved, split, or merged, the shard metadata is updated after the chunk operation is committed by a config server. Shards not involved in the chunk operation are also updated with new metadata.
The time for the shard metadata update is proportional to the size of the routing table. CRUD operations on the collection are temporarily blocked while the shard metadata is updated, and a smaller routing table means shorter CRUD operation delays.
Defragmenting a collection reduces the number of chunks and the time to update the chunk metadata.
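As a rough illustration of why fewer chunks help, the number of routing table entries for a collection can be approximated by dividing the collection size by the average chunk size. The Python sketch below does only that arithmetic. The helper name and the example sizes are hypothetical, and the result says nothing about the actual update time, only about the relative size of the routing table.

```python
import math


def estimate_chunk_count(collection_bytes: int, avg_chunk_bytes: int) -> int:
    """Approximate the number of chunks (routing table entries)."""
    if avg_chunk_bytes <= 0:
        raise ValueError("avg_chunk_bytes must be positive")
    return math.ceil(collection_bytes / avg_chunk_bytes)


MIB = 1024 ** 2
TIB = 1024 ** 4

# Hypothetical 1 TiB collection before and after merging small chunks.
fragmented = estimate_chunk_count(1 * TIB, 64 * MIB)     # 16384 chunks
defragmented = estimate_chunk_count(1 * TIB, 256 * MIB)  # 4096 chunks
reduction_factor = fragmented / defragmented             # 4.0
```

Under the proportionality described above, a routing table that is four times smaller implies a correspondingly shorter metadata update.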
To reduce the system workload, configure the balancer to run only at a specific time using a shard balancing window. Defragmentation runs during the balancing window time period.
You can use the
to limit the rate of split and merge commands run by the balancer.
You can start and stop defragmentation at any time.
You can also set a shard zone. A shard zone is based on the shard key, and you can associate each zone with one or more shards in a cluster.
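A balancing window is stored as an `activeWindow` entry on the `balancer` document in the `settings` collection of the `config` database. The following Python sketch builds the filter and update documents for that change and checks the HH:MM time format. The helper name and the example times are hypothetical, and applying the update through PyMongo is an assumption about your environment.

```python
import re

# 24-hour HH:MM, for example "23:00" or "06:30".
_HHMM = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")


def build_balancing_window_update(start: str, stop: str):
    """Return (filter, update) documents that set the balancer's activeWindow."""
    for label, value in (("start", start), ("stop", stop)):
        if not _HHMM.match(value):
            raise ValueError(f"{label} must be in HH:MM 24-hour format: {value!r}")
    filter_doc = {"_id": "balancer"}
    update_doc = {"$set": {"activeWindow": {"start": start, "stop": stop}}}
    return filter_doc, update_doc


# Hypothetical overnight window.
window_filter, window_update = build_balancing_window_update("23:00", "06:00")
```

With PyMongo connected to a mongos, the documents could be applied with `client["config"]["settings"].update_one(window_filter, window_update, upsert=True)`.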
Starting in MongoDB 6.0, a sharded cluster only splits chunks when chunks must be migrated. This means the chunk size may exceed chunkSize. Larger chunks reduce the number of chunks on a shard and improve performance because the time to update the shard metadata is reduced. For example, you might see a 1 TB chunk on a shard even though you have set chunkSize to 256 MB.
chunkSize affects the following:
Maximum amount of data the balancer attempts to migrate between two shards in a single chunk migration operation.
Amount of data migrated during defragmentation.
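Because chunkSize caps the data moved in a single migration, it also gives a lower bound on how many migrations are needed to move a given amount of data between two shards. The Python sketch below shows that arithmetic. The helper name and the sizes are hypothetical, and the real number of migrations depends on how the balancer selects ranges.

```python
import math


def min_migrations(data_to_move_mb: float, chunk_size_mb: float) -> int:
    """Lower bound on migrations if each one moves at most chunk_size_mb."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk_size_mb must be positive")
    if data_to_move_mb < 0:
        raise ValueError("data_to_move_mb must not be negative")
    return math.ceil(data_to_move_mb / chunk_size_mb)


# Hypothetical: 1024 MB to move between two shards.
with_256 = min_migrations(1024, 256)  # 4
with_128 = min_migrations(1024, 128)  # 8
```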
Introduction to sharding, see Sharding
Partition data with chunks, see Data Partitioning with Chunks
Configure collection balancing, see
Examine balancer collection status, see
Configure shard balancing windows, see Schedule the Balancing Window
Monitor shards using MongoDB Atlas, see Review Sharded Clusters