Optimizing Sharded Collections in MongoDB with Defragmentation
Matt Panton, Garaudy Etienne, Ratika Gandhi • 8 min read • Published Feb 07, 2023 • Updated Feb 08, 2023
So, what do you do if you have a large number of chunks in your sharded cluster and want to reduce the impact of chunk migrations on CRUD latency? You can use collection defragmentation!
In this post, we’ll cover when you should consider defragmenting a collection, the benefits of defragmentation for your sharded cluster, and all of the commands needed to execute, monitor, and stop defragmentation. If you are new to sharding or want a refresher on how MongoDB delivers horizontal scalability, please check out the MongoDB manual.
A sharded collection is stored as “chunks,” and a balancer moves data around to maintain an equal distribution of data between shards. In MongoDB 6.0, when the difference in the amount of data between two shards is two times the configured chunk size, the MongoDB balancer automatically migrates chunks between shards. For collections with a chunk size of 128MB, we will migrate data between shards if the difference in data size exceeds 256MB.
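The migration trigger described above is simple arithmetic. The following is a minimal sketch of it, not MongoDB's actual balancer code. The function name and the `IMBALANCE_FACTOR` constant are hypothetical, and the factor of two is taken directly from the paragraph above.

```javascript
// Sketch of the balancing trigger described above (not MongoDB's own code).
// The factor of 2 is the rule stated in this article for MongoDB 6.0.
const IMBALANCE_FACTOR = 2;

// Returns true when the data-size difference between two shards exceeds
// IMBALANCE_FACTOR times the configured chunk size (all sizes in MB).
function needsMigration(shardASizeMB, shardBSizeMB, chunkSizeMB = 128) {
  const differenceMB = Math.abs(shardASizeMB - shardBSizeMB);
  return differenceMB > IMBALANCE_FACTOR * chunkSizeMB;
}

console.log(needsMigration(1000, 800)); // false: 200MB difference, 256MB threshold
console.log(needsMigration(1000, 700)); // true: 300MB difference, 256MB threshold
```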
Every time it migrates a chunk, MongoDB needs to update the new location of this chunk in its routing table. The routing table stores the location of all the chunks contained in your collection. The more chunks in your collection, the more "locations" in the routing table, and the larger the routing table will be. The larger the routing table, the longer it takes to update it after each migration. When updating the routing table, MongoDB blocks writes to your collection. As a result, it’s important to keep the number of chunks for your collection to a minimum.
By merging as many chunks as possible via defragmentation, you reduce the size of the routing table by reducing the number of chunks in your collection. The smaller the routing table, the shorter the duration of write blocking on your collection for chunk migrations, merges, and splits.
A collection with an excessive number of chunks is considered fragmented.
In this example, a customer’s collection has ~615K chunks on each shard.
Defragmentation is the concept of merging contiguous chunks in order to reduce the number of chunks in your collection.
In our same example, after defragmentation on December 5th, the number of chunks has gone down to 650 chunks on each shard. The customer has managed to reduce the number of chunks in their cluster by a factor of 1000.
Defragmentation of a collection should be considered in the following cases:
- A sharded collection contains more than 20,000 chunks.
- Once chunk migrations are complete after adding and removing shards.
The process is composed of three distinct phases that all help reduce the number of chunks in your chosen collection. The first phase automatically merges mergeable chunks on the same shard. The second phase migrates smaller chunks to other shards so they can be merged. The third phase scans the cluster one final time and merges any remaining mergeable chunks that reside on the same shard.
Note: Do not modify the chunkSize value while defragmentation is executing as this may lead to improper behavior.
In phase one of the defragmentation process, MongoDB scans every shard in the cluster and merges any mergeable chunks that reside on the same shard. The data size of the resulting chunks is stored for the next phase of the defragmentation process.
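To make phase one concrete, here is a simplified model of merging contiguous chunks that live on the same shard. It is only an illustration: the chunk shape (`min`, `max`, `shard`, `sizeMB`) and the function name are hypothetical, the shard key is assumed to be numeric, and the real mergeability rules in MongoDB involve more conditions than adjacency.

```javascript
// Simplified model of phase one. Each chunk covers the range [min, max) of a
// numeric shard key. All field names here are hypothetical.
function mergeContiguousChunks(chunks) {
  const sorted = [...chunks].sort((a, b) => a.min - b.min);
  const merged = [];
  for (const chunk of sorted) {
    const last = merged[merged.length - 1];
    const mergeable =
      last !== undefined && last.shard === chunk.shard && last.max === chunk.min;
    if (mergeable) {
      // Extend the previous chunk and keep a running total of its data size,
      // mirroring the "merge and measure" idea of phase one.
      last.max = chunk.max;
      last.sizeMB += chunk.sizeMB;
    } else {
      merged.push({ ...chunk }); // copy so the input array is not mutated
    }
  }
  return merged;
}

const before = [
  { min: 0, max: 10, shard: "shardA", sizeMB: 5 },
  { min: 10, max: 20, shard: "shardA", sizeMB: 7 },
  { min: 20, max: 30, shard: "shardB", sizeMB: 40 },
  { min: 30, max: 40, shard: "shardA", sizeMB: 3 },
];
console.log(mergeContiguousChunks(before));
```

In this sample, the first two chunks merge because they are adjacent and on the same shard. The last chunk stays separate because a chunk on another shard sits between it and its neighbors, which is the situation phase two is meant to address.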
After phase one is completed, there might be some small chunks left over. In phase two, chunks that are less than 25% of the configured maximum chunk size are identified as small chunks. For example, with MongoDB’s default chunk size of 128MB, all chunks of 32MB or less would be considered small. The balancer then attempts to find other chunks across every shard to determine if they can be merged. If two chunks can be merged, the smaller of the two is moved to the shard that holds the other chunk so the two can be merged there. This also means that the larger your configured chunk size, the more “small” chunks you can move around, and the more you can defragment.
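The small-chunk rule can be expressed as a short calculation. This is a sketch under the assumptions stated in the text. The names are hypothetical, and because the example above says "32MB or less", the boundary is treated as inclusive here.

```javascript
// Fraction of the configured chunk size below which a chunk counts as "small",
// as described in the paragraph above.
const SMALL_CHUNK_FRACTION = 0.25;

function smallChunkThresholdMB(chunkSizeMB = 128) {
  return chunkSizeMB * SMALL_CHUNK_FRACTION;
}

function isSmallChunk(chunkDataSizeMB, chunkSizeMB = 128) {
  // Assumption: inclusive boundary, following the "32MB or less" example.
  return chunkDataSizeMB <= smallChunkThresholdMB(chunkSizeMB);
}

console.log(smallChunkThresholdMB());     // 32 with the 128MB default
console.log(smallChunkThresholdMB(1024)); // 256 with a 1GB chunk size
console.log(isSmallChunk(200, 1024));     // true: movable only with the larger chunk size
```

The last call shows why a larger configured chunk size lets the balancer treat more chunks as small and therefore move and merge more of them.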
In phase three, the balancer scans the entire cluster to find any other mergeable chunks that reside on the same shard and merges them. The defragmentation process is now complete.
If you have a highly fragmented collection, you can defragment it by issuing a command to initiate defragmentation via configureCollectionBalancing options.
    db.adminCommand(
      {
        configureCollectionBalancing: "<database>.<collection>",
        defragmentCollection: true
      }
    )
Throughout the process, you can monitor the status of defragmentation by executing balancerCollectionStatus. Please refer to our balancerCollectionStatus manual for a detailed example on the output of the balancerCollectionStatus command during defragmentation.
Defragmenting a collection can be safely stopped at any time during any phase by issuing a command to stop defragmentation via configureCollectionBalancing options.
    db.adminCommand(
      {
        configureCollectionBalancing: "<database>.<collection>",
        defragmentCollection: false
      }
    )
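Since the start and stop commands differ only in the value of defragmentCollection, a small helper can build either one. This is a convenience sketch, not part of MongoDB: `buildDefragCommand` is a hypothetical name, and the namespace used in the example is only illustrative.

```javascript
// Builds the command document for configureCollectionBalancing.
// namespace:   "<database>.<collection>"
// enable:      true to start defragmentation, false to stop it
// chunkSizeMB: optional; included only when provided
function buildDefragCommand(namespace, enable, chunkSizeMB) {
  const command = {
    configureCollectionBalancing: namespace, // command name must come first
    defragmentCollection: enable,
  };
  if (chunkSizeMB !== undefined) {
    command.chunkSize = chunkSizeMB;
  }
  return command;
}

console.log(buildDefragCommand("vehicles.airplanes", true));
console.log(buildDefragCommand("vehicles.airplanes", false));
```

In mongosh, the result would be passed to `db.adminCommand(...)`, for example `db.adminCommand(buildDefragCommand("vehicles.airplanes", true))`.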
Let’s defragment a collection called "airplanes" in the "vehicles" database, with the current default chunk size of 128MB.

    db.adminCommand(
      {
        configureCollectionBalancing: "vehicles.airplanes",
        defragmentCollection: true
      }
    )
This will start the defragmentation process. You can monitor the process by using the balancerCollectionStatus command. Here’s an example of the output in each phase of the process.
Phase one:

    {
      "balancerCompliant": false,
      "firstComplianceViolation": "defragmentingChunks",
      "details": {
        "currentPhase": "mergeAndMeasureChunks",
        "progress": { "remainingChunksToProcess": 1 }
      }
    }
Since this phase of the defragmentation process contains multiple operations such as mergeChunks and dataSize, the value of the remainingChunksToProcess field will not change when the mergeChunks operation has been completed on a chunk but the dataSize operation is not complete for the same chunk.

Phase two:

    {
      "balancerCompliant": false,
      "firstComplianceViolation": "defragmentingChunks",
      "details": {
        "currentPhase": "moveAndMergeChunks",
        "progress": { "remainingChunksToProcess": 1 }
      }
    }
Since this phase of the defragmentation process contains multiple operations, the value of the remainingChunksToProcess field will not change when a migration is complete but the mergeChunks operation is not complete for the same chunk.

Phase three:

    {
      "balancerCompliant": false,
      "firstComplianceViolation": "defragmentingChunks",
      "details": {
        "currentPhase": "mergeChunks",
        "progress": { "remainingChunksToProcess": 1 }
      }
    }
When the process is complete, for a balanced collection the document returns the following information.
    {
      "balancerCompliant" : true,
      "ok" : 1,
      "operationTime" : Timestamp(1583193238, 1),
      "$clusterTime" : {
        "clusterTime" : Timestamp(1583193238, 1),
        "signature" : {
          "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
          "keyId" : NumberLong(0)
        }
      }
    }
Note: There is a possibility that your collection is not balanced at the end of defragmentation. The balancer will then kick in and start migrating data as it does regularly.
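When monitoring, it can help to reduce the balancerCollectionStatus document to a single line. The sketch below does that using only the fields shown in the sample outputs above. The function name is hypothetical, and the status document is passed in as a plain object so the logic can be read without a running cluster.

```javascript
// Reduces a balancerCollectionStatus result to a one-line summary.
// Uses only fields that appear in the sample outputs above.
function summarizeBalancerStatus(status) {
  if (status.balancerCompliant) {
    return "balanced";
  }
  if (status.firstComplianceViolation === "defragmentingChunks") {
    const details = status.details || {};
    const remaining = details.progress
      ? details.progress.remainingChunksToProcess
      : undefined;
    return `defragmenting: phase=${details.currentPhase}, remaining=${remaining}`;
  }
  // Any other violation, such as an imbalance the balancer will handle next.
  return `not compliant: ${status.firstComplianceViolation}`;
}

const sample = {
  balancerCompliant: false,
  firstComplianceViolation: "defragmentingChunks",
  details: {
    currentPhase: "mergeAndMeasureChunks",
    progress: { remainingChunksToProcess: 1 },
  },
};
console.log(summarizeBalancerStatus(sample));
```

In mongosh, the input would come from `db.adminCommand({ balancerCollectionStatus: "vehicles.airplanes" })`.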
- How long does defragmentation take?
- The duration for defragmentation will vary depending on the size and the “fragmentation state” of a collection, with larger and more fragmented collections taking longer.
- The first phase of defragmentation merges chunks on the same shard delivering immediate benefits to your cluster. Here are some worst-case estimates for the time to complete phase one of defragmentation:
- Collection with 100,000 chunks - < 18 hrs
- Collection with 1,000,000 chunks - < 6 days
- The complete defragmentation process involves the movement of chunks between shards where speeds can vary based on the resources available and the cluster’s configured chunk size. It is difficult to estimate how long it will take for your cluster to complete defragmentation.
- Can I use defragmentation to just change my chunk size?
- Yes, just run the command with the desired chunkSize and "defragmentCollection: false".
- How do I stop an ongoing defragmentation?
- Run the following command:
    db.adminCommand(
      {
        configureCollectionBalancing: "<database>.<collection>",
        defragmentCollection: false
      }
    )
- Can I change my chunk size during defragmentation?
- Yes, but this will result in a less than optimal defragmentation since the new chunk size will only be applied to any future phases of the operation.
- Alternatively, you can stop an ongoing defragmentation by running the command again with "defragmentCollection: false". Then just run the command with the new chunk size and "defragmentCollection: true".
- What happens if I run defragmentation with a different chunk size on a collection where defragmentation is already in progress?
- Do not run defragmentation with a different chunk size on a collection that is being defragmented as this causes the defragmentation process to utilize the new value in the next phase of the defragmentation process, resulting in a less than optimal defragmentation.
- Can I run defragmentation on multiple collections simultaneously?
- Yes. However, a shard can only participate in one migration at a time — meaning during the second phase of defragmentation, a shard can only donate or receive one chunk at a time.
- Can I defragment collections to different chunk sizes?
- Yes, chunk size is specific to a collection. So different collections can be configured to have different chunk sizes, if desired.
- Why do I see a 1TB chunk on my shards even though I set chunkSize to 256MB?
- In MongoDB 6.0, the cluster will no longer partition data unless it’s necessary to facilitate a migration. So, chunks may exceed the configured chunkSize. This behavior reduces the number of chunks on a shard, which in turn reduces the impact of migrations on a cluster.
- Is the value “true” for the key defragmentCollection of configureCollectionBalancing persistent once set?
- The defragmentCollection key will only have a value of "true" while the defragmentation process is occurring. Once the defragmentation process ends, the defragmentCollection field is unset.
- How do I know if defragmentation is running currently, stopped, or started successfully?
- Use the balancerCollectionStatus command to determine the current state of defragmentation on a given collection.
- In the document returned by the balancerCollectionStatus command, the firstComplianceViolation field will display "defragmentingChunks" when a collection is actively being defragmented.
- When a collection is not being defragmented, the balancer status returns a different value for firstComplianceViolation.
- If the collection is unbalanced, the command will return balancerCompliant: false and firstComplianceViolation: "chunksImbalance".
- If the collection is balanced, the command will return balancerCompliant: true. See balancerCollectionStatus for more information on the other possible values.
- How does defragmentation impact my workload?
- The impact of defragmentation on a cluster is similar to a migration. Writes will be blocked to the collection being defragmented while the metadata refreshes occur in response to the underlying merge and move defragmentation operations. The duration of the write blockage can be estimated by reviewing the mongod logs of a previous donor shard.
- Secondary reads will be affected during defragmentation operations as the changes on the primary node need to be replicated to the secondaries.
- Additionally, normal balancing operations will not occur for a collection being defragmented.
- What if I have a balancing window?
- The defragmentation process respects balancing windows and will not execute any defragmentation operations outside of the configured balancing window.
- Is defragmentation resilient to crashes or stepdowns?
- Yes, the defragmentation process can withstand a crash or a primary step down. Defragmentation will automatically restart after the completion of the step up of the new primary.
- Is there a way to just do Phase One of defragmentation?
- You can’t currently, but we may be adding this capability in the near future.
- What if I’m still not happy with the number of chunks in my cluster?
- Consider setting your chunk size to 1GB (1024MB) for defragmentation in order to move more mergeable chunks.
    db.adminCommand(
      {
        configureCollectionBalancing: "<database>.<collection>",
        chunkSize: 1024,
        defragmentCollection: true
      }
    )
- How do I find my cluster’s configured chunk size?
- You can check it in the "config" database.

    use config
    db.settings.find()

Note: If the command above returns nothing, that means the cluster’s default chunk size has not been overridden and the default chunk size of 128MB is currently in use.
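The fallback described in the note can be written as a small lookup. This sketch assumes the override is stored in config.settings as a document of the form `{ _id: "chunksize", value: <size in MB> }`. The function name is hypothetical, and the settings documents are passed in as a plain array.

```javascript
// Default chunk size in MB when no override exists, as stated in the note above.
const DEFAULT_CHUNK_SIZE_MB = 128;

// settingsDocs: array of documents from config.settings.
// Assumption: an override looks like { _id: "chunksize", value: <MB> }.
function clusterChunkSizeMB(settingsDocs) {
  const override = settingsDocs.find((doc) => doc._id === "chunksize");
  return override ? override.value : DEFAULT_CHUNK_SIZE_MB;
}

console.log(clusterChunkSizeMB([]));                                // 128
console.log(clusterChunkSizeMB([{ _id: "chunksize", value: 256 }])); // 256
```

In mongosh, the array could be obtained with `db.getSiblingDB("config").settings.find().toArray()`.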
- How do I find a specific collection’s chunk size?
    use <databasename>
    db.adminCommand(
      {
        balancerCollectionStatus: "<database>.<collection>"
      }
    )
- How do I find a specific collection’s number of chunks?
    use <databasename>
    db.collection_name.getShardDistribution()