We maintain a production MongoDB sharded cluster (version 4.0.0) with WiredTiger as the storage engine. We need to drop several inactive databases (~300 GB in total) from production, but after deletion (and even after running compact and repairDatabase), the freed space is not released back to the OS at the filesystem level.
To understand the behavior and test remedies, we replicated the cluster topology in a development environment running MongoDB 6.0; deleting data there also did not free filesystem space. Below we outline our cluster architectures (prod vs. dev), every approach tried so far, and specific questions about how to safely reclaim disk space without excessive downtime or risk.
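Before choosing a remedy, it helps to quantify how much space WiredTiger itself considers reusable per collection. A minimal sketch (the stat names `file size in bytes` and `file bytes available for reuse` are real WiredTiger block-manager counters returned by `collStats`, but the dict below uses made-up illustrative numbers, not figures from our cluster):

```python
def reclaimable_bytes(coll_stats: dict) -> int:
    """Return bytes WiredTiger has marked free inside a collection's .wt file.

    coll_stats is expected to look like the output of
    db.command("collStats", "<name>"); only the block-manager
    counters are read here.
    """
    bm = coll_stats["wiredTiger"]["block-manager"]
    return bm["file bytes available for reuse"]


# Illustrative numbers only -- not taken from our cluster.
sample = {
    "ns": "appdb.events",
    "storageSize": 10_737_418_240,  # 10 GiB on disk
    "wiredTiger": {
        "block-manager": {
            "file size in bytes": 10_737_418_240,
            "file bytes available for reuse": 7_516_192_768,  # ~7 GiB free inside the file
        }
    },
}

free = reclaimable_bytes(sample)
print(f"{sample['ns']}: {free / 2**30:.1f} GiB reusable inside the file")
```

Running this per collection (on each shard member directly, not via mongos) would show whether the "stuck" space is tracked as reusable by WiredTiger or genuinely unaccounted for.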
1. Production Cluster Topology (v4.0.0)
We have three shards (each as a three-node replica set) plus three mongos routers and three config server replicas. All nodes use WiredTiger.
| Shard Set | Role | Hostname/Node | Service |
|---|---|---|---|
| Shard 1 | mongos (router) | mongo1 | mongos |
| Shard 1 | configReplSet (primary) | mongo1 | mongod (cfg) |
| Shard 1 | rs1 (primary) | mongo1 | mongod (shard) |
| Shard 1 | rs1 (secondary) | mongo1.sec | mongod |
| Shard 1 | rs1 (arbiter) | mongo1.arb | mongod |
| Shard 2 | mongos (router) | mongo2 | mongos |
| Shard 2 | configReplSet (secondary) | mongo2 | mongod (cfg) |
| Shard 2 | rs2 (primary) | mongo2 | mongod (shard) |
| Shard 2 | rs2 (secondary) | mongo2.sec | mongod |
| Shard 2 | rs2 (arbiter) | mongo2.arb | mongod |
| Shard 3 | mongos (router) | mongo3 | mongos |
| Shard 3 | configReplSet (secondary) | mongo3 | mongod (cfg) |
| Shard 3 | rs3 (primary) | mongo3 | mongod (shard) |
| Shard 3 | rs3 (secondary) | mongo3.sec | mongod |
| Shard 3 | rs3 (arbiter) | mongo3.arb | mongod |
- WiredTiger settings: all nodes run with the default `storage.wiredTiger.engineConfig` options
- Operating System: Amazon Linux 2 x86_64
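For reference, running with "default engineConfig options" is equivalent to having no explicit `wiredTiger` stanza at all; spelled out, the documented defaults amount to roughly the following (the `dbPath` is illustrative, not our actual path):

```yaml
storage:
  dbPath: /var/lib/mongo        # illustrative path
  wiredTiger:
    engineConfig:
      # cacheSizeGB defaults to the larger of 50% of (RAM - 1 GB) or 256 MB when unset
      journalCompressor: snappy
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true
```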
2. Development Cluster Topology (v6.0)
To reproduce and investigate in a controlled environment, we built a smaller replica of the prod cluster on MongoDB 6.0. Our dev nodes are consolidated, however: each host runs the primary, secondary, and arbiter for a given shard. The topology looks like this:
| Shard Set | Role | Hostname/Node | Service |
|---|---|---|---|
| Shard 1 | mongos (router) | mongo1.dev | mongos |
| Shard 1 | configReplSet (primary) | mongo1.dev | mongod (cfg) |
| Shard 1 | rs1 (primary) | mongo1.dev | mongod (shard) |
| Shard 1 | rs1 (secondary) | mongo1.dev | mongod |
| Shard 1 | rs1 (arbiter) | mongo1.dev | mongod |
| Shard 2 | mongos (router) | mongo2.dev | mongos |
| Shard 2 | configReplSet (secondary) | mongo2.dev | mongod (cfg) |
| Shard 2 | rs2 (primary) | mongo2.dev | mongod (shard) |
| Shard 2 | rs2 (secondary) | mongo2.dev | mongod |
| Shard 2 | rs2 (arbiter) | mongo2.dev | mongod |
| Shard 3 | mongos (router) | mongo3.dev | mongos |
| Shard 3 | configReplSet (secondary) | mongo3.dev | mongod (cfg) |
| Shard 3 | rs3 (primary) | mongo3.dev | mongod (shard) |
| Shard 3 | rs3 (secondary) | mongo3.dev | mongod |
| Shard 3 | rs3 (arbiter) | mongo3.dev | mongod |
- MongoDB Version: 6.0.18 (WiredTiger)
- OS: Ubuntu 24.04.1 LTS (noble)
3. Goal
- Primary Objective: Drop several inactive databases (~300 GB total) from the production cluster and reclaim as much filesystem space as possible on each shard node, without performing a full dump/restore or incurring significant downtime.
- Secondary Objective (Dev Testing): Verify which method, if any, can shrink WiredTiger data files on disk so that we can apply the same procedure in production.
4. Approaches Tried in Dev
| Approach | Description & Outcome |
|---|---|
| `db.dropDatabase()` | Dropping the inactive databases (in both dev and prod) removes the collections and indexes from MongoDB, but the data files remain large on disk. No immediate OS-level space reclamation. |
| `compact` command | `compact` works only at the collection level, and we dropped the databases outright, so there are no remaining collections to run it against. |
| repair (`repairDatabase` / `mongod --repair`) | Rebuilds all collections and indexes into new `.wt` files, but the final directory size remains roughly the same. |
| Manually delete `.wt` files in `dbPath` | WiredTiger stores data in one file per collection and per index (`Collection1.wt`, `Collection2.wt`, …, `index1.wt`, `index2.wt`, …), so in principle the dropped databases' files could be removed by hand. |
| Add a brand-new secondary and allow resync | 1. Add a clean secondary to each shard's replica set.<br>2. Let it replicate only the remaining (non-dropped) databases/collections.<br>3. Once initial sync completes, step down the old primary, make the new node primary, and rebuild/delete the old data files.<br>Drawbacks: extra hardware/network cost; the data size is very large, so resync takes an unacceptably long time; risk of data divergence if not all collections are strictly identical. |
| `mongodump` + drop data files + `mongorestore` | 1. `mongodump` the entire production data set (all shards).<br>2. Stop all `mongod` processes.<br>3. Delete all files under `dbPath`.<br>4. `mongorestore` only the active databases.<br>Drawbacks: dumps of hundreds of GB sometimes come out corrupted (especially for large collections); requires significant downtime (all shards offline); risk of missed oplog entries or replication lag; restoring hundreds of GB takes days, which we cannot afford. |
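For the databases we keep (rather than drop), `compact` would have to be issued per collection, on each shard member's `mongod` directly (not through mongos). A hedged sketch of that iteration using pymongo-style calls; the server-side `compact` command and `Database.command` are real, but we have not run this against the cluster, and the driver object is stubbed here, so treat it as a shape sketch only:

```python
def compact_database(db) -> dict:
    """Run the server-side 'compact' command on every collection in db.

    db is expected to behave like a pymongo Database connected to a
    shard member's mongod directly (compact cannot go through mongos).
    Returns a mapping of collection name -> command reply.
    """
    results = {}
    for name in db.list_collection_names():
        # In MongoDB 4.0, compact blocks operations on the database it
        # is compacting, so schedule this inside a maintenance window.
        results[name] = db.command({"compact": name})
    return results
```

Usage would be `compact_database(MongoClient("mongodb://mongo1:27018")["appdb"])` for each kept database on each member (host/port hypothetical).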
5. Observations
- In both v4.0.0 (prod) and v6.0.18 (dev), WiredTiger marks deleted or compacted blocks as "free internally" but never shrinks the on-disk .wt file for a given collection or index.
- compact reclaims space inside the WiredTiger file (so new inserts reuse space), but does not reduce the file’s footprint on the filesystem.
- repairDatabase rebuilds the storage files, but in our tests the resulting file sizes (as reported by du -h) remain as large as the original data files. This suggests WiredTiger still allocates the same extents on disk even when most pages are empty.
- We confirmed that Linux is not holding deleted file handles open (lsof | grep deleted shows no .wt files). The space truly seems "stuck" inside the WiredTiger .wt files.
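The deleted-file-handle check in the last bullet is easy to script for all nodes; a small sketch that filters `lsof` output for `.wt` files still held open after deletion (the sample output below is fabricated for illustration):

```python
def open_deleted_wt_files(lsof_output: str) -> list:
    """Return lsof lines for .wt files that were deleted but are still
    held open by a process (space the kernel cannot free yet)."""
    return [
        line
        for line in lsof_output.splitlines()
        if "(deleted)" in line and ".wt" in line
    ]


# Fabricated sample lsof output for illustration.
sample = """\
mongod  1234 mongodb  12u  REG  259,1  4096        /data/db/WiredTiger.turtle
mongod  1234 mongodb  33u  REG  259,1  1073741824  /data/db/collection-7.wt (deleted)
java    9876 app       5u  REG  259,1  2048        /tmp/app.log (deleted)
"""

for line in open_deleted_wt_files(sample):
    print(line)
```

An empty result (as on our nodes) confirms the space is inside the live `.wt` files rather than pinned by open handles.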
6. Questions & Requested Guidance
- Is it expected that WiredTiger never shrinks data files on disk, even after compact or repairDatabase?
- If a file-level shrink is not possible without a dump/restore, are there any alternative best practices?
- Are there any MongoDB tools or scripts to manually defragment .wt files and return extents to the OS?
- If dump/restore is the only surefire way to fully reclaim space, how can we minimize downtime and mitigate corruption risk for large datasets?
7. Additional Environment Details
- Available Free Space on Production (before deletion):
| Shard | Server | Size | Used Space | Free Space |
|---|---|---|---|---|
| Shard 1 | mongo1 | 747.23 GB | 512.86 GB | 234.37 GB |
| Shard 1 | mongo1.sec | 747.23 GB | 680.03 GB | 67.20 GB |
| Shard 2 | mongo2 | 840.66 GB | 483.34 GB | 357.32 GB |
| Shard 2 | mongo2.sec | 466.94 GB | 450.14 GB | 16.80 GB |
| Shard 3 | mongo3 | 1.09 TB | 429.48 GB | 691.47 GB |
| Shard 3 | mongo3.sec | 653.80 GB | 407.68 GB | 246.13 GB |
- Maximum Acceptable Downtime:
- We can take up to 2 hours of planned maintenance on each shard’s primary as long as secondaries remain available for reads
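One constraint worth checking against the free-space table above before any in-place rebuild (repair, or a resync onto the same volume): the node temporarily needs free space on the order of the data being rewritten. A rough sketch using the production figures; the 1x-data-size headroom rule is our conservative assumption, not an official requirement:

```python
# (node, used_gb, free_gb) taken from the production free-space table above.
nodes = [
    ("mongo1",     512.86, 234.37),
    ("mongo1.sec", 680.03,  67.20),
    ("mongo2",     483.34, 357.32),
    ("mongo2.sec", 450.14,  16.80),
    ("mongo3",     429.48, 691.47),
    ("mongo3.sec", 407.68, 246.13),
]

def lacks_headroom(used_gb: float, free_gb: float) -> bool:
    """Assume a full on-disk rewrite temporarily needs ~1x the used data
    size in free space (a conservative assumption, not an official figure)."""
    return free_gb < used_gb

tight = [name for name, used, free in nodes if lacks_headroom(used, free)]
print("Nodes without enough free space for an in-place rewrite:", tight)
```

Under that assumption, only mongo3 has enough headroom today, which is part of why an in-place rebuild looks risky for us.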
8. What we need
- Definitive confirmation on whether WiredTiger can ever shrink its on-disk `.wt` files once they have grown.
- Recommendations on the safest, least disruptive procedure to reclaim ~300 GB from a production sharded cluster (each shard primary must drop the inactive DBs and free the space).
- Configuration tweaks (if any) that allow `compact` or `repairDatabase` to shrink files, including any hidden WiredTiger options or Linux-level commands.