We maintain a production MongoDB sharded cluster (version 4.0.0) with WiredTiger as the storage engine. We need to drop several inactive databases (~300 GB in total) from production, but after deletion (and even after running compact and repairDatabase), the freed space is not released back to the OS at the filesystem level.
To understand the behavior and test remedies, we replicated the cluster topology in a development environment running MongoDB 6.0; deleting data there also did not free filesystem space. Below we outline our cluster architectures (prod vs. dev), every approach tried so far, and specific questions about how to safely reclaim disk space without excessive downtime or risk.
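Before choosing a remedy, it helps to quantify how much space WiredTiger itself considers reusable per collection. A minimal sketch (the stat names `file size in bytes` and `file bytes available for reuse` are real WiredTiger block-manager counters returned by `collStats`, but the dict below uses made-up illustrative numbers, not figures from our cluster):

```python
def reclaimable_bytes(coll_stats: dict) -> int:
    """Return bytes WiredTiger has marked free inside a collection's .wt file.

    coll_stats is expected to look like the output of
    db.command("collStats", "<name>"); only the block-manager
    counters are read here.
    """
    bm = coll_stats["wiredTiger"]["block-manager"]
    return bm["file bytes available for reuse"]


# Illustrative numbers only -- not taken from our cluster.
sample = {
    "ns": "appdb.events",
    "storageSize": 10_737_418_240,  # 10 GiB on disk
    "wiredTiger": {
        "block-manager": {
            "file size in bytes": 10_737_418_240,
            "file bytes available for reuse": 7_516_192_768,  # ~7 GiB free inside the file
        }
    },
}

free = reclaimable_bytes(sample)
print(f"{sample['ns']}: {free / 2**30:.1f} GiB reusable inside the file")
```

Running this per collection (on each shard member directly, not via mongos) would show whether the "stuck" space is tracked as reusable by WiredTiger or genuinely unaccounted for.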
1. Production Cluster Topology (v4.0.0)
We have three shards (each as a three-node replica set) plus three mongos routers and three config server replicas. All nodes use WiredTiger.
| Shard Set | Role | Hostname/Node | Service |
|---|---|---|---|
| Shard 1 | mongos (router) | mongo1 | mongos |
| Shard 1 | configReplSet (primary) | mongo1 | mongod (cfg) |
| Shard 1 | rs1 (primary) | mongo1 | mongod (shard) |
| Shard 1 | rs1 (secondary) | mongo1.sec | mongod |
| Shard 1 | rs1 (arbiter) | mongo1.arb | mongod |
| Shard 2 | mongos (router) | mongo2 | mongos |
| Shard 2 | configReplSet (secondary) | mongo2 | mongod (cfg) |
| Shard 2 | rs2 (primary) | mongo2 | mongod (shard) |
| Shard 2 | rs2 (secondary) | mongo2.sec | mongod |
| Shard 2 | rs2 (arbiter) | mongo2.arb | mongod |
| Shard 3 | mongos (router) | mongo3 | mongos |
| Shard 3 | configReplSet (secondary) | mongo3 | mongod (cfg) |
| Shard 3 | rs3 (primary) | mongo3 | mongod (shard) |
| Shard 3 | rs3 (secondary) | mongo3.sec | mongod |
| Shard 3 | rs3 (arbiter) | mongo3.arb | mongod |
- WiredTiger settings: all nodes run with the default `storage.wiredTiger.engineConfig` options
- Operating System: Amazon Linux 2 x86_64
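For reference, running with "default engineConfig options" is equivalent to having no explicit `wiredTiger` stanza at all; spelled out, the documented defaults amount to roughly the following (the `dbPath` is illustrative, not our actual path):

```yaml
storage:
  dbPath: /var/lib/mongo        # illustrative path
  wiredTiger:
    engineConfig:
      # cacheSizeGB defaults to the larger of 50% of (RAM - 1 GB) or 256 MB when unset
      journalCompressor: snappy
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true
```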
2. Development Cluster Topology (v6.0)
To reproduce and investigate in a controlled environment, we built a smaller replica of the prod cluster on MongoDB 6.0. Our dev nodes are consolidated, however: each host runs the primary, secondary, and arbiter for a given shard. The topology looks like this:
| Shard Set | Role | Hostname/Node | Service |
|---|---|---|---|
| Shard 1 | mongos (router) | mongo1.dev | mongos |
| Shard 1 | configReplSet (primary) | mongo1.dev | mongod (cfg) |
| Shard 1 | rs1 (primary) | mongo1.dev | mongod (shard) |
| Shard 1 | rs1 (secondary) | mongo1.dev | mongod |
| Shard 1 | rs1 (arbiter) | mongo1.dev | mongod |
| Shard 2 | mongos (router) | mongo2.dev | mongos |
| Shard 2 | configReplSet (secondary) | mongo2.dev | mongod (cfg) |
| Shard 2 | rs2 (primary) | mongo2.dev | mongod (shard) |
| Shard 2 | rs2 (secondary) | mongo2.dev | mongod |
| Shard 2 | rs2 (arbiter) | mongo2.dev | mongod |
| Shard 3 | mongos (router) | mongo3.dev | mongos |
| Shard 3 | configReplSet (secondary) | mongo3.dev | mongod (cfg) |
| Shard 3 | rs3 (primary) | mongo3.dev | mongod (shard) |
| Shard 3 | rs3 (secondary) | mongo3.dev | mongod |
| Shard 3 | rs3 (arbiter) | mongo3.dev | mongod |
- MongoDB Version: 6.0.18 (WiredTiger)
- OS: Ubuntu 24.04.1 LTS (noble)
3. Goal
- Primary Objective: Drop several inactive databases (~300 GB total) from the production cluster and reclaim as much filesystem space as possible on each shard node, without performing a full dump/restore or incurring significant downtime.
- Secondary Objective (Dev Testing): Verify which method, if any, can shrink WiredTiger data files on disk so that we can apply the same procedure in production.
4. Approaches Tried in Dev
| Approach | Description & Outcome |
|---|---|
| `db.dropDatabase()` | Dropping the inactive databases (in both dev and prod) removes the collections and indexes from MongoDB, but the data files remain large on disk. No immediate OS-level space reclamation. |
| `compact` command | `compact` works only at the collection level, and we dropped the databases outright, so there are no remaining collections to run it against. |
| repair (`repairDatabase` / `mongod --repair`) | Rebuilds all collections and indexes into new `.wt` files, but the final directory size remains roughly the same. |
| Manually delete `.wt` files in `dbPath` | WiredTiger stores data in one file per collection and per index (`Collection1.wt`, `Collection2.wt`, …, `index1.wt`, `index2.wt`, …), so in principle the dropped databases' files could be removed by hand. |
| Add a brand-new secondary and allow resync | 1. Add a clean secondary to each shard's replica set.<br>2. Let it replicate only the remaining (non-dropped) databases/collections.<br>3. Once initial sync completes, step down the old primary, make the new node primary, and rebuild/delete the old data files.<br>Drawbacks: extra hardware/network cost; the data size is very large, so resync takes an unacceptably long time; risk of data divergence if not all collections are strictly identical. |
| `mongodump` + drop data files + `mongorestore` | 1. `mongodump` the entire production data set (all shards).<br>2. Stop all `mongod` processes.<br>3. Delete all files under `dbPath`.<br>4. `mongorestore` only the active databases.<br>Drawbacks: dumps of hundreds of GB sometimes come out corrupted (especially for large collections); requires significant downtime (all shards offline); risk of missed oplog entries or replication lag; restoring hundreds of GB takes days, which we cannot afford. |
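For the databases we keep (rather than drop), `compact` would have to be issued per collection, on each shard member's `mongod` directly (not through mongos). A hedged sketch of that iteration using pymongo-style calls; the server-side `compact` command and `Database.command` are real, but we have not run this against the cluster, and the driver object is stubbed here, so treat it as a shape sketch only:

```python
def compact_database(db) -> dict:
    """Run the server-side 'compact' command on every collection in db.

    db is expected to behave like a pymongo Database connected to a
    shard member's mongod directly (compact cannot go through mongos).
    Returns a mapping of collection name -> command reply.
    """
    results = {}
    for name in db.list_collection_names():
        # In MongoDB 4.0, compact blocks operations on the database it
        # is compacting, so schedule this inside a maintenance window.
        results[name] = db.command({"compact": name})
    return results
```

Usage would be `compact_database(MongoClient("mongodb://mongo1:27018")["appdb"])` for each kept database on each member (host/port hypothetical).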
5. Observations
- In both v4.0.0 (prod) and v6.0.18 (dev), WiredTiger marks deleted or compacted blocks as "free internally" but never shrinks the on-disk .wt file for a given collection or index.
- compact reclaims space inside the WiredTiger file (so new inserts reuse space), but does not reduce the file’s footprint on the filesystem.
- repairDatabase rebuilds the storage files, but in our tests the resulting file sizes (as reported by du -h) remain as large as the original data files. This suggests WiredTiger still allocates the same extents on disk even when most pages are empty.
- We confirmed that Linux is not holding deleted file handles open (lsof | grep deleted shows no .wt files). The space truly seems "stuck" inside the WiredTiger .wt files.
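The deleted-file-handle check in the last bullet is easy to script for all nodes; a small sketch that filters `lsof` output for `.wt` files still held open after deletion (the sample output below is fabricated for illustration):

```python
def open_deleted_wt_files(lsof_output: str) -> list:
    """Return lsof lines for .wt files that were deleted but are still
    held open by a process (space the kernel cannot free yet)."""
    return [
        line
        for line in lsof_output.splitlines()
        if "(deleted)" in line and ".wt" in line
    ]


# Fabricated sample lsof output for illustration.
sample = """\
mongod  1234 mongodb  12u  REG  259,1  4096        /data/db/WiredTiger.turtle
mongod  1234 mongodb  33u  REG  259,1  1073741824  /data/db/collection-7.wt (deleted)
java    9876 app       5u  REG  259,1  2048        /tmp/app.log (deleted)
"""

for line in open_deleted_wt_files(sample):
    print(line)
```

An empty result (as on our nodes) confirms the space is inside the live `.wt` files rather than pinned by open handles.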
6. Questions & Requested Guidance
- Is it expected that WiredTiger never shrinks data files on disk, even after compact or repairDatabase?
- If a file-level shrink is not possible without a dump/restore, are there any alternative best practices?
- Are there any MongoDB tools or scripts to manually defragment .wt files and return extents to the OS?
- If dump/restore is the only surefire way to fully reclaim space, how can we minimize downtime and mitigate corruption risk for large datasets?
7. Additional Environment Details
- Available Free Space on Production (before deletion):
| Shard | Server | Size | Used Space | Free Space |
|---|---|---|---|---|
| Shard 1 | mongo1 | 747.23 GB | 512.86 GB | 234.37 GB |
| Shard 1 | mongo1.sec | 747.23 GB | 680.03 GB | 67.20 GB |
| Shard 2 | mongo2 | 840.66 GB | 483.34 GB | 357.32 GB |
| Shard 2 | mongo2.sec | 466.94 GB | 450.14 GB | 16.80 GB |
| Shard 3 | mongo3 | 1.09 TB | 429.48 GB | 691.47 GB |
| Shard 3 | mongo3.sec | 653.80 GB | 407.68 GB | 246.13 GB |
- Maximum Acceptable Downtime:
- We can take up to 2 hours of planned maintenance on each shard’s primary as long as secondaries remain available for reads
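One constraint worth checking against the free-space table above before any in-place rebuild (repair, or a resync onto the same volume): the node temporarily needs free space on the order of the data being rewritten. A rough sketch using the production figures; the 1x-data-size headroom rule is our conservative assumption, not an official requirement:

```python
# (node, used_gb, free_gb) taken from the production free-space table above.
nodes = [
    ("mongo1",     512.86, 234.37),
    ("mongo1.sec", 680.03,  67.20),
    ("mongo2",     483.34, 357.32),
    ("mongo2.sec", 450.14,  16.80),
    ("mongo3",     429.48, 691.47),
    ("mongo3.sec", 407.68, 246.13),
]

def lacks_headroom(used_gb: float, free_gb: float) -> bool:
    """Assume a full on-disk rewrite temporarily needs ~1x the used data
    size in free space (a conservative assumption, not an official figure)."""
    return free_gb < used_gb

tight = [name for name, used, free in nodes if lacks_headroom(used, free)]
print("Nodes without enough free space for an in-place rewrite:", tight)
```

Under that assumption, only mongo3 has enough headroom today, which is part of why an in-place rebuild looks risky for us.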
8. What we need
- Definitive confirmation on whether WiredTiger can ever shrink its on-disk `.wt` files once they have grown.
- Recommendations on the safest, least disruptive procedure to reclaim ~300 GB from a production sharded cluster (each shard primary must drop the inactive DBs and free the space).
- Configuration tweaks (if any) that allow `compact` or `repairDatabase` to shrink files, including any hidden WiredTiger options or Linux-level commands.