In version 4.4, does anybody have a concern or recommendation about the `compact` command?
Note: my database exceeds 3.5 TB.
Thanks a lot
If you were using a server release of MongoDB earlier than 4.4, I'd definitely have serious concerns about the blocking side effects of a `compact` operation in production. Removing the blocking behaviour was one of the improvements included in the MongoDB 4.4 release, so that is a positive change from previous releases. However, while `compact` will no longer block CRUD operations for the database containing the collection being compacted, there could still be a significant impact on your working set if you are compacting a large collection.
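For reference, `compact` is issued per collection against the database that owns it. A minimal `mongosh` sketch (the collection name is a placeholder for your own):

```javascript
// Run against the database containing the target collection.
// In MongoDB 4.4+ this no longer blocks CRUD on the database,
// but it still competes for cache and disk I/O.
db.runCommand({ compact: "mycollection" });

// Check reclaimable space first to decide whether compact is worthwhile:
db.mycollection.stats().wiredTiger["block-manager"]["file bytes available for reuse"];
```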
Considerations before you run `compact`
Before running compaction I would check whether it is likely to be useful based on:

- The `file bytes available for reuse` metric (via `collStats`)
- The likelihood that you won't be inserting that much data into the collection in the near future
It is normal to have some reusable space for a collection with active updates. Excessive reusable space is typically the result of deleting a large amount of data, but can sometimes be related to your workload or the provenance of your data files.
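As a rough sketch of that check (the helper name and the example numbers are my own; the field names come from the WiredTiger `block-manager` section of a `collStats` result):

```python
def reusable_space(coll_stats):
    """Return (reusable bytes, fraction of file size) from a collStats document."""
    block_mgr = coll_stats["wiredTiger"]["block-manager"]
    reusable = int(block_mgr["file bytes available for reuse"])
    file_size = int(block_mgr["file size in bytes"])
    return reusable, (reusable / file_size if file_size else 0.0)

# Trimmed, collStats-shaped document with illustrative numbers:
stats = {"wiredTiger": {"block-manager": {
    "file bytes available for reuse": 750_000_000,
    "file size in bytes": 1_000_000_000,
}}}
reusable, frac = reusable_space(stats)
print(reusable, frac)  # 750000000 0.75
```

A high fraction (here 75% of the file is reusable) after a large deletion is the scenario where `compact` is most likely to pay off.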
The outcome of a `compact` operation is dependent on the storage contents, so I would draw your attention to the note on Disk Space in the `compact` documentation:

> On WiredTiger, `compact` attempts to reduce the required storage space for data and indexes in a collection, releasing unneeded disk space to the operating system. The effectiveness of this operation is workload dependent and no disk space may be recovered. This command is useful if you have removed a large amount of data from the collection, and do not plan to replace it.
Running `compact` in production
If this is a production environment, I would hope you have a replica set or sharded cluster deployment so you can minimise the operational impact.
If you have many large collections to compact (or want a more likely outcome of freeing up disk space), re-syncing a secondary member of a replica set via initial sync will rebuild all of the data files by copying the data over from another member. If `compact` doesn't end up freeing enough space, this would be the next procedure to run.
If you do decide to run `compact` in a production environment, I would minimise the operational impact by:

- Always having a replica set deployment (ideally a minimum of three data-bearing members, no arbiters)
- Running `compact` operations on one secondary at a time
- Making the secondary being compacted `hidden` during the compact operation, so the only competing traffic will be basic replication
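The "hide the secondary" step amounts to a `replSetReconfig` with that member's `hidden` flag set (hidden members must also have `priority: 0`). A sketch of building that reconfig document as plain data, with hypothetical hostnames (in practice you would fetch the current config with `rs.conf()` and submit the result via `rs.reconfig()`):

```python
import copy

def hide_member(rs_config, host):
    """Return a new replica set config document with the given member hidden.

    Hidden members must have priority 0; replSetReconfig also requires
    the config version to be bumped. The input config is not mutated.
    """
    cfg = copy.deepcopy(rs_config)
    cfg["version"] += 1
    for member in cfg["members"]:
        if member["host"] == host:
            member["hidden"] = True
            member["priority"] = 0
            return cfg
    raise ValueError(f"no member with host {host!r}")

# Illustrative three-member config (hostnames are placeholders):
cfg = {"_id": "rs0", "version": 3, "members": [
    {"_id": 0, "host": "db0:27017", "priority": 1},
    {"_id": 1, "host": "db1:27017", "priority": 1},
    {"_id": 2, "host": "db2:27017", "priority": 1},
]}
new_cfg = hide_member(cfg, "db1:27017")
print(new_cfg["version"], new_cfg["members"][1]["hidden"])  # 4 True
```

After the compact finishes on that member, submit the original settings (hidden removed, priority restored) the same way, then move on to the next secondary.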