Q. Compact Mechanism

I am studying MongoDB's compact command.
While reading about it, a question came up.

  • Running db.<collection>.stats().wiredTiger["block-manager"]["file bytes available for reuse"] returns the size (in bytes) of the space that can be compacted in that collection.

  • Compaction is run with the command db.runCommand({compact: "<collection>", force: true}).
    However, it does not always produce the expected result.

  • MongoDB writes data in pages.
    An update writes a new page instead of overwriting the old one in place.
    Pages that are no longer used as a result can be reclaimed by compact.

  • WiredTiger internally uses a B-tree (root, branch, leaf).
    Root and branch pages hold metadata, and these pages are stored at arbitrary locations in the file.
    The closer these pages are to the end of the file, the less effective compact is.

  • For the most effective compaction, use the dump/restore method.
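
To make the first point concrete, here is a minimal JavaScript sketch of reading that statistic from a stats document. The helper name `reusableBytes` and the `sampleStats` object are hypothetical and the sample value is made up; only the `wiredTiger["block-manager"]["file bytes available for reuse"]` path comes from the points above. In the shell, the input would be the result of the collection's stats() call rather than the sample object.

```javascript
// Hypothetical helper: pull the reusable-space figure out of a stats document.
// Returns null when the WiredTiger block-manager section is missing.
function reusableBytes(stats) {
  const blockManager =
    stats && stats.wiredTiger && stats.wiredTiger["block-manager"];
  if (!blockManager) return null;
  const value = blockManager["file bytes available for reuse"];
  return typeof value === "number" ? value : null;
}

// Made-up sample shaped like the relevant part of a stats() result.
const sampleStats = {
  wiredTiger: {
    "block-manager": {
      "file bytes available for reuse": 4096,
    },
  },
};

console.log(reusableBytes(sampleStats)); // 4096
```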

My question is: why does compact become less effective when these pages are located toward the end of the file?

P.S. The points quoted above were run through a machine translator, so they may differ from their original meaning.

Hi @Kim_Hakseon

It’s mainly because it is assumed that your database will have more data in the future, not less. The “file bytes available for reuse” number shows how much space in the current data file can be reused (typically after some deletions), and WiredTiger will try to reuse this space for new data before extending the file.

To keep the operation simple and efficient, the compact command tries to move data from the end of the file into any reusable space it fits in, then truncates the empty space at the end of the file. If this is not possible for some reason (e.g. not enough reusable space, or the reusable spaces cannot accommodate the data), the command might not do anything.
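
As a rough illustration of that behaviour, here is a toy JavaScript model. It is only a sketch under simplifying assumptions, not WiredTiger's actual algorithm or file format: a "file" is an array of extents, `id: null` marks reusable space, and the names `compactToy`, `fileSize`, `movable` and `stuck` are made up for this example.

```javascript
// Toy model only: a "file" is an array of extents { id, size }.
// id === null marks reusable (free) space.
function compactToy(file) {
  const extents = file.map((e) => ({ ...e }));

  // Truncate free space that already sits at the end of the file.
  const trimTail = () => {
    while (extents.length > 0 && extents[extents.length - 1].id === null) {
      extents.pop();
    }
  };

  trimTail();
  while (extents.length > 0) {
    const last = extents[extents.length - 1];
    // Find the first hole that is large enough for the last block.
    const holeIndex = extents.findIndex(
      (e) => e.id === null && e.size >= last.size
    );
    if (holeIndex === -1) break; // nothing fits, so the file cannot shrink

    const leftover = extents[holeIndex].size - last.size;
    extents.pop(); // take the block off the end of the file
    const replacement = [{ id: last.id, size: last.size }];
    if (leftover > 0) replacement.push({ id: null, size: leftover });
    extents.splice(holeIndex, 1, ...replacement); // place it in the hole
    trimTail();
  }
  return extents;
}

const fileSize = (file) => file.reduce((sum, e) => sum + e.size, 0);

// One hole big enough for the last block: the file shrinks.
const movable = [
  { id: null, size: 4 },
  { id: "A", size: 4 },
  { id: "B", size: 4 },
];

// Same amount of free space, but split into holes that are too small.
const stuck = [
  { id: null, size: 2 },
  { id: "A", size: 4 },
  { id: null, size: 2 },
  { id: "B", size: 4 },
];

console.log(fileSize(compactToy(movable))); // 8
console.log(fileSize(compactToy(stuck))); // 12
```

Both toy files hold 8 units of data in a 12-unit file, but only the first one shrinks: in the second, no single hole can take the block at the end, so nothing can be truncated.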

The dump & restore method basically sidesteps this operation and just writes into a new, fresh data file, so it gives the most compact storage of your current data.
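
In toy form, this amounts to copying only the live data into a new file. The JavaScript sketch below assumes a made-up representation (an array of extents where `id: null` is free space); `rewriteFresh`, `totalSize` and `fragmented` are hypothetical names, and none of this reflects the real dump/restore tooling.

```javascript
// Toy model only: copy live extents into a fresh "file", skipping free space.
function rewriteFresh(file) {
  return file.filter((e) => e.id !== null).map((e) => ({ ...e }));
}

const totalSize = (file) => file.reduce((sum, e) => sum + e.size, 0);

// 8 units of data spread over a 12-unit file with two small holes.
const fragmented = [
  { id: null, size: 2 },
  { id: "A", size: 4 },
  { id: null, size: 2 },
  { id: "B", size: 4 },
];

console.log(totalSize(fragmented)); // 12
console.log(totalSize(rewriteFresh(fragmented))); // 8
```

Because nothing has to fit into existing holes, the fresh file is exactly as large as the live data, regardless of where that data sat in the old file.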

Best regards
Kevin

Thank you for your kind answer.

But I still don’t understand because I’m not good enough.
I will study more and try to fully understand it.

Thank you once again.:smiley:
