5.0 zstd compression level is not working as expected

Jason_Tran · August 12, 2022, 5:50am

Thanks for providing those details.

Firstly, I would like to note that the results in my previous post changed: the storageSize eventually converged to very similar values across the various zstd compression levels (differing by only 1-2%). I had recorded those values very shortly after importing the data and did not realise that, for some of the higher compression levels (e.g. level 22), the storageSize kept growing over the following few minutes due to some internal WiredTiger processes.
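Because storageSize can keep changing for a few minutes after an import, one way to avoid recording it too early is to poll it until it stops changing. Below is a minimal sketch; the helper name `wait_for_stable_size`, the polling interval and the number of identical readings required are all hypothetical choices, not MongoDB settings.

```python
import time
from typing import Callable


def wait_for_stable_size(
    read_size: Callable[[], int],
    interval_s: float = 30.0,
    stable_reads: int = 3,
    max_reads: int = 60,
) -> int:
    """Poll read_size() until it returns the same value `stable_reads` times in a row.

    Hypothetical helper: returns the settled size, or raises TimeoutError
    if the value is still changing after `max_reads` readings.
    """
    last = read_size()
    unchanged = 1  # consecutive identical readings seen so far
    reads = 1
    while unchanged < stable_reads:
        if reads >= max_reads:
            raise TimeoutError(f"size still changing after {reads} readings")
        time.sleep(interval_s)
        current = read_size()
        reads += 1
        if current == last:
            unchanged += 1
        else:
            # Size moved: restart the count from this new value.
            last, unchanged = current, 1
    return last
```

With PyMongo, for example, `read_size` could be `lambda: db.command("collStats", "testcoll")["storageSize"]`, assuming a `db` handle to the test database (the collection name is only illustrative). Comparing compression levels only after each collection's size has settled avoids the early readings described above.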

As a very basic test outside of MongoDB, I used the zstd command-line tool to compress a 1480-byte BSON file at level 6 and at level 22, which will hopefully mirror what is happening inside MongoDB to some extent (as far as I know, MongoDB compresses each document individually). The level 22 output was only ~0.8% smaller than the level 6 output:

$ ls -l
-rw-r--r--  1 user  staff  1480 12 Aug 14:11 testcoll.bson /// <--- Original
-rw-r--r--  1 user  staff   786 12 Aug 14:11 testcoll6.bson.zst /// <--- zstd compressed level 6
-rw-r--r--  1 user  staff   780 12 Aug 14:11 testcoll22.bson /// <--- zstd compressed level 22
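The ~0.8% figure can be checked directly from the listing. Working it out (a small sketch; the constant and function names are only for illustration) also shows that both levels already shrink the file to roughly 53% of its original size, so level 22 gains just 6 bytes on top of level 6:

```python
# File sizes (bytes) taken from the `ls -l` listing above.
ORIGINAL = 1480
LEVEL_6 = 786
LEVEL_22 = 780


def ratio(compressed: int, original: int = ORIGINAL) -> float:
    """Compressed size as a fraction of the original size."""
    return compressed / original


def relative_saving(smaller: int, larger: int) -> float:
    """How much smaller `smaller` is than `larger`, in percent."""
    return (larger - smaller) / larger * 100


print(f"level 6 : {ratio(LEVEL_6):.1%} of original")   # 53.1% of original
print(f"level 22: {ratio(LEVEL_22):.1%} of original")  # 52.7% of original
print(f"level 22 vs level 6: {relative_saving(LEVEL_22, LEVEL_6):.2f}% smaller")  # 0.76% smaller
```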

Please note that this is a very simple demonstration of a single compression case, and that WiredTiger uses zstd differently from how it was used in the example above.

This may be a case where the lower compression levels already get close to the practical limit of how small this data can be made, so the higher levels have little room left to compress it further. How much the data can be compressed at all also depends heavily on the type of data.
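To illustrate how much the data itself matters, the sketch below compresses two synthetic 1480-byte inputs, one highly repetitive and one random, at several levels. It uses `zlib` from Python's standard library as a stand-in for zstd (zlib is DEFLATE with levels 1 to 9, not zstd), so the exact numbers will not match what MongoDB reports; only the pattern is the point. The repetitive input shrinks to a few dozen bytes at every level, while the random input stays at roughly its original size no matter which level is used.

```python
import os
import zlib

SIZE = 1480  # matches the test file size; the contents below are synthetic


def compressed_sizes(data: bytes, levels=(1, 6, 9)) -> dict:
    """Return {level: compressed length} using zlib as a stand-in for zstd."""
    return {level: len(zlib.compress(data, level)) for level in levels}


samples = {
    # A short JSON-like fragment repeated: lots of redundancy to exploit.
    "repetitive": (b'{"status": "ok"}' * 100)[:SIZE],
    # Random bytes: essentially no redundancy, so no level can shrink them.
    "random": os.urandom(SIZE),
}

for name, data in samples.items():
    print(f"{name:>10}: {len(data)} bytes -> {compressed_sizes(data)}")
```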

Regards,
Jason