WiredTiger cache and compressed collections

SeventhSon · July 21, 2022, 1:32am

Hi.
I found this statement in the docs: “WiredTiger caches uncompressed collections only; compressed collections are cached by the OS cache”.
So does that mean that if all our collections are compressed (with ZSTD), the WiredTiger cache is useless and it's better to minimize its size so the memory can be reused for the OS disk cache, etc.?

kevinadi · July 22, 2022, 2:55am

Hey @SeventhSon welcome to the community!

For WiredTiger, the term “cache” actually means more than just a cache. It also serves as WiredTiger’s working memory, so it’s not as straightforward.

When MongoDB requests data from WiredTiger, WiredTiger first checks whether the data is already in its cache. If not, it fetches the data from disk, and the OS will typically cache those disk reads in what is termed the filesystem cache. Note that WiredTiger and MongoDB have no control over this OS filesystem cache and can’t really influence what’s being cached there.
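
To make the lookup order concrete, here is a rough toy model in Python. It is only a sketch: the class name `TieredReader`, the LRU eviction, and the capacities are all made up for illustration and say nothing about how WiredTiger or any OS actually manages its cache.

```python
from collections import OrderedDict


class TieredReader:
    """Toy model: an application cache in front of an OS-style cache in front of disk.

    All names and the LRU policy here are illustrative assumptions.
    """

    def __init__(self, disk, app_capacity, os_capacity):
        self.disk = disk                  # dict standing in for the data files
        self.app_cache = OrderedDict()    # stands in for the WiredTiger cache
        self.os_cache = OrderedDict()     # stands in for the filesystem cache
        self.app_capacity = app_capacity
        self.os_capacity = os_capacity
        self.stats = {"app_hit": 0, "os_hit": 0, "disk_read": 0}

    @staticmethod
    def _put(cache, capacity, key, value):
        # Insert as most recently used, then evict the oldest entries if over capacity.
        cache[key] = value
        cache.move_to_end(key)
        while len(cache) > capacity:
            cache.popitem(last=False)

    def read(self, key):
        # 1. Check the application cache first.
        if key in self.app_cache:
            self.stats["app_hit"] += 1
            self.app_cache.move_to_end(key)
            return self.app_cache[key]
        # 2. Otherwise go to "disk"; the OS-style cache may satisfy the read.
        if key in self.os_cache:
            self.stats["os_hit"] += 1
            self.os_cache.move_to_end(key)
            value = self.os_cache[key]
        else:
            # 3. Last resort: a real disk read, which also populates the OS-style cache.
            self.stats["disk_read"] += 1
            value = self.disk[key]
            self._put(self.os_cache, self.os_capacity, key, value)
        self._put(self.app_cache, self.app_capacity, key, value)
        return value
```

For example, with `app_capacity=1` and `os_capacity=4`, reading keys 0, 0, 1, 0 in that order gives one application cache hit, one OS cache hit, and two disk reads.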

In basic terms, the more WiredTiger cache you have, the larger WiredTiger’s working memory is. And (ideally) if your working set is correctly sized for the amount of RAM you have, most of the data you need would either be in WiredTiger’s working memory, or the filesystem cache (last resort before hitting the disk). Disk is often the slowest part in the server, and you want to hit it as few times as possible.

The default WiredTiger cache size of ~50% of RAM was selected as a compromise: you want as much working memory for WiredTiger as needed by your workload, but you also need to save some space in RAM for the filesystem cache, the OS’s requirements, and MongoDB’s requirements (RAM for connections, queries, aggregation, etc. are not part of WiredTiger cache).
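
As a back-of-the-envelope check, the documented default for recent MongoDB versions is the larger of 50% of (RAM minus 1 GB) and 256 MB, which is where the “~50%” comes from. A minimal sketch of that arithmetic follows; the function name is my own, and it ignores details such as container memory limits.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def default_wiredtiger_cache_bytes(total_ram_bytes):
    """Approximate the default WiredTiger cache size for a given amount of RAM.

    Hypothetical helper: larger of 50% of (RAM - 1 GiB) and 256 MiB.
    """
    half_of_remainder = (total_ram_bytes - GIB) // 2
    return max(half_of_remainder, 256 * MIB)


if __name__ == "__main__":
    for ram_gib in (1, 4, 16, 64):
        cache = default_wiredtiger_cache_bytes(ram_gib * GIB)
        print(f"{ram_gib:>3} GiB RAM -> {cache / GIB:.2f} GiB default cache")
```

If you do want a different value, it can be set with `storage.wiredTiger.engineConfig.cacheSizeGB` in the config file, but as noted, the default is usually a sensible starting point.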

Generally, the default WiredTiger cache size works well for most workloads. Honestly, I have yet to see a case where a server’s performance was improved by changing this value. In my experience so far, when there are issues with server performance, the cause is usually slow (e.g. unindexed) queries, or the server is so overwhelmed with work that it needs more hardware.

I hope the explanation makes sense.

Best regards
Kevin

SeventhSon · July 22, 2022, 8:24am

Hi Kevin,

Thanks for the detailed explanation. I mostly want to understand the difference between compressed and uncompressed collections from the Mongo/WiredTiger engine side.
If a collection is compressed, does Mongo just unzip it into the WiredTiger cache and operate the same way as with an uncompressed one? We clearly see high CPU load, and obviously compressing/decompressing requires more CPU resources. I just want to clarify how Mongo/WiredTiger works with compressed collections.

kevinadi · July 25, 2022, 12:26am

Basically yes. WiredTiger would need to work with uncompressed data in its cache.
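
Here is a very simplified sketch of the idea in Python. It uses `zlib` from the standard library as a stand-in for ZSTD, compresses at write time for simplicity, and the `CompressedStore` class is purely hypothetical. It only illustrates that the on-disk form is compressed, the in-cache form is not, and that CPU is spent on decompression only when data has to be brought back into the cache.

```python
import zlib


class CompressedStore:
    """Toy model: data is compressed on "disk" and uncompressed in the cache.

    zlib is a stand-in for ZSTD; none of this reflects WiredTiger internals.
    """

    def __init__(self):
        self.disk = {}            # key -> compressed bytes
        self.cache = {}           # key -> uncompressed bytes
        self.decompressions = 0   # counts how often CPU is spent decompressing

    def write(self, key, data):
        # Compress for the on-disk copy; keep the uncompressed copy in the cache.
        self.disk[key] = zlib.compress(data)
        self.cache[key] = data

    def evict(self, key):
        # Drop the uncompressed copy, e.g. under cache pressure.
        self.cache.pop(key, None)

    def read(self, key):
        # A cache miss costs a decompression; a cache hit does not.
        if key not in self.cache:
            self.decompressions += 1
            self.cache[key] = zlib.decompress(self.disk[key])
        return self.cache[key]
```

In this model, a smaller cache means more evictions, and therefore more repeated decompression work for the same data, which is one way a compressed workload can show up as extra CPU load.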

Best regards
Kevin
