MongoDB Network Compression: A Win-Win
An under-advertised feature of MongoDB is its ability to compress data between the client and the server. The CRM company Close has a really good write-up on how compression reduced their network traffic from about 140 Mbps to 65 Mbps. As Close notes, with cloud data transfer costs ranging from $0.01 per GB and up, you can get a nice little savings with a simple configuration change.
MongoDB supports the following compressors: snappy, zlib, and zstd.

The test scripts for this tutorial are written in Python, so snappy and zstd each require an additional package, and faker is used to generate the test data:

pip3 install python-snappy
pip3 install zstandard
pip3 install faker
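Enabling compression is just a connection-option change. As a sketch, assuming a local server on the default port (the `compressors` URI option is standard; the helper function below is my own illustration):

```python
def uri_with_compression(host, compressors):
    """Build a connection string requesting the given compressors,
    in preference order (hypothetical helper; `compressors` itself
    is a standard MongoDB URI option)."""
    return f"mongodb://{host}/?compressors={','.join(compressors)}"

# With the driver installed (pip3 install pymongo), either form works:
#   from pymongo import MongoClient
#   client = MongoClient(uri_with_compression("localhost:27017", ["zstd", "snappy", "zlib"]))
#   client = MongoClient("localhost", 27017, compressors="zstd,snappy,zlib")
print(uri_with_compression("localhost:27017", ["zstd", "snappy", "zlib"]))
# prints mongodb://localhost:27017/?compressors=zstd,snappy,zlib
```

The driver and server negotiate, so listing several compressors is safe: the first one both sides support is used, and an older server simply falls back to no compression.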
The cloud providers charge for data egress, so anything that reduces outbound network traffic is a win.
Let's first run the script without network compression (the default):
You've probably noticed that the reported megabytes out (188 MB) are more than 18 times our test size of 10 MB. There are several reasons for this, including other workloads running on the server, data replication to secondary nodes, and TCP packet overhead beyond the data itself. Focus on the delta between the test runs.
The script accepts an optional compression argument that must be either snappy, zlib, or zstd. Let's run the test again using snappy, which is known to be fast, while sacrificing some compression:
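The option handling might look something like this argparse sketch (the `-c/--compression` flag name and the default are my assumptions, not necessarily the tutorial script's actual interface):

```python
import argparse

def parse_args(argv=None):
    """Parse the optional compression argument, restricted to the
    three compressors MongoDB supports."""
    parser = argparse.ArgumentParser(description="Read-test a MongoDB collection")
    parser.add_argument(
        "-c", "--compression",
        choices=["snappy", "zlib", "zstd"],
        default=None,  # no compression unless explicitly requested
        help="network compressor to request from the server",
    )
    return parser.parse_args(argv)

print(parse_args(["--compression", "zstd"]).compression)  # prints "zstd"
```

Using `choices` lets argparse reject anything other than the three supported compressors with a helpful error message.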
With snappy compression, our reported bytes out were about 62 MB fewer. That's a 33% savings. But wait, the 10 MB of data was also read in 10 fewer seconds. That's a significant performance improvement as well.
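To sanity-check the arithmetic, 62 MB saved off the 188 MB uncompressed baseline works out to roughly a third:

```python
def percent_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

# Uncompressed run reported 188 MB out; snappy saved about 62 MB.
savings = percent_reduction(188, 188 - 62)
print(f"{savings:.0f}% savings")  # prints "33% savings"
```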
Let's try this again using zlib, which can achieve better compression, but at the expense of performance.

With zlib compression configured at its maximum compression level, we were able to achieve a 64% reduction in network egress, although it took 4 seconds longer. However, that's still a 19% performance improvement over using no compression at all.
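For reference, zlib's level is tuned through the `zlibCompressionLevel` connection option (a real URI option; the helper function and host below are illustrative). Level 9 is maximum compression and -1 is zlib's default trade-off:

```python
def zlib_uri(host, level):
    """Connection string requesting zlib compression at a given level (-1..9).
    Hypothetical helper; the URI options themselves are standard."""
    if not -1 <= level <= 9:
        raise ValueError("zlibCompressionLevel must be between -1 and 9")
    return f"mongodb://{host}/?compressors=zlib&zlibCompressionLevel={level}"

# Keyword form with the driver installed (pip3 install pymongo):
#   from pymongo import MongoClient
#   client = MongoClient("localhost", 27017, compressors="zlib", zlibCompressionLevel=9)
print(zlib_uri("localhost:27017", 9))
# prints mongodb://localhost:27017/?compressors=zlib&zlibCompressionLevel=9
```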
Let's run a final test using zstd, which is advertised to bring together the speed of snappy with the compression efficiency of zlib.

And sure enough, zstd lives up to its reputation, achieving a 68% improvement in compression along with a 55% improvement in performance!
The cloud providers often don't charge us for data ingress. However, given the substantial performance improvements with read workloads, what can be expected from write workloads?
As before, let's run the test without compression:
So it took 15 seconds to write 27,778 records. Let's run the same test with compression:
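A write test along these lines might be sketched as follows (the collection namespace, document shape, and helper names are my assumptions; faker supplies the fake customer data):

```python
import time

def make_records(count, factory):
    """Generate `count` documents from a zero-argument factory callable."""
    return [factory() for _ in range(count)]

def run_write_test(uri, docs):
    """Insert `docs` and return elapsed seconds. Requires a running mongod
    and `pip3 install pymongo`, so the import stays local to this function."""
    from pymongo import MongoClient

    coll = MongoClient(uri).test.customers  # assumed namespace
    start = time.time()
    coll.insert_many(docs)
    return time.time() - start

# With faker installed (pip3 install faker), realistic documents could be built like:
#   from faker import Faker
#   fake = Faker()
#   docs = make_records(27_778, lambda: {"name": fake.name(), "email": fake.email()})
#   secs = run_write_test("mongodb://localhost:27017/?compressors=zstd", docs)
#   print(f"wrote {len(docs):,} records in {secs:.0f} seconds")
```

Passing a URI with or without `compressors` makes it easy to rerun the same workload for each configuration and compare.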
There are a couple of options for measuring network traffic. This script uses the db.serverStatus() network counters (physicalBytesOut and physicalBytesIn), reporting on the delta between the readings at the start and end of the test run. As mentioned previously, our measurements are polluted by other network traffic occurring on the server, but my tests have shown a consistent improvement whenever compression is enabled. Visually, my results appear as follows:
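A minimal way to capture that delta might look like this (the `network.physicalBytesOut`/`physicalBytesIn` fields come from the serverStatus command; the helper functions are my own sketch):

```python
def bytes_delta(before, after, metric="physicalBytesOut"):
    """Difference in a serverStatus network counter across a test run."""
    return after["network"][metric] - before["network"][metric]

def measure(uri, workload, metric="physicalBytesOut"):
    """Run `workload` (a zero-argument callable) and report the change in
    the given network counter. Needs a live server, hence the local import
    (pip3 install pymongo)."""
    from pymongo import MongoClient

    admin = MongoClient(uri).admin
    before = admin.command("serverStatus")
    workload()
    after = admin.command("serverStatus")
    return bytes_delta(before, after, metric)

# Usage with a server available:
#   out = measure("mongodb://localhost:27017", run_read_test)  # run_read_test is hypothetical
#   print(f"{out / 1024 / 1024:.1f} MB out during the test")
```

Using `physicalBytesOut` for the read tests and `physicalBytesIn` for the write tests keeps the measurement on the side of the wire the workload actually exercises.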
Bottom line, compression reduces network traffic by more than 60%, which is in line with the improvement seen by Close. More importantly, compression also had a dramatic improvement on read performance. That's a Win-Win.