5.0 zstd compression level is not working as expected

I compared restoring the same data to MongoDB 4.2 and to MongoDB 5.0 (with zstd compression levels 6 (the default), 10, 15 and 22), but I found no difference in compression.
I set zstd as the block compressor and set the compression level in the config files as follows:

Is there anything I am missing here?

The results of restoring the same data are:
MongoDB 4.2: 7415377920 bytes
MongoDB 5.0, level 22: 7418728448 bytes
MongoDB 5.0, level 6: 7684075520 bytes
MongoDB 5.0, level 15: 7232811008 bytes
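
To put those figures side by side, here is a minimal sketch that expresses each size as a percentage change against the 4.2 result. The byte counts are copied from the list above; the `percentChange` helper and the labels are only illustrative names. On these numbers every 5.0 result lands within about 4% of the 4.2 figure.

```javascript
// On-disk sizes in bytes, copied from the results above.
const sizes = {
  "4.2": 7415377920,
  "5.0 level 6": 7684075520,
  "5.0 level 15": 7232811008,
  "5.0 level 22": 7418728448,
};

// Percentage change of every entry relative to the chosen baseline entry.
// (Illustrative helper, not part of any MongoDB tooling.)
function percentChange(allSizes, baselineKey) {
  const base = allSizes[baselineKey];
  const out = {};
  for (const [name, bytes] of Object.entries(allSizes)) {
    out[name] = ((bytes - base) / base) * 100;
  }
  return out;
}

const changes = percentChange(sizes, "4.2");
for (const [name, pct] of Object.entries(changes)) {
  console.log(`${name}: ${pct.toFixed(2)}% vs 4.2`);
}
```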

Hi @Aayushi_Mangal

Compression performance depends very much on the documents. For example, documents containing random string patterns are much less compressible than documents containing paragraphs of text.
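
As a rough illustration of that point, the sketch below compresses repetitive prose and random hex characters of similar length and compares the ratios. It uses Node's built-in zlib (deflate) purely as a stand-in, which is an assumption made so the example runs without extra packages; it is not the zstd or snappy block compressor that MongoDB uses, so only the direction of the difference carries over.

```javascript
const zlib = require("zlib");
const crypto = require("crypto");

// Ratio of original size to compressed size; higher means more compressible.
// zlib deflate is a stand-in here, not MongoDB's zstd or snappy.
function compressionRatio(text) {
  const input = Buffer.from(text, "utf8");
  const compressed = zlib.deflateSync(input);
  return input.length / compressed.length;
}

// Roughly 64 KB of repetitive, prose-like text.
const sentence = "The quick brown fox jumps over the lazy dog. ";
const prose = sentence.repeat(Math.ceil(65536 / sentence.length));

// About the same number of characters, but random hex digits.
const random = crypto.randomBytes(Math.ceil(prose.length / 2)).toString("hex");

const proseRatio = compressionRatio(prose);
const randomRatio = compressionRatio(random);

console.log(`prose ratio:  ${proseRatio.toFixed(1)}`);
console.log(`random ratio: ${randomRatio.toFixed(1)}`);
```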

Could you provide an example document that you tested with? And have you tried this experiment with different document patterns? It would be great if you could provide the information and scripts needed to reproduce your tests.

As an aside, I usually use a tool like mgeneratejs to generate dummy example documents in large numbers. For example, I can easily create gigabytes of documents following a pattern using this.

Best regards
Kevin

Hi @kevinadi ,

Thank you for your response. Please find the details to reproduce below. Also, could you share any document or test that shows how these compression levels work? In our case they made no difference.
case 1:

  1. From MongoDB 4.2, we dumped around 25 GB of data using mongodump.
  2. We launched multiple mongod instances for 4.2 and 5.0 (5.0 with different compression levels: 22, 10, 15 and 6).
  3. We restored the same data to all of these instances to check what compression we get.

case 2:

  1. We inserted bulk fresh dummy data into all these mongod instances, but found no difference in compression.
  2. A sample document for this dummy data is:
{
    "_id" : ObjectId("62cc6f86b504c0604570bf9e"),
    "Traceid" : 1.0,
    "BillNO" : "Trace1",
    "CreatedDate" : ISODate("2020-08-27T21:04:35.967Z"),
    "CustID" : 11.0,
    "SystemType" : "Card",
    "DisplayID" : "123",
    "TraceNo" : "12231",
    "ManyDeliveryGroupID" : "123AS45",
    "traceSource" : "",
    "AdminBy" : "admin",
    "DeliveryNumber" : "121",
    "OriginType" : "abcdefg",
    "ProdInfo" : {
        "Vol" : {
            "Wt" : 55.555,
            "Ut" : null
        },
        "Size" : {
            "Wd" : null,
            "Len" : null,
            "Ht" : null,
            "Ut" : null
        }
    },
    "ReatInfo" : [ 
        {
            "Reat" : "Test",
            "SellerAddress" : "Test,Test",
            "ReatNo" : 8888.0,
            "ReatBillN" : "5655656",
            "ReatDate" : ISODate("2019-09-21T14:17:46.625Z"),
            "BillPri" : 101.0,
            "InvNo" : null,
            "GainSrN" : "2455",
            "ReatSrN" : "4554"
        }
    ],
    "SubSNo" : "12323",
    "ProvideType" : "JJJ",
    "Amount" : 45.0,
    "TraceType" : "",
    "CollectDetails" : {
        "Addr" : [ 
            {
                "Type" : "Sec",
                "Name" : "ABC",
                "Address" : "123,XYZ",
                "City" : "XYZ",
                "State" : "AB"
            }
        ],
        "ConnectInfo" : [ 
            {
                "Cate" : "",
                "Mob" : "1234567890"
            }
        ],
        "CollectDate" : null,
        "CollectTime" : {
            "Src" : null,
            "Dest" : null
        },
        "IsCollected" : null,
        "CollectCode" : "AB123",
        "Long" : null,
        "Lat" : null,
        "Loc" : null
    },
    "TraceDelivery" : "",
    "TraceParameter" : "",
    "TeacePrice" : 146.0,
    "MentionPrice" : 223.0,
    "ItemPrice" : 8290.0,
    "Comment" : "Valide Trace Data",
    "TransactionType" : "Card",
    "DisctinctID" : "",
    "CollectType" : "Vendor",
    "PCollectCode" : "12333",
    "DestDetails" : {
        "Addresses" : [ 
            {
                "Cate" : "PPP",
                "Name" : "dsdwidm",
                "Address" : "LMN",
                "City" : "LMN",
                "State" : "AB"
            }
        ],
        "ConnectInfo" : [ 
            {
                "Cate" : "",
                "Mob" : "2234566078"
            }
        ],
        "Submit" : ISODate("2020-10-31T19:33:14.892Z"),
        "SubTime" : {
            "Src" : null,
            "Dest" : null
        },
        "Long" : null,
        "Lat" : null,
        "Loc" : null
    },
    "RetnInfo" : {
        "Addr" : [ 
            {
                "Cate" : "Sec",
                "Name" : "tyty",
                "Address" : "dfdfd",
                "City" : "dfdf",
                "State" : "AA"
            }
        ],
        "ConnectInfo" : [ 
            {
                "Cate" : "PPP",
                "MoB" : "123456777"
            }
        ]
    },
    "TraceRNo" : "",
    "TraceDevCli" : "",
    "TeaceGID" : "",
    "TraceOAmt" : "",
    "Cust" : ""
}

Hi @Aayushi_Mangal

I did a quick test using ~25GB of data derived from the example document you provided.

This is the output of db.test.stats() of the collection using the standard snappy compression:

  ns: 'test.test',
  size: Long("27070806709"),
  count: 17166000,
  avgObjSize: 1577,
  storageSize: Long("10205900800"),

and this is the output of db.testzstd.stats() of the collection configured to use zstd:

  ns: 'test.testzstd',
  size: Long("27070806709"),
  count: 17166000,
  avgObjSize: 1577,
  storageSize: Long("6124052480"),

So the snappy-compressed collection uses about 9.5GB of storage, and the zstd-compressed collection (using the standard compression level) uses about 5.7GB. I’m using MongoDB 5.0.9.

So far I think it’s working for me, where zstd clearly shows an advantage.
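
For reference, the arithmetic behind those figures can be sketched as follows. The `size` and `storageSize` values are copied from the two stats outputs above; the `describe` helper and the labels are illustrative names only. The storage sizes are converted with 1024^3 bytes per unit, which is what gives about 9.5 and 5.7, and the ratio of `size` to `storageSize` comes out at roughly 2.65 for snappy and 4.42 for zstd.

```javascript
const GIB = 1024 ** 3;

// size and storageSize in bytes, copied from the two stats() outputs above.
const collections = [
  { name: "test (snappy)", size: 27070806709, storageSize: 10205900800 },
  { name: "testzstd (zstd)", size: 27070806709, storageSize: 6124052480 },
];

// Illustrative helper: on-disk size in GiB and uncompressed/on-disk ratio.
function describe(c) {
  return {
    name: c.name,
    storageGiB: c.storageSize / GIB,
    ratio: c.size / c.storageSize,
  };
}

const report = collections.map(describe);
for (const r of report) {
  console.log(`${r.name}: ${r.storageGiB.toFixed(2)} GiB on disk, ratio ${r.ratio.toFixed(2)}`);
}
```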

Could you double-check the experiment using the latest MongoDB versions? For example, please use 5.0.9 for 5.0 and 4.2.21 for 4.2.

Best regards
Kevin

Hi @kevinadi ,

Thank you for testing it, but your test case does not seem to be the one I tested. I am looking for a comparison of zstd with itself at the different compression levels available from MongoDB 5.0, alongside MongoDB 4.2.

My test case refers to this: https://www.mongodb.com/docs/manual/release-notes/5.0/#configurable-zstd-compression-level

I restored the same data to MongoDB 4.2 and 5.0 (with compression levels 6 (the default), 10, 15 and 22), using zstd only.

Hi @Aayushi_Mangal,

Thanks for the link and for detailing the tests you performed.

I inserted about 500K test documents into a MongoDB version 4.2.21 instance with default compression, and then used mongorestore to restore a dump of this same data to several test instances with varying compression settings:

  • MongoDB version 5.0.10 zstd compression level 22
  • MongoDB version 5.0.10 zstd compression level 10
  • MongoDB version 5.0.10 default compression

The results are below (for all of these tests, please note the decreasing storageSize values):

MongoDB version 4.2.21, default compression:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 64327680

MongoDB version 5.0.10 default compression:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 54767616

MongoDB version 5.0.10 zstd compression, level 10:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 27152384

MongoDB version 5.0.10 zstd compression, level 22:

  ns: 'compressdb.compresscoll',
  size: 767500000,
  count: 500000,
  avgObjSize: 1535,
  storageSize: 1257472

For case 2 in your reply within this post:

  1. We inserted bulk fresh dummy data into all these mongod instances, but found no difference in compression.

Could you run db.collection.stats() on each of your test instances and share the following values for each one:

  • storageSize
  • creationString
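
A minimal sketch of pulling those two values out of a stats document is shown below. It assumes the document has the shape that db.collection.stats() returns on WiredTiger, with the creation string under `wiredTiger.creationString`. The `summarizeStats` helper and the sample object are hypothetical; the sample reuses the level 10 numbers above with a shortened, made-up creation string.

```javascript
// Pull storageSize and the block compressor out of a stats document.
// Number() is enough for the plain numbers in this sample; Long values
// returned by a real shell may need their own conversion.
function summarizeStats(stats) {
  const creationString =
    (stats.wiredTiger && stats.wiredTiger.creationString) || "";
  const match = creationString.match(/(?:^|,)block_compressor=([^,]*)/);
  return {
    ns: stats.ns,
    storageSize: Number(stats.storageSize),
    blockCompressor: match ? match[1] : null,
    ratio: Number(stats.size) / Number(stats.storageSize),
  };
}

// Hypothetical sample: numbers from the level 10 result above, with a
// shortened creation string written for this example only.
const sampleStats = {
  ns: "compressdb.compresscoll",
  size: 767500000,
  storageSize: 27152384,
  wiredTiger: {
    creationString:
      "block_allocation=best,block_compressor=zstd,cache_resident=false",
  },
};

const summary = summarizeStats(sampleStats);
console.log(summary);
```

In a shell session the same helper could be given the result of db.collection.stats() for each instance, so the values can be compared side by side.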

Regards,
Jason

Hello Jason,

Thank you so much for your response and for reproducing this case.

I tested again by inserting test documents using this script: mongo script for insert 100 million test data · GitHub

Please find the requested details below:

MongoDB version 4.2.12

db.actlog.count()
105727766

"storageSize" : 1676038144

"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u"

MongoDB version 5.0.8, level 6

db.actlog.count()
105727766

"storageSize" : 1682145280

"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=10M),type=file,value_format=u,verbose=,write_timestamp_usage=none"

MongoDB version 5.0.8, level 10

db.actlog.count()
105727766

"storageSize" : 1690370048,

"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=10M),type=file,value_format=u,verbose=,write_timestamp_usage=none"

MongoDB version 5.0.8, level 22

db.actlog.count()
105727766

"storageSize" : 1705689088

"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=zstd,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,readonly=false,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,tiered_object=false,tiered_storage=(auth_token=,bucket=,bucket_prefix=,cache_directory=,local_retention=300,name=,object_target_size=10M),type=file,value_format=u,verbose=,write_timestamp_usage=none",

I must be missing something, as "creationString" looks identical across the 5.0 instances. Please suggest whether there is any parameter I need to check, or anything else I am missing here.
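
One way to check this more systematically is to parse the creation strings and diff them key by key, as in the sketch below. The two strings here are shortened versions of the 4.2 and 5.0 values above, trimmed by hand for readability; the `parseConfig` and `diffConfigs` helpers are illustrative and not part of MongoDB or WiredTiger. On these inputs the only differences are the keys that 5.0 adds, `block_compressor=zstd` is the same on both sides, and no key mentions a level. Whether the level is expected to show up in creationString at all is not established in this thread.

```javascript
// Split a WiredTiger-style config string on top-level commas only,
// so nested groups such as assert=(a=1,b=2) stay in one piece.
function parseConfig(str) {
  const entries = new Map();
  let depth = 0;
  let start = 0;
  for (let i = 0; i <= str.length; i++) {
    const ch = str[i];
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    if ((ch === "," && depth === 0) || i === str.length) {
      const part = str.slice(start, i);
      start = i + 1;
      if (part === "") continue;
      const eq = part.indexOf("=");
      entries.set(
        eq === -1 ? part : part.slice(0, eq),
        eq === -1 ? "" : part.slice(eq + 1)
      );
    }
  }
  return entries;
}

// List every key whose value differs (or is missing) between two configs.
function diffConfigs(a, b) {
  const changes = [];
  for (const key of new Set([...a.keys(), ...b.keys()])) {
    if (a.get(key) !== b.get(key)) {
      changes.push({ key, left: a.get(key), right: b.get(key) });
    }
  }
  return changes;
}

// Shortened by hand from the 4.2 and 5.0 creation strings above.
const v42 =
  "assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none)," +
  "block_allocation=best,block_compressor=zstd,leaf_page_max=32KB," +
  "prefix_compression=false,type=file";
const v50 =
  "assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off)," +
  "block_allocation=best,block_compressor=zstd,leaf_page_max=32KB," +
  "prefix_compression=false,readonly=false,tiered_object=false,type=file," +
  "write_timestamp_usage=none";

const changedKeys = diffConfigs(parseConfig(v42), parseConfig(v50)).map((c) => c.key);
console.log("keys that differ:", changedKeys);
console.log("block_compressor:", parseConfig(v50).get("block_compressor"));
```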