WiredTiger Block Manager differences

I have a 2-shard cluster running, and was wondering why one node in the cluster was operating at significantly higher CPU usage than the other.

For one of my largest collections, I noticed that the values within the block-manager section of the collection stats were significantly different:

"block-manager" : {
    "allocations requiring file extension" : 100087,
    "blocks allocated" : 2788802,
    "blocks freed" : 2750269,
    "checkpoint size" : 492033282048,
    "file allocation unit size" : 4096,
    "file bytes available for reuse" : 6119456768,
    "file magic number" : 120897,
    "file major version number" : 1,
    "file size in bytes" : 498166542336,
    "minor version number" : 0
},

vs

"block-manager" : {
    "allocations requiring file extension" : 88776387,
    "blocks allocated" : 2968371230,
    "blocks freed" : 2904117984,
    "checkpoint size" : 592490942464,
    "file allocation unit size" : 4096,
    "file bytes available for reuse" : 2130100224,
    "file magic number" : 120897,
    "file major version number" : 1,
    "file size in bytes" : 594622980096,
    "minor version number" : 0
},

I was trying to find out what these fields mean. The collection is balanced and the file size in bytes is fairly even between the two shards, so why are blocks allocated and blocks freed so different (roughly a thousand times higher on the second shard)?

I tried to find documentation on the meaning of these values, but had no luck.
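For reference, the values above can be pulled per shard from the mongo shell with something like the following (the database and collection names are placeholders):

    // Run directly against each shard's mongod; "mydb" and "mycoll" are
    // placeholder names for the actual database and collection.
    var stats = db.getSiblingDB("mydb").getCollection("mycoll").stats();
    printjson(stats.wiredTiger["block-manager"]);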

The creation string on both shards for this collection are identical as well:
"creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=false),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",

Hi @fergus

The short answer is that each mongod manages its own storage independently of the other mongod processes in the cluster or replica set. It’s therefore not uncommon to see each mongod behave slightly differently, even though logically they should be identical, because the actual conditions on each node’s hardware differ.

For replica sets, the members should be close to each other, but for sharded clusters it’s not that simple, since some shards may work harder than others. For example, non-sharded collections live on a database’s primary shard, which can be a different shard for each database.
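You can check which shard is the primary for each database by querying the sharding metadata from a mongos (sh.status() prints the same information); a minimal sketch:

    // Run against a mongos: list each database and its primary shard,
    // i.e. the shard that holds that database's non-sharded collections.
    db.getSiblingDB("config").databases.find({}, { _id : 1, primary : 1 })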

It’s worth mentioning that WiredTiger simply follows the instructions of its associated mongod, so if the block manager is busier on one node of the cluster, that server is also writing more data to disk.
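One way to sanity-check that is to compare write activity between the shards. A rough sketch (connect to each shard’s mongod directly, not to the mongos):

    // Compare write activity and server-wide block-manager work on each shard.
    // Higher insert/update/delete counters usually go hand in hand with more
    // blocks being allocated and freed by WiredTiger.
    var status = db.serverStatus();
    printjson(status.opcounters);
    printjson(status.wiredTiger["block-manager"]);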

Best regards,
Kevin
