WiredTiger Library Panic on Insert

The_Techromancer · August 31, 2020, 9:41pm

So far, this issue has only ever happened when the number of documents in the shard is greater than 500,000,000. This could just be a coincidence, but I’ve erased the entire DB and re-imported all the data numerous times, and whenever this happens, it seems to take out multiple shards, not just one.

I’m running the nodes in docker.

$ mongod --version
db version v4.2.8
git version: 43d25964249164d76d5e04dd6cf38f6111e21f5f
OpenSSL version: OpenSSL 1.1.1  11 Sep 2018
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1804
    distarch: x86_64
    target_arch: x86_64

It occurs on insert, and leaves the database in an unrepairable state. (running --repair fails with a duplicate key error):

2020-08-31T15:42:29.003+0000 I  -        [initandlisten]   Index Build: inserting keys from external sorter into index: 35096500/835088325 4%
2020-08-31T15:42:33.501+0000 I  STORAGE  [initandlisten] Index builds manager failed: 70273619-2562-4e0e-880e-6fe21c4397a7: myapp.comments: DuplicateKey{ keyPattern: { _id: 1 }, keyValue: { _id: "2b00042f7481c7b056c4b410d28f33cf" } }: E11000 duplicate key error collection: myapp.comments index: _id_ dup key: { _id: "2b00042f7481c7b056c4b410d28f33cf" }
2020-08-31T15:42:33.501+0000 I  STORAGE  [initandlisten] Index build failed: 70273619-2562-4e0e-880e-6fe21c4397a7: myapp.comments ( 8e03ebdc-bbcb-491a-aa49-deec30dde3ad ): DuplicateKey{ keyPattern: { _id: 1 }, keyValue: { _id: "2b00042f7481c7b056c4b410d28f33cf" } }: E11000 duplicate key error collection: myapp.comments index: _id_ dup key: { _id: "2b00042f7481c7b056c4b410d28f33cf" }
2020-08-31T15:42:33.501+0000 F  -        [initandlisten] Fatal assertion 51076 DuplicateKey{ keyPattern: { _id: 1 }, keyValue: { _id: "2b00042f7481c7b056c4b410d28f33cf" } }: E11000 duplicate key error collection: myapp.comments index: _id_ dup key: { _id: "2b00042f7481c7b056c4b410d28f33cf" } at src/mongo/db/index_builds_coordinator.cpp 1035
2020-08-31T15:42:33.501+0000 F  -        [initandlisten] 

***aborting after fassert() failure

At first I thought it was a problem with the storage that was causing the issue, but all the SSDs passed full read/write tests. I eventually ended up switching out the storage anyway just to be sure, but I’m ending up with the same issue.

2020-08-31T14:05:13.615+0000 E  STORAGE  [conn57] WiredTiger error (0) [1598882713:615525][1:0x7f47acb92700], file:collection-19--1551888894885693755.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 135: {98792763392, 24576, 0x2ad0120a}: (chunk 1 of 24): <snipped>  Raw: [1598882713:615525][1:0x7f47acb92700], file:collection-19--1551888894885693755.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 135: {98792763392, 24576, 0x2ad0120a}: (chunk 1 of 24): <snipped>
2020-08-31T14:05:13.619+0000 E  STORAGE  [conn57] WiredTiger error (-31802) [1598882713:619635][1:0x7f47acb92700], file:collection-19--1551888894885693755.wt, WT_CURSOR.search: __wt_block_read_off, 292: collection-19--1551888894885693755.wt: fatal read error: WT_ERROR: non-specific WiredTiger error Raw: [1598882713:619635][1:0x7f47acb92700], file:collection-19--1551888894885693755.wt, WT_CURSOR.search: __wt_block_read_off, 292: collection-19--1551888894885693755.wt: fatal read error: WT_ERROR: non-specific WiredTiger error                                                                                            2020-08-31T14:05:13.619+0000 E  STORAGE  [conn57] WiredTiger error (-31804) [1598882713:619667][1:0x7f47acb92700], file:collection-19--1551888894885693755.wt, WT_CURSOR.search: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1598882713:619667][1:0x7f47acb92700], file:collection-19--1551888894885693755.wt, WT_CURSOR.search: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
2020-08-31T14:05:13.619+0000 F  -        [conn57] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 414                                                                                                                                                                             2020-08-31T14:05:13.619+0000 F  -        [conn57]

***aborting after fassert() failure
----- BEGIN BACKTRACE -----
<snipped>
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55bb10216631]
mongod(+0x28BDC4C) [0x55bb10215c4c]
mongod(+0x28BDCD6) [0x55bb10215cd6]
libpthread.so.0(+0x12890) [0x7f47c8ceb890]
libc.so.6(gsignal+0xC7) [0x7f47c8926e97]
libc.so.6(abort+0x141) [0x7f47c8928801]
mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x55bb0e66c62b]
mongod(+0xA50666) [0x55bb0e3a8666]
mongod(+0xEB7BBB) [0x55bb0e80fbbb]
mongod(__wt_err_func+0x90) [0x55bb0e3b8f28]
mongod(__wt_panic+0x39) [0x55bb0e3b938c]
mongod(+0xA6DC29) [0x55bb0e3c5c29]
mongod(__wt_bm_read+0x13F) [0x55bb0e8ee69f]
mongod(__wt_bt_read+0x92) [0x55bb0e853002]
mongod(+0xF03C43) [0x55bb0e85bc43]
mongod(__wt_page_in_func+0x3DE) [0x55bb0e85ca7e]
mongod(__wt_row_search+0x93D) [0x55bb0e88963d]
mongod(__wt_btcur_search+0x893) [0x55bb0e846253]
mongod(+0xE51955) [0x55bb0e7a9955]
mongod(+0xE0A33D) [0x55bb0e76233d]
mongod(_ZN5mongo31WiredTigerRecordStoreCursorBase9seekExactERKNS_8RecordIdE+0xA6) [0x55bb0e762686]
mongod(_ZN5mongo16WorkingSetCommon5fetchEPNS_16OperationContextEPNS_10WorkingSetEmNS_11unowned_ptrINS_20SeekableRecordCursorEEE+0x90) [0x55bb0f102200]
mongod(_ZN5mongo11IDHackStage6doWorkEPm+0xEA) [0x55bb0f0cf57a]
mongod(_ZN5mongo9PlanStage4workEPm+0x68) [0x55bb0f0dd668]
mongod(_ZN5mongo11UpdateStage6doWorkEPm+0x3DB) [0x55bb0f0ff05b]
mongod(_ZN5mongo11UpsertStage6doWorkEPm+0x7B) [0x55bb0f100feb]
mongod(_ZN5mongo9PlanStage4workEPm+0x68) [0x55bb0f0dd668]
mongod(_ZN5mongo16PlanExecutorImpl12_getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x230) [0x55bb0f124b90]
mongod(_ZN5mongo16PlanExecutorImpl7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x4D) [0x55bb0f1252ed]
mongod(_ZN5mongo16PlanExecutorImpl11executePlanEv+0x5D) [0x55bb0f1255ad]
mongod(_ZN5mongo14performUpdatesEPNS_16OperationContextERKNS_9write_ops6UpdateE+0xCD8) [0x55bb0ee2e228]
mongod(+0x14CA456) [0x55bb0ee22456]
mongod(+0x14C8C5D) [0x55bb0ee20c5d]
mongod(+0x118C3E8) [0x55bb0eae43e8]
mongod(+0x118E654) [0x55bb0eae6654]
mongod(_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageERKNS0_5HooksE+0x41A) [0x55bb0eae73ea]
mongod(_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x3C) [0x55bb0ead4f6c]
mongod(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0xEC) [0x55bb0eae0cfc]
mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x17F) [0x55bb0eaddfdf]
mongod(+0x1187C6C) [0x55bb0eadfc6c]
mongod(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x182) [0x55bb0f93bce2]
mongod(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x10D) [0x55bb0eadaead]
mongod(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0x753) [0x55bb0eadc723]
mongod(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x316) [0x55bb0eadd2d6]
mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0xDB) [0x55bb0eaddf3b]
mongod(+0x1187C6C) [0x55bb0eadfc6c]
mongod(+0x1FE414B) [0x55bb0f93c14b]
mongod(+0x2639A45) [0x55bb0ff91a45]
mongod(+0x2639AA4) [0x55bb0ff91aa4]
libpthread.so.0(+0x76DB) [0x7f47c8ce06db]
libc.so.6(clone+0x3F) [0x7f47c8a0988f]
-----  END BACKTRACE  -----

My best theory right now is that I’ve accidentally stumbled upon some strange edge case where the contents of a document actually manage to corrupt the entire database. At any rate, hopefully I’m wrong, because I really like mongodb.

Pavel_Duchovny · September 1, 2020, 7:31am

Hi @The_Techromancer,

The WiredTiger error seems to be some sort of corruption on collection file collection-19--1551888894885693755.wt .

Can you clarify how do you clean the data and how do you populate it? Does it always fail on a specific loading point?

I suggest that you figure out if it always happen on this single namespace and which one is to run db.collection.validate(true) on it.

Please use the following script when connected to the relevant instance with a mongo shell (on the buttom replace <FILENAME> with `collection-19–1551888894885693755.wt``:

function findNamespaceForWiredTigerFile( searchFileName ) {
    if( typeof searchFileName != 'string' ) {
        throw new Error( 'searchFileName should be a string' );
    }
    if( searchFileName.length - 3 == searchFileName.indexOf( '.wt' ) ) {
        searchFileName = searchFileName.substring( 0, searchFileName.length - 3 );
    }
    function checkCollectionFile( coll ) {
        var stats = coll.stats().wiredTiger;
        if( !stats ) {
            throw new Error( "no wiredTiger statistics available for " + coll.getFullName() );
        }
        if( !stats.uri.includes( searchFileName ) ) {
            return null;
        }
        return {
            uri: stats.uri,
            type: "collection",
            ns: coll.getFullName()
        };
    }
    function checkIndexFiles( coll ) {
        var tree = coll.stats( { indexDetails: true } ).indexDetails;
        if( !tree ) {
            throw new Error( "no index statistics available for " + coll.getFullName() );
        }
        for( var elem in tree ) {
            if( tree[elem].uri.includes( searchFileName ) ) {
                return {
                    uri: tree[elem].uri,
                    type: "index",
                    ns: coll.getFullName(),
                    key: JSON.parse( tree[elem].metadata.infoObj ).key
                };
            }
        }
        return null;
    }
    function checkCollectionAndIndexFiles( coll ) {
        var obj = checkCollectionFile( coll );
        if( obj ) {
            return obj;
        }
        return checkIndexFiles( coll );
    }
    var checkFunc = checkCollectionAndIndexFiles;
    if( searchFileName.includes( 'index' ) ) {
        checkFunc = checkIndexFiles;
        print( 'filename suggests an index, will check indexes only' );
    }
    else if( searchFileName.includes( 'collection' ) ) {
        checkFunc = checkCollectionFile;
        print( 'filename suggests a collection, will check collections only' );
    }
    else {
        print( 'filename has no hint, will check all collections and indexes' );
    }
    var desc = null;
    db.getMongo().getDBNames().some( function( d ) {
        print( "searching database: " + d );
        var curr_db = db.getSiblingDB( d );
        curr_db.getCollectionNames().some( function( collname ) {
            var coll = curr_db.getCollection( collname );
            desc = checkFunc( coll );
            if( desc ) {
                found = true;
                print( "" );
                print( ">>> found the file in " + desc.ns );
                print( "" );
            }
            return desc;
        } );
        return desc;
    } );
    return desc;
}

findNamespaceForWiredTigerFile('<FILENAME>');

Thanks,
Pavel

The_Techromancer · September 1, 2020, 8:53pm

@Pavel_Duchovny thanks for the quick reply!

I’m currently unable to start mongod since the repair failed due to the duplicate key error.

$ mongod --config /scripts/mongod.conf
2020-09-01T15:23:48.807+0000 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2020-09-01T15:23:48.809+0000 W  ASIO     [main] No TransportLayer configured during NetworkInterface startup
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] MongoDB starting : pid=100 port=27017 dbpath=/data/db 64-bit host=499c45278f4c
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] db version v4.2.8
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] git version: 43d25964249164d76d5e04dd6cf38f6111e21f5f
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1  11 Sep 2018
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] allocator: tcmalloc
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] modules: none
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] build environment:
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten]     distmod: ubuntu1804
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten]     distarch: x86_64
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten]     target_arch: x86_64
2020-09-01T15:23:48.810+0000 I  CONTROL  [initandlisten] options: { config: "/scripts/mongod.conf", storage: { wiredTiger: { collectionConfig: { blockCompressor: "snappy" }, engineConfig: { cacheSizeGB: 1.0, journalCompressor: "none" } } } }
2020-09-01T15:23:48.814+0000 W  STORAGE  [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2020-09-01T15:23:48.814+0000 F  STORAGE  [initandlisten] An incomplete repair has been detected! This is likely because a repair operation unexpectedly failed before completing. MongoDB will not start up again without --repair.
2020-09-01T15:23:48.814+0000 F  -        [initandlisten] Fatal Assertion 50922 at src/mongo/db/storage/storage_engine_init.cpp 85
2020-09-01T15:23:48.814+0000 F  -        [initandlisten] 

***aborting after fassert() failure

I populated all the data using the pymongo library. The only data types that are loaded are Python strings (which I’m assuming are eventually encoded into UTF-8). I can’t guarantee the exact content of the strings except that they should all reside within the UTF character set. Are there any specific types of characters I should avoid? (e.g. non-standard whitespace, etc.?)

Pavel_Duchovny · September 2, 2020, 4:53am

Hi @The_Techromancer,

This seems to be an attempt from after the node was unseccessfully repaired.

Is this shard a replica or a standalone?

Can you bring the replica set to the initial failure? Have you backed up the dbPath before repair?

Does after the initial failure instance can be started?

Additionally, what is the connection string and write Concern settings? Is the corrupted collection sharded?

Best
Pavel

The_Techromancer · September 2, 2020, 5:21pm

@Pavel_Duchovny of course I have a backup

Does it always fail on a specific loading point?
This definitely is possible, although the sheer volume and parallelization of the import process makes it virtually impossible to narrow this down to a single document or group of documents. To import all the data in a single thread would take months if not years, which isn’t mongo’s fault.

Does after the initial failure instance can be started?
Yes it can be started, but the panic happens again whenever one of the corrupted areas is accessed.

Additionally, what is the connection string and write Concern settings? Is the corrupted collection sharded?
Yes, the corrupted collection is sharded (and when the corruption happens, it does infect other shards)

client = pymongo.MongoClient(
    '127.0.0.1',
    27017,
    username=mongo_user,
    password=mongo_pass
)

db = client['myapp']
comments = db.comments.with_options(
    write_concern=pymongo.write_concern.WriteConcern(
        w=0,
        j=False,
        fsync=False
    ),
)

Here’s the result of that command:

> findNamespaceForWiredTigerFile('collection-19--1551888894885693755.wt');
filename suggests a collection, will check collections only
searching database: admin
searching database: config
2020-09-02T15:14:12.649+0000 I  SHARDING [conn1] Marking collection config.cache.chunks.config.system.sessions as collection version: <unsharded>
2020-09-02T15:14:12.657+0000 I  SHARDING [conn1] Marking collection config.cache.chunks.myapp.comments as collection version: <unsharded>
2020-09-02T15:14:12.663+0000 I  SHARDING [conn1] Marking collection config.cache.collections as collection version: <unsharded>
2020-09-02T15:14:12.666+0000 I  SHARDING [conn1] Marking collection config.cache.databases as collection version: <unsharded>
searching database: myapp
2020-09-02T15:14:12.674+0000 I  SHARDING [conn1] Marking collection myapp.comments as collection version: <unsharded>
2020-09-02T15:14:12.811+0000 I  COMMAND  [conn1] command myapp.comments appName: "MongoDB Shell" command: collStats { collStats: "comments", scale: undefined, lsid: { id: UUID("df8deb1f-cda8-4325-9646-d3771d6bc5c0") }, $db: "myapp" } numYields:0 reslen:14080 locks:{ ReplicationStateTransition: { acquireCount: { w: 1 } }, Global: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } }, Mutex: { acquireCount: { r: 1 } } } storage:{ data: { bytesRead: 178 } } protocol:op_msg 137ms

>>> found the file in myapp.comments

{
        "uri" : "statistics:table:collection-19--1551888894885693755",
        "type" : "collection",
        "ns" : "myapp.comments"
}

Pavel_Duchovny · September 2, 2020, 6:35pm

Hi @The_Techromancer,

Can you validate myapp.comments collection?

use myapp
db.comments.validate(true);

This should output what is corrupted.

Best
Pavel

The_Techromancer · September 2, 2020, 8:48pm

Awesome, TIL about the validate() function! Hopefully I can use this to narrow down the source of the corrupted data.

2020-09-02T18:52:03.690+0000 I  STORAGE  [conn1] WiredTiger progress WT_SESSION.verify 338200
2020-09-02T18:52:03.699+0000 I  STORAGE  [conn1] WiredTiger progress WT_SESSION.verify 338300
2020-09-02T18:52:03.708+0000 I  STORAGE  [conn1] WiredTiger progress WT_SESSION.verify 338400
2020-09-02T18:52:03.717+0000 I  STORAGE  [conn1] WiredTiger progress WT_SESSION.verify 338500
2020-09-02T18:52:03.722+0000 E  STORAGE  [conn1] WiredTiger error (0) [1599072723:722250][12:0x7f8468cf1700], file:collection-19--1551888894885693755.wt, WT_SESSION.verify: __wt_block_read_off, 274: collection-19--1551888894885693755.wt: read checksum error for 24576B block at offset 98792763392: calculated block checksum  doesn't match expected checksum
2020-09-02T18:52:03.722+0000 E  STORAGE  [conn1] WiredTiger error (0) [1599072723:722388][12:0x7f8468cf1700], file:collection-19--1551888894885693755.wt, WT_SESSION.verify: __wt_bm_corrupt_dump, 135: {98792763392, 24576, 0x2ad0120a}: (chunk 1 of 24): 88 f3 cb 8e bb 96 19 ed c7 a7 5b 08 63 eb 32 c9 0c d8 19 00 d2 9b f7 34 02 3c 53 45 36 c4 f2 77 93 29 33 e8 27 f1 7f f2 1a da c3 da 4a c6 2f e9 1f ef 76 6e bd 86 77 e4 0e fe a9 3b c0 c4 f5 23 5d 33 97 34 34 51 85 df 18 20 4c 55 42 40 15 52 16 c0 9c d0 d5 84 18 6e 6d 0e 62 3f 60 8c 6f 61 af a8 11 42 d3 f3 1d e9 61 48 a8 d3 40 80 f0 63 45 02 66 32 3c a5 07 c3 5a c9 f7 d2 7a 02 b0 bf 2a 31 e5 53 46 a1 de d7 b4 f6 1f 7a ac cb 5c 0f ee 0f 9d 21 fd d2 f6 c6 a3 14 2b 34 07 11 66 b6 64 f8 4e be 7c 56 4b 9d 0f 73 06 51 64 94 63 ba 49 da 0f 93 2e 64 89 78 97 88 61 57 9b 61 ee 61 2d 95 5a 95 09 38 da 26 5b d1 55 e9 fd 16 aa 1a 9e d1 40 7f 5d 22 c5 8f 5a 07 89 26 13 e3 b4 c4 06 37 61 20 bd 51 30 a0 e5 bc 8b eb c7 b2 57 a3 ab 9b 8f dd 49 33 0b 61 05 ea fa fe b1 a0 27 61 1f 4c 0e f3 8c 09 81 98 98 17 d9 19 e2 36 c6 18 33 ca 81 ba 30 83 46 e4 b2 ec d8 22 5c 0d bd ba 0b 56 d3 77 b8 4e 4d 9e 1d a7 6a 37 74 2d 6a 0e cb 57 30 14 ef e8 cd dc a3 b8 1d dc 48 bb 81 c9 65 a7 80 ce 4f 4b 8d 14 35 90 bb 13 4a 35 27 ab 28 42 17 99 68 d2 3d 9f 74 bc 02 f5 6b d6 42 3c 42 2a 97 7d 2a 82 5b e4 93 6a 5e bf 07 ec bd 96 b0 3b b4 56 5b 21 1f 70 82 e5 00 be 73 91 24 42 db 95 37 a9 a3 f8 ea 77 ac 34 dc d0 c0 57 88 f7 64 83 af ff 1d 41 3f 7f fa db f9 53 07 a6 a7 3b f1 42 89 b5 a2 7e cf 6b df cc 8f df 30 b0 75 e1 cc 0e 50 5d ea 02 93 28 f7 7a 71 d2 4c c6 78 d2 bb c0 d4 41 04 d4 b4 b3 02 3f 79 a4 0f d2 e7 b0 69 5a 5f cf 82 aa 3c f8 9c 4a 53 9d d2 55 24 e7 c6 d6 dc e9 4f 7e fd fd df 61 e8 21 15 52 be d4 73 50 a8 f3 0b e6 a3 10 be 6a 7c ad 43 33 63 9f 20 20 20 20 20 20 48 45 4c 50 20 49 27 56 45 20 42 45 45 4e 20 43 4f 52 52 55 50 54 45 44 20 42 59 20 54 48 45 20 44 41 52 4b 20 4f 4e 45 20 20 20 20 20 20 45 58 49 53 54 45 4e 43 45 20 49 53 20 50 41 49 4e 20 20 20 20 20 20 50 4c 45 41 53 45 20 45 4e 44 20 4d 59 20 53 55 46 46 45 52 49 4e 47 20 20 20 20 20 20 e9 38 4a ef 92 9c 1d 74 74 88 ca e6 3a 37 da 23 e8 30 74 a4 8b f6 03 d1 d3 3e 31 51 9d 50 d9 10 79 11 97 e2 e0 e8 6d 43 fe fc 35 7b b8 3d d6 0f 1f 93 27 e9 91 14 92 b5 7e 56 de 11 ad 0e d5 ae 63 88 90 4f fb 34 ae 94 ae 4b 74 ab 5b b6 5a 1a bb 29 bd b4 51 46 25 2c e0 2e c9 e3 5c bb d2 ea bf b6 f5 db f2 6e 6a 4d f6 fa 7d 45 09 53 06 50 5c c2 a7 69 60 f7 0a b0 91 8f 1a 84 5b e3 f0 1f eb b1 82 fc 64 9b 42 b3 5d a0 95 8b 88 f5 dc 73 7b 8f e7 96 39 c1 45 15 c6 f8 75 fe fa 06 c4 ed b7 1c 24 d7 d3 f9 34 32 28 7e b4 90 28 8a 65 a5 d3 d5 35 c6 11 4a bd d1 b7 78 21 0b 98 ab 06 1d 52 3a 13 48 4b 62 6b 64 f6 d1 b6 14 f1 ce 89 c0 b6 48 c2 b2 d2 dd 2a 03 c5 e2 36 9d b2 0e 1a 58 a2 fb 86 dd e8 a6 c2 97 c0 5d d4 2a f2 f8 f5 22 03 bd 09 0f 50 9c f7 0d c1 ba 5f ab 61 b6 ad 6a cf 4d d4 5a fe f0 98 73 08 d4 b6 96 3f c1 2b 8c 78 14 94 31 b0 9f b8 0b c1 4b 30 3f bb a3 9c c2 3c e4 c6 b5 3f b7 f0 70 99 84 1a 34 cf ee af 1a be eb c2 40 9d 68 81 8d 35 18 ff b7 95 c4 88 7a e2 74 04 89 5f 45 eb 32 24 2f ae ce 72 bc 62 0a e2 d5 bd e2 ac 60 d2 c5 54 53 b6 63 61 fc f4 f2 d2 3d d9 10 81 9d 9f 4e ec f6 f2 54 8e a8 cb da 4c af 26 2d f9 9e ff f5 3a 89 2a 3d e0 dc 3e 20 f8 7c f4 75 a5 88 a1 24 15 9d 91 a0 c4 cd ce f7 f9 83 02 41 17 2b 73 94

Thanks very much for your help @Pavel_Duchovny, I’ll do some digging and report back.

Pavel_Duchovny · September 3, 2020, 5:10am

Hi @The_Techromancer,

On 99% of the case checksum errors are related to HW issues either with disk itself or disk faulty/unreliable configuration. It can also be a result of unexpected memory crash or faulty memory as well.

Please make sure you compliant with our Production Notes and Operation Checklist.

If all of your HW is good and configured correctly perhaps consider filing a SERVER ticket on https://mongodb.jira.com.

Best
Pavel

rajesh_matta · July 6, 2021, 12:41pm

What block compressor you were using at that time.
I am having the same problem again and again. Here are the logs and this error reveals itself when I use a delete command.
{“t”:{"$date":“2021-07-04T10:19:59.449+08:00”},“s”:“I”, “c”:“STORAGE”, “id”:22430, “ctx”:“WTCheckpointThread”,“msg”:“WiredTiger message”,“attr”:{“message”:"[1625365199:449289][99134:0x7f02d02cc700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] Checkpoint ran for 625 seconds and wrote: 42753 pages (1408 MB)"}}
{“t”:{"$date":“2021-07-04T10:22:24.607+08:00”},“s”:“I”, “c”:“STORAGE”, “id”:22430, “ctx”:“WTCheckpointThread”,“msg”:“WiredTiger message”,“attr”:{“message”:"[1625365344:607899][99134:0x7f02d02cc700], file:data_202106/index-2–3267180256554870217.wt, WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] Checkpoint has been running for 85 seconds and wrote: 5000 pages (162 MB)"}}
{“t”:{"$date":“2021-07-04T10:23:36.269+08:00”},“s”:“I”, “c”:“STORAGE”, “id”:22430, “ctx”:“WTCheckpointThread”,“msg”:“WiredTiger message”,“attr”:{“message”:"[1625365416:269528][99134:0x7f02d02cc700], file:data_202106/index-2–3267180256554870217.wt, WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] Checkpoint has been running for 156 seconds and wrote: 10000 pages (319 MB)"}}
{“t”:{"$date":“2021-07-04T10:24:01.541+08:00”},“s”:“E”, “c”:“STORAGE”, “id”:22435, “ctx”:“conn1”,“msg”:“WiredTiger error”,“attr”:{“error”:0,“message”:"[1625365441:537220][99134:0x7f02c82bc700], file:data_202106/index-2–3267180256554870217.wt, WT_CURSOR.remove: __wt_block_read_off, 322: data_202106/index-2–3267180256554870217.wt: read checksum error for 16384B block at offset 2616250368: calculated block checksum doesn’t match expected checksum"}}
{“t”:{"$date":“2021-07-04T10:24:01.541+08:00”},“s”:“E”, “c”:“STORAGE”, “id”:22435, “ctx”:“conn1”,“msg”:“WiredTiger error”,“attr”:{“error”:0,“message”:"[1625365441:541785][99134:0x7f02c82bc700], file:data_202106/index-2–3267180256554870217.wt, WT_CURSOR.remove: __wt_bm_corrupt_dump, 139: {0: 2616250368, 16384, 0x24a4104c}: (chunk 1 of 16): 00 00 00 00 00 00 00 00 9a 08 79 00 00 00 00 00 19 32 00 00 dc 03 00 00 07 02 00 01 00 40 00 00 00 00 00 00 01 00 00 00 75 3c 61 6d 65 72 69 74 65 63 68 2e 6e 65 74 00 3c 61 6c 67 72 61 79 00 04 60 77 23 ad c3 12 19 b7 a7 8f 03 22 12 68 00 04 60 89 8d 08 db 0a 18 ab bb 0a 18 b1 03 26 13 30 30 00 04 60 af e1 45 6b 42 13 61 67 69 2e 6c 65 74 74 65 00 04 60 b2 1e 26 8b 2a 14 72 72 69 73 00 04 41 ed c5 d2 2a 18 63 70 61 00 04 60 c6 95 59 d3 3a 13 63 68 65 65 74 61 68 00 04 60 8b 73 4f 1b 26 13 67 68 00 04 60 7c 2c d0 5b 06 1b 7b 06 1b 83 2a 13 6c 6d 68 00 04 60 8a f8 c4 eb 3e 13 6f 6d 6d 65 64 69 65 75 00 04 60 89 fa 19 73 3a 12 69 2e 6a 6f 6e 65 73 00 04 60 9e 29 19 d3 42 14 6f 7a 6b 61 79 61 61 6c 70 00 04 60 85 db 0a ab 2a 13 32 38 30 00 04 60 8b 68 ea 33 26 14 72 6b 00 04 60 af 48 aa 7b 2e 13 35 35 6f 6e 00 04 60 7d a3 02 db 42 13 5f 62 61 62 61 31 39 36 32 00 04 60 75 90 2f 33 36 14 63 61 76 61 6c 32 00 04 60 87 d9 10 a3 0a 1f d2 d3 3a 14 64 6f 63 69 6f 75 73 00 04 60 9b 36 a7 03 26 14 76 68 00 04 60 b2 50 1c bb 3a 13 61 63 6f 75 72 61 73 00 04 60 ab 7b 07 fb 2e 14 6c 69 6f 73 00 04 60 74 ef f4 33 2a 14 6e 6e 73 00 04 60 af 0a 17 eb 32 14 73 31 34 32 39 00 04 60 8a 4c 90 c3 26 15 6f 6d 00 04 60 9c 0d e8 23 2e 15 73 69 6d 73 00 04 60 b2 35 9b 8b 3a 13 62 61 6c 6d 62 72 62 00 04 60 78 28 b8 db 2a 15 6e 69 74 00 04 60 7f 77 41 2b 32 15 72 72 61 74 74 00 04 60 a5 3f 46 bb 26 14 62 35 00 04 60 8f ff bc 7b 3a 14 65 72 61 74 6f 72 65 00 04 60 89 9b 93 a3 26 16 74 79 00 04 60 8a 68 8a 8b 2e 14 69 62 69 38 00 04 60 ab 3a 04 a3 26 13 63 32 00 04 60 78 20 f6 f3 42 14 61 5f 6c 69 6e 64 73 61 79 00 04 60 84 83 63 23 12 20 96 e5 35 0b 26 15 72 69 00 04 60 88 d6 09 db 22 16 72 00 04 60 94 34 a9 33 06 1d 3b 22 15 74 00 04 60 75 12 53 a3 26 16 31 31 00 04 60 83 e6 0a 83 12 1b 9e 6e 92 03 22 16 32 38 00 04 4c 0b 09 a2 26 16 77 73 00 04 60 84 e6 0a b3 3e 14 65 2d 63 68 61 76 65 7a 00 04 60 ee bf 1b db 36 15 2e 62 65 72 72 79 00 04 60 83 32 ec 0b 32 17 69 73 68 6f 70 00 04 60 84 90 75 ab 2e 17 72 6f 77 6e 00 04 60 d0 a7 40 2b 36 16 63 61 74 72 6f 6e 00 04 60 8b c8 37 73 2e 17 6c 61 72 6b 00 04 60 ac 16 55 7b 32 17 72 65 73 70 6f 00 04 60 af d5 86 bb 2e 16 68 61 61 73 00 04 60 ae 9e 53 83 32 18 64 64 6f 63 6b 00 04 60 7d e1 4b 2b 26 18 67 79 00 04 60 88 ec 49 9b 32 18 75 73 6d 61 6e 00 04 60 85 f5 57 f3 32 17 75 67 68 65 73 00 04 60 b0 a6 fe a3 36 16 6a 61 63 6f 62 73 00 04 60 7b a5 fe 8b 2e 17 6f 6e 65 73 00 04 60 af eb 06 13 3a 16 6c 65 6f 6e 61 72 64 00 04 60 79 27 77 73 3e 16 6d 61 74 74 68 65 77 73 00 04 60 8f 64 0d 23 3e 17 69 63 6b 65 6c 73 6f 6e 00 04 60 7e 27 ee 63 36 16 6e 65 77 6d 61 6e 00 04 60 8c 11 67 eb 3a 16 72 6f 62 65 72 67 65 00 04 60 ab 56 68 63 3a 16 73 63 68 61 6e 74 7a 00 04 60 7b a2 f3 73 2a 17 69 73 6b 00 04 60 a6 9a 47 cb 36 17 77 65 65 6e 65 79 00 04 60 a9 3a 81 e3 3e 16 74 68 6f 6d 70 73 6f 6e 00 04 60 8c db 5e a3 2e 17 6f 6e 65 79 00 04 60 93 7e 3c 6b 32 16 77 65 69 73 73 00 04 60 8d 93 ad 0b 42 17 66 65 6e 64 65 72 73 6f 6e 00 04 60 7d f7 4d b3 32 17 69 6c 73 6f 6e 00 04 60 8c c5 a5 33 2e 15 31 39 35 39 00 "}}Preformatted text