WiredTiger Error: cannot be used until all column groups are created: Invalid argument

After a Kubernetes node crash, our production instance went offline, and when we tried to bring it back up with the --repair flag, it fails partway through with the error:

'table:sizeStorer' cannot be used until all column groups are created: Invalid argument

Our small business currently has a full backup from a month ago, and only a few of the collections have changed in that month, but we would still like to find some way to recover, regenerate, or recreate the sizeStorer file.
Is there any way or resource I can try in order to recover the file? I have time, since production is already running from the backup. I also downloaded all the files locally and tried the WiredTiger wt tool's salvage() and verify() functions, with the following result:

[1666250062:785521][31884:0x1031c8580], wt, WT_SESSION.salvage: [WT_VERB_DEFAULT][ERROR]: int __wt_schema_get_table_uri(WT_SESSION_IMPL *, const char *, _Bool, uint32_t, WT_TABLE **), 85: 'table:sizeStorer' cannot be used until all column groups are created: Invalid argument
wt: session.salvage: table:sizeStorer: Invalid argument
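For reference, this is roughly how I ran the tool against my local copy of the data directory (the -C log settings are my assumption based on MongoDB's default journal configuration; the wt binary should match the WiredTiger version bundled with your mongod):

# run from a copy of the dbpath, never the original files
wt -h . -C "log=(enabled=true,path=journal,compressor=snappy)" verify table:sizeStorer
wt -h . -C "log=(enabled=true,path=journal,compressor=snappy)" salvage table:sizeStorer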

This looks to be the function that throws the error: https://github.com/wiredtiger/wiredtiger/blob/b3e3b2580ce14e7609e7ba269d3c7432f81339b7/src/schema/schema_list.c#L68

One thing you can try is to find the sizeStorer.wt file and delete it, then try restarting mongod in repair mode. If the file is missing, the repair function should regenerate a new one. Note that all your collections will report “0 documents” for estimated count and collection size, but this can be corrected by running validate on all collections.
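A rough sketch of those steps, assuming a dbpath of /data/db (adjust to your deployment, and work on a copy of the data files first):

# move the damaged size cache out of the way and let repair rebuild it
mv /data/db/sizeStorer.wt /data/db/sizeStorer.wt.bak
mongod --dbpath /data/db --repair

# after restarting mongod normally, refresh the size/count estimates for every collection
mongosh --quiet --eval '
  db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
    var cur = db.getSiblingDB(d.name);
    cur.getCollectionNames().forEach(function (c) {
      printjson(cur.getCollection(c).validate());
    });
  });
'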


This is the result. It looks like WiredTiger.wt is the real problem, but only with the key "table:sizeStorer"!

{"t":{"$date":"2022-10-22T07:56:31.418+00:00"},"s":"I",  "c":"STORAGE",  "id":22316,   "ctx":"initandlisten","msg":"Repairing size cache"}
{"t":{"$date":"2022-10-22T07:56:31.418+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":22,"message":"[1666425391:418780][179:0xfffef4c30480], WT_SESSION.verify: __wt_schema_get_table_uri, 85: 'table:sizeStorer' cannot be used until all column groups are created: Invalid argument"}}
{"t":{"$date":"2022-10-22T07:56:31.419+00:00"},"s":"I",  "c":"STORAGE",  "id":22328,   "ctx":"initandlisten","msg":"Verify failed. Running a salvage operation.","attr":{"uri":"table:sizeStorer"}}
{"t":{"$date":"2022-10-22T07:56:31.419+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":22,"message":"[1666425391:419873][179:0xfffef4c30480], WT_SESSION.salvage: __wt_schema_get_table_uri, 85: 'table:sizeStorer' cannot be used until all column groups are created: Invalid argument"}}
{"t":{"$date":"2022-10-22T07:56:31.420+00:00"},"s":"W",  "c":"STORAGE",  "id":22351,   "ctx":"initandlisten","msg":"Salvage failed. The file will be moved out of the way and a new ident will be created.","attr":{"uri":"table:sizeStorer","error":{"code":2,"codeName":"BadValue","errmsg":"Salvage failed: 22: Invalid argument"}}}
{"t":{"$date":"2022-10-22T07:56:31.421+00:00"},"s":"W",  "c":"STORAGE",  "id":22353,   "ctx":"initandlisten","msg":"Rebuilding ident","attr":{"ident":"sizeStorer"}}
{"t":{"$date":"2022-10-22T07:56:31.421+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31803,"message":"[1666425391:421919][179:0xfffef4c30480], file:WiredTiger.wt, WT_CURSOR.search: __schema_create_collapse, 106: metadata information for source configuration \"colgroup:sizeStorer\" not found: WT_NOTFOUND: item not found"}}
{"t":{"$date":"2022-10-22T07:56:31.422+00:00"},"s":"E",  "c":"STORAGE",  "id":22357,   "ctx":"initandlisten","msg":"Rebuilding ident failed: failed to get metadata","attr":{"uri":"table:sizeStorer","error":{"code":4,"codeName":"NoSuchKey","errmsg":"Unable to find metadata for table:sizeStorer"}}}
{"t":{"$date":"2022-10-22T07:56:31.423+00:00"},"s":"F",  "c":"-",        "id":23095,   "ctx":"initandlisten","msg":"Fatal assertion","attr":{"msgid":28577,"error":"NoSuchKey: Unable to find metadata for table:sizeStorer","file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":569}}

Note: When I explore the hex of WiredTiger.wt, I see a lot of Mfile:sizeStorer.wt names, but only one Etable:sizeStorer.
Note 2: Indeed, I get an error reporting metadata corruption, but when it tries to rebuild, it hits the error mentioned in the post ('table:sizeStorer' cannot be used until all column groups are created: Invalid argument).
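For anyone who wants to look at their own WiredTiger.wt the same way, this is roughly how the entries can be located before opening the file in a hex editor (offsets are printed in hex; the on-disk layout is internal to WiredTiger, so only ever touch a copy):

# list printable strings with their hex offsets and pick out the metadata keys
strings -t x WiredTiger.wt | grep sizeStorer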

I've removed the E in Etable:sizeStorer and WiredTiger is OK with it now (it starts the salvage process).
I'm running into problems with table:_mdb_catalog; I will update soon if I manage to rebuild sizeStorer.

This is my current log:

{"t":{"$date":"2022-10-24T19:45:19.523+00:00"},"s":"I",  "c":"STORAGE",  "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":3913}}
{"t":{"$date":"2022-10-24T19:45:19.524+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2022-10-24T19:45:19.524+00:00"},"s":"I",  "c":"STORAGE",  "id":22316,   "ctx":"initandlisten","msg":"Repairing size cache"}
{"t":{"$date":"2022-10-24T19:45:19.527+00:00"},"s":"I",  "c":"STORAGE",  "id":22327,   "ctx":"initandlisten","msg":"Verify succeeded. Not salvaging.","attr":{"uri":"table:sizeStorer"}}
{"t":{"$date":"2022-10-24T19:45:19.528+00:00"},"s":"I",  "c":"STORAGE",  "id":22246,   "ctx":"initandlisten","msg":"Repairing catalog metadata"}
{"t":{"$date":"2022-10-24T19:45:19.528+00:00"},"s":"W",  "c":"STORAGE",  "id":22350,   "ctx":"initandlisten","msg":"Data file is missing. Attempting to drop and re-create the collection.","attr":{"uri":"table:_mdb_catalog"}}
{"t":{"$date":"2022-10-24T19:45:19.530+00:00"},"s":"W",  "c":"STORAGE",  "id":22352,   "ctx":"initandlisten","msg":"Moving data file to backup","attr":{"file":"./_mdb_catalog.wt","backup":"./_mdb_catalog.wt.corrupt"}}
{"t":{"$date":"2022-10-24T19:45:19.531+00:00"},"s":"F",  "c":"-",        "id":23095,   "ctx":"initandlisten","msg":"Fatal assertion","attr":{"msgid":50926,"error":"FileRenameFailed: Attempted to rename file to an existing file: ./_mdb_catalog.wt.corrupt","file":"src/mongo/db/storage/storage_engine_impl.cpp","line":122}}
{"t":{"$date":"2022-10-24T19:45:19.531+00:00"},"s":"F",  "c":"-",        "id":23096,   "ctx":"initandlisten","msg":"\n\n***aborting after fassert() failure\n\n"}

Just for the record:
I copied WiredTiger.wt from the backup, and _mdb_catalog.wt and sizeStorer.wt from the corrupted data, then removed all the collections whose schema (indexes, validation, etc.) changed during the affected timespan.
The DB repaired itself and rebuilt the metadata.
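Roughly, the file shuffling looked like this (paths and idents are illustrative placeholders; everything was done on a copy of the crashed dbpath):

# dbcopy starts as a full copy of the crashed dbpath, so _mdb_catalog.wt and sizeStorer.wt
# are already the ones from the corrupted data; only the metadata file is taken from the backup
cp backup/WiredTiger.wt dbcopy/WiredTiger.wt
# drop the data/index files of collections whose schema (indexes, validation, ...) changed since the backup
rm dbcopy/collection-<ident>.wt dbcopy/index-<ident>.wt
# then let repair rebuild the metadata
mongod --dbpath dbcopy --repair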

I lost just one specific collection that has 635K documents, but only 34 documents were created in the affected timespan.

All relevant collections were recovered.
