MongoDB fails to start due to WiredTiger error

Gvidas_Pranauskas · March 2, 2024, 10:01pm

Hello,
I have a deployment of three MongoDB replica sets. Suddenly, one of them crashed for unknown reasons while the others kept working fine.

After the crash, one of the replica sets was not able to start. It reported a WiredTiger error as the reason:

{"t":{"$date":"2024-03-02T21:50:34.955+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error message","attr":{"error":0,"message":{"ts_sec":1709416234,"ts_usec":955926,"thread":"1:0x7fbb49e8fc40","session_dhandle_name":"file:index-41--2757559245873504977.wt","session_name":"WT_SESSION.open_cursor","category":"WT_VERB_EXTENSION","category_id":14,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"libcrypto: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt:crypto/evp/evp_enc.c:643:\n"}}}
{"t":{"$date":"2024-03-02T21:50:34.956+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error message","attr":{"error":0,"message":{"ts_sec":1709416234,"ts_usec":956000,"thread":"1:0x7fbb49e8fc40","session_dhandle_name":"file:index-41--2757559245873504977.wt","session_name":"WT_SESSION.open_cursor","category":"WT_VERB_EXTENSION","category_id":14,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"setting return code to WT_PANIC"}}}
{"t":{"$date":"2024-03-02T21:50:34.956+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error message","attr":{"error":-31804,"message":{"ts_sec":1709416234,"ts_usec":956018,"thread":"1:0x7fbb49e8fc40","session_dhandle_name":"file:index-41--2757559245873504977.wt","session_name":"WT_SESSION.open_cursor","category":"WT_VERB_DEFAULT","category_id":9,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__wt_btree_tree_open:639:unable to read root page from file:index-41--2757559245873504977.wt","error_str":"WT_PANIC: WiredTiger library panic","error_code":-31804}}}

As you can see, the error message is "unable to read root page from file:index-41--2757559245873504977.wt". I assumed the data had been corrupted, so I created a completely new file share, attached it to the replica set, and let replication fill it with data. Once that finished, the same error appeared again (I do not have the original error, so the one provided here comes from the new data).
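For reading log lines like the ones above, a minimal Python sketch can pull out the file name, message, and error code from each WiredTiger error entry. It assumes the log is in MongoDB's structured JSON format, one entry per line, as shown in this post; the function name `summarize_wt_errors` is made up for illustration.

```python
import json


def summarize_wt_errors(log_lines):
    """Return (file, message, error_code) tuples for WiredTiger error entries.

    Assumes one structured JSON log entry per line, as in the excerpt above.
    """
    summary = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        # Keep only error-severity entries from the WiredTiger component.
        if entry.get("c") != "WT" or entry.get("s") != "E":
            continue
        attr = entry.get("attr", {})
        message = attr.get("message", {})
        if isinstance(message, dict):
            file_name = message.get("session_dhandle_name")
            text = (message.get("msg") or "").strip()
            # Fall back to the outer "error" field when no error_code is present.
            code = message.get("error_code", attr.get("error"))
        else:
            # Some log entries carry the message as a plain string.
            file_name, text, code = None, str(message).strip(), attr.get("error")
        summary.append((file_name, text, code))
    return summary
```

Run over the three entries quoted above, this would list the libcrypto "bad decrypt" message first, followed by the WT_PANIC entries for the same index file.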

What are my options for fixing this problem? It seems that the data is somehow corrupted and I am not able to replicate it successfully. I tried deleting that index file, but of course that did not work. Should I run a repair? What is your opinion?

chris · March 2, 2024, 10:16pm

Yes, trying a repair first is a good idea.

Is this deployment using encryption at rest?

Gvidas_Pranauskas · March 2, 2024, 10:40pm

Yes, the database is using encryption at rest.
However, when reading this article:

It recommends never running mongod --repair on a replica set. Is this correct?

chris · March 2, 2024, 10:57pm

Yes, that is correct. If only one member is affected, perform an initial sync instead.
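One way to follow an initial sync is to look at each member's `stateStr` in the output of `replSetGetStatus`; a member normally reports STARTUP2 while it is syncing. The sketch below is only an illustration working on that output as a plain Python dict. The function name and the host names in the usage example are hypothetical, and fetching the document (for example with pymongo's `client.admin.command("replSetGetStatus")`) is left out.

```python
def members_not_ready(status):
    """Return (name, stateStr) for members that are unhealthy or not yet
    in a steady state (PRIMARY, SECONDARY or ARBITER).

    `status` is a replSetGetStatus result as a plain dict.
    """
    steady_states = {"PRIMARY", "SECONDARY", "ARBITER"}
    pending = []
    for member in status.get("members", []):
        state = member.get("stateStr")
        healthy = member.get("health") == 1  # health is reported as 1 (up) or 0 (down)
        if state not in steady_states or not healthy:
            pending.append((member.get("name"), state))
    return pending


# Hypothetical status document; the host names are placeholders.
example_status = {
    "set": "rs0",
    "members": [
        {"name": "mongo-0:27017", "stateStr": "PRIMARY", "health": 1},
        {"name": "mongo-1:27017", "stateStr": "SECONDARY", "health": 1},
        {"name": "mongo-2:27017", "stateStr": "STARTUP2", "health": 1},
    ],
}
print(members_not_ready(example_status))
```

An empty result would mean every member is healthy and in a steady state.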

chris · March 2, 2024, 11:07pm

What is being used for the file share?

Encryption at rest is a MongoDB Enterprise feature, available via an Enterprise subscription. The subscription comes with some level of support from MongoDB, so the best advice is to open a support ticket: support.mongodb.com

Gvidas_Pranauskas · March 2, 2024, 11:20pm

I am using the Percona MongoDB Operator to deploy MongoDB on a Kubernetes cluster on Azure Kubernetes Service, so I think it would qualify as a MongoDB Enterprise deployment.

Azure Storage is used for the File Share.

chris · March 2, 2024, 11:42pm

Okay, good information.

Percona is best placed to advise on their tooling and their MongoDB server. This deployment is likely using their version of MongoDB.

Best to check on the Percona forums.

For MongoDB server on Azure, Premium storage is recommended; anything other than an Azure managed disk is not likely to work. As this is Percona MongoDB Server, it is best to check Percona’s documentation and support forums.