MongoDB fails to start due to WiredTiger error

I have a deployment of three replica sets of MongoDB. Suddenly, one of them crashed for unknown reasons while the others kept working fine.

After the crash, one of the replica sets was not able to start; it reported a WiredTiger error as the reason:

{"t":{"$date":"2024-03-02T21:50:34.955+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error message","attr":{"error":0,"message":{"ts_sec":1709416234,"ts_usec":955926,"thread":"1:0x7fbb49e8fc40","session_dhandle_name":"file:index-41--2757559245873504977.wt","session_name":"WT_SESSION.open_cursor","category":"WT_VERB_EXTENSION","category_id":14,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"libcrypto: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt:crypto/evp/evp_enc.c:643:\n"}}}
{"t":{"$date":"2024-03-02T21:50:34.956+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error message","attr":{"error":0,"message":{"ts_sec":1709416234,"ts_usec":956000,"thread":"1:0x7fbb49e8fc40","session_dhandle_name":"file:index-41--2757559245873504977.wt","session_name":"WT_SESSION.open_cursor","category":"WT_VERB_EXTENSION","category_id":14,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"setting return code to WT_PANIC"}}}
{"t":{"$date":"2024-03-02T21:50:34.956+00:00"},"s":"E",  "c":"WT",       "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error message","attr":{"error":-31804,"message":{"ts_sec":1709416234,"ts_usec":956018,"thread":"1:0x7fbb49e8fc40","session_dhandle_name":"file:index-41--2757559245873504977.wt","session_name":"WT_SESSION.open_cursor","category":"WT_VERB_DEFAULT","category_id":9,"verbose_level":"ERROR","verbose_level_id":-3,"msg":"__wt_btree_tree_open:639:unable to read root page from file:index-41--2757559245873504977.wt","error_str":"WT_PANIC: WiredTiger library panic","error_code":-31804}}}

As you can see, the error message is "unable to read root page from file:index-41--2757559245873504977.wt". I assumed the data had been corrupted, so I created a completely new file share, attached it to the replica set, and let replication fill it with data. Once that finished, the same error appeared again (I no longer have the original error, so the one above comes from the new data).

What are my options for fixing this problem? It seems that the data is somehow corrupted and I am not able to replicate it successfully. I tried deleting that index file but, unsurprisingly, that did not work. Should I run a repair? What is your opinion?

Yes, trying a repair first is a good idea.

Is this deployment using encryption at rest?
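
The reason for asking: the first WiredTiger entry in your log is a libcrypto "bad decrypt" error, and it is reported before the "unable to read root page" message. That may point at decryption rather than at corrupted data, though I can't confirm it from the log alone. Here is a minimal Python sketch for pulling just the messages out of the structured log so the ordering is easier to see. The log lines are trimmed copies of the ones you posted, and the function names are mine, purely for illustration.

```python
import json

# Trimmed copies of the three log entries from the question
# (fields not needed for the summary have been dropped).
LOG_LINES = [
    r'{"s":"E","c":"WT","id":22435,"attr":{"error":0,"message":{"session_dhandle_name":"file:index-41--2757559245873504977.wt","msg":"libcrypto: error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt:crypto/evp/evp_enc.c:643:\n"}}}',
    r'{"s":"E","c":"WT","id":22435,"attr":{"error":0,"message":{"session_dhandle_name":"file:index-41--2757559245873504977.wt","msg":"setting return code to WT_PANIC"}}}',
    r'{"s":"E","c":"WT","id":22435,"attr":{"error":-31804,"message":{"session_dhandle_name":"file:index-41--2757559245873504977.wt","msg":"__wt_btree_tree_open:639:unable to read root page from file:index-41--2757559245873504977.wt","error_code":-31804}}}',
]


def summarize_wt_errors(lines):
    """Return (file, message) pairs for error-severity WiredTiger entries."""
    summary = []
    for line in lines:
        entry = json.loads(line)
        # "c" is the log component and "s" the severity in these entries.
        if entry.get("c") != "WT" or entry.get("s") != "E":
            continue
        message = entry.get("attr", {}).get("message", {})
        summary.append(
            (message.get("session_dhandle_name"), message.get("msg", "").strip())
        )
    return summary


def has_decrypt_failure(lines):
    """True if any WiredTiger error message mentions a failed decrypt."""
    return any("bad decrypt" in msg for _, msg in summarize_wt_errors(lines))


if __name__ == "__main__":
    for file_name, msg in summarize_wt_errors(LOG_LINES):
        print(file_name, "->", msg)
    print("decrypt failure seen:", has_decrypt_failure(LOG_LINES))
```

To run this against a real log you would read the lines from the mongod log file instead of the hard-coded list.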

Yes, the database is using encryption at rest.
However, when reading this article:

It recommends never running mongod --repair on a replica set. Is this correct?

Yes, that is correct. If only one member is affected, perform an initial sync on it instead.

What is being used for the file share?

Encryption at rest is a MongoDB Enterprise feature, available via an Enterprise subscription. The subscription comes with some level of support from MongoDB, so the best advice is to open a support ticket:

I am using the Percona MongoDB Operator to deploy MongoDB on a Kubernetes cluster on Azure Kubernetes Service, so I think it would qualify as a MongoDB Enterprise deployment.

Azure Storage is used for the File Share.

Okay, good information.

Percona is best placed to give advice on their tooling and MongoDB server. This deployment is likely using their version of MongoDB.

Best to check on the Percona forums.

For MongoDB server on Azure, Premium storage is recommended; anything other than an Azure managed disk is not likely to work. As this is Percona's MongoDB server, it is best to check Percona's documentation and support forums.