Hello all,
I ran into an issue the other night where MongoDB crashed without a stack trace or any other helpful information.
There are three servers in a replica set, all running 4.4.0. The primary crashed with the following log entry, with no other suspicious or error-level entries around it:
{"t":{"$date":"2021-11-22T21:59:53.801+00:00"},"s":"I", "c":"CONTROL", "id":31430, "ctx":"conn772041","msg":"Error collecting stack trace: {err}","attr":{"err":"unw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\nunw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\n"}}
After the primary went down, another member became the primary, then went up and down repeatedly over the next hour before eventually stabilizing.
I’ve checked the dmesg logs and found nothing indicating that the process was killed or that the server was running low on memory. Metrics show the server had a CPU spike, but only to about 50%.
I’ve tried to reproduce the load that was on the server at the time to trigger another crash, but without success.
Has anyone seen an error like this before, with no other information or stack trace to go on? Does anyone have suggestions on how to investigate further?
We’re planning to upgrade to a newer 4.4.x release, but would like to find the root cause first to prevent further outages.
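
In case it helps anyone digging through similar logs, here is a minimal sketch of how the structured (JSON) mongod log could be filtered for entries close to the crash time. It relies only on the fields visible in the entry above (t.$date, s, ctx, msg). The function names, the 60-second window, and the choice of severities to keep are my own assumptions, not anything MongoDB provides.

```python
import json
from datetime import datetime, timedelta


def parse_ts(entry):
    """Return the timezone-aware timestamp stored under t.$date."""
    raw = entry["t"]["$date"]
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    return datetime.fromisoformat(raw)


def entries_near(lines, crash_time, window_seconds=60,
                 severities=("F", "E", "W"), ctx=None):
    """Collect log entries within window_seconds of crash_time.

    An entry is kept if its severity is in `severities`, or if `ctx` is
    given and the entry came from that connection/thread context.
    Lines that are not valid structured log entries are skipped.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            ts = parse_ts(entry)
        except (ValueError, KeyError, TypeError):
            continue  # not JSON, or missing the timestamp field
        if abs(ts - crash_time) > window:
            continue
        same_ctx = ctx is not None and entry.get("ctx") == ctx
        if entry.get("s") in severities or same_ctx:
            hits.append(entry)
    return hits


# Example usage (the log path is an assumption, adjust to your setup):
#
#   crash = datetime.fromisoformat("2021-11-22T21:59:53.801+00:00")
#   with open("/var/log/mongodb/mongod.log") as f:
#       for e in entries_near(f, crash, ctx="conn772041"):
#           print(e["t"]["$date"], e.get("s"), e.get("c"), e.get("msg"))
```

Passing the ctx value from the crash entry (conn772041 here) also pulls in informational lines from the same connection, which may show what that connection was doing just before the process died.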