Mongo 4.4.0 crash - unspecified (general) error

Hello all,

I ran into an issue the other night where Mongo crashed without a stack trace or any other helpful information.

There are three servers running as a replica set, all on 4.4.0. The primary crashed with the following error, with no other suspicious or error-level log entries around it:
{"t":{"$date":"2021-11-22T21:59:53.801+00:00"},"s":"I", "c":"CONTROL", "id":31430, "ctx":"conn772041","msg":"Error collecting stack trace: {err}","attr":{"err":"unw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\nunw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\n"}}
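
For anyone who wants to pull the frame addresses out of an entry like this one, here is a minimal Python sketch. It relies only on the fact that 4.4 log lines are JSON objects; the function name `failed_frame_addresses` is made up for illustration, and the sketch does not try to map the addresses back to symbols, which would need a matching binary with debug symbols.

```python
import json
import re

# The log entry exactly as posted above (raw string so the \n escapes reach the JSON parser).
LINE = r'''{"t":{"$date":"2021-11-22T21:59:53.801+00:00"},"s":"I", "c":"CONTROL", "id":31430, "ctx":"conn772041","msg":"Error collecting stack trace: {err}","attr":{"err":"unw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\nunw_get_proc_name(7F4DDE01940B): unspecified (general) error\nunw_get_proc_name(7F4DDDD5440F): unspecified (general) error\n"}}'''

# Matches the hex address inside each "unw_get_proc_name(<addr>)" message.
ADDR_RE = re.compile(r"unw_get_proc_name\(([0-9A-Fa-f]+)\)")


def failed_frame_addresses(log_line):
    """Return (connection context, unique frame addresses in first-seen order)."""
    entry = json.loads(log_line)
    err_text = entry.get("attr", {}).get("err", "")
    unique = []
    for addr in ADDR_RE.findall(err_text):
        if addr not in unique:
            unique.append(addr)
    return entry.get("ctx"), unique


if __name__ == "__main__":
    ctx, addrs = failed_frame_addresses(LINE)
    print(ctx, addrs)
```

On the entry above this reports connection `conn772041` and two distinct addresses, `7F4DDE01940B` and `7F4DDDD5440F`, each of which appears twice in the original message.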

After the primary went down, another member became the primary, then went up and down repeatedly for the next hour before eventually stabilizing.

I’ve checked the dmesg logs and found nothing indicating that the process was killed or that the server ran low on memory. Metrics show the server had a CPU spike, but only to about 50%.

I’ve tried to replicate the load that was on the server to cause another crash, unsuccessfully.

Has anyone seen an error like this before, without other information or a stack trace to go off? Does anyone have any suggestions on how to investigate further?

We’re planning on upgrading to a newer 4.4.x release, but would like to find out the problem to prevent further outages.

Hi @Seth_Prime, welcome to the community.

Since it’s not reproducible and as you mentioned there’s no other error message in the logs, we have no clue why the crash happened. It could be anything at this point, e.g. corrupt memory, hardware failure, power issues, etc.

Should this happen again, I’d be interested in whether the crash can be reproduced, or at least in any clue about the situation in which it occurs. I don’t see any fixed crash issues in subsequent 4.4 series releases.
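
If it does recur, one way to gather clues is to collect the warning, error, and fatal entries logged shortly before the crash. The following is only a rough Python sketch under stated assumptions: the log is in the 4.4 JSON format shown in your post, `entries_before` and the five-minute default window are names and values I made up for illustration, and the file path in the usage note is a placeholder.

```python
import json
from datetime import datetime, timedelta


def entries_before(lines, crash_time, window_minutes=5, severities=("W", "E", "F")):
    """Return parsed log entries with the given severities logged within
    `window_minutes` before `crash_time` (a timezone-aware datetime)."""
    start = crash_time - timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        try:
            entry = json.loads(line)
            stamp = datetime.fromisoformat(entry["t"]["$date"])
        except (ValueError, KeyError, TypeError):
            # Skip lines that are not JSON log entries or lack a usable timestamp.
            continue
        if start <= stamp <= crash_time and entry.get("s") in severities:
            hits.append(entry)
    return hits
```

Usage would look roughly like `entries_before(open("mongod.log"), datetime.fromisoformat("2021-11-22T21:59:53.801+00:00"))`, with the path replaced by wherever your mongod actually writes its log. Widening the window or adding `"I"` to `severities` may be worthwhile if nothing turns up.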

> We’re planning on upgrading to a newer 4.4.x release, but would like to find out the problem to prevent further outages.

I would encourage you to do the upgrade anyway (to 4.4.10 as of this writing). If this was a known issue, then it should be fixed in the newer versions. If this is a new issue, it’s worth checking whether it still happens in the latest 4.4 series release. At the very least, your upgrade would also include all the fixes and improvements made between 4.4.0 and 4.4.10.

Best regards
Kevin