MongoDB randomly crashes

Hey guys, we have a single MongoDB instance (Community) running on Ubuntu 20.04, and it keeps crashing randomly. Rebooting the system resolves the error. In mongodb.log we found the following:

{"t":{"$date":"2023-04-16T23:01:50.516+02:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:61312","uuid":"f570f0d0-96a7-4036-a6c3-65c90800740c","connectionId":51331,"connectionCount":316}}
{"t":{"$date":"2023-04-16T23:01:50.516+02:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn51330","msg":"client metadata","attr":{"remote":"127.0.0.1:61310","client":"conn51330","doc":{"driver":{"name":"PyMongo","version":"4.3.3"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"5.15.0-69-generic"},"platform":"CPython 3.10.6.final.0"}}}
{"t":{"$date":"2023-04-16T23:01:50.516+02:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn51331","msg":"client metadata","attr":{"remote":"127.0.0.1:61312","client":"conn51331","doc":{"driver":{"name":"PyMongo","version":"4.3.3"},"os":{"type":"Linux","name":"Linux","architecture":"x86_64","version":"5.15.0-69-generic"},"platform":"CPython 3.10.6.final.0"}}}
{"t":{"$date":"2023-04-16T23:01:50.894+02:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn51330","msg":"Connection ended","attr":{"remote":"127.0.0.1:61310","uuid":"583efaef-22a9-43e2-9e9c-4c96588e2b2f","connectionId":51330,"connectionCount":315}}
{"t":{"$date":"2023-04-16T23:01:51.016+02:00"},"s":"I",  "c":"-",        "id":20883,   "ctx":"conn51329","msg":"Interrupted operation as its client disconnected","attr":{"opId":1040260}}
{"t":{"$date":"2023-04-16T23:01:51.017+02:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn51331","msg":"Connection ended","attr":{"remote":"127.0.0.1:61312","uuid":"f570f0d0-96a7-4036-a6c3-65c90800740c","connectionId":51331,"connectionCount":314}}
{"t":{"$date":"2023-04-16T23:01:51.017+02:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn51329","msg":"Connection ended","attr":{"remote":"127.0.0.1:61298","uuid":"9d873de2-eaca-451a-aa45-32c4134d3421","connectionId":51329,"connectionCount":313}}
{"t":{"$date":"2023-04-16T23:02:12.016+02:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"terminate() called. An exception is active; attempting to gather more information\n"}}
{"t":{"$date":"2023-04-16T23:02:12.017+02:00"},"s":"F",  "c":"CONTROL",  "id":6384300, "ctx":"ftdc","msg":"Writing fatal message","attr":{"message":"DBException::toString(): FileStreamFailed: Failed to write to interim file buffer for full-time diagnostic data capture: /var/lib/mongodb/diagnostic.data/metrics.interim.temp\nActual exception type: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)39, mongo::AssertionException>\n\n"}}
{"t":{"$date":"2023-04-16T23:02:12.360+02:00"},"s":"I",  "c":"CONTROL",  "id":31380,   "ctx":"ftdc","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"561CB9EA4C74","b":"561CB5094000","o":"4E10C74","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.362","C":"mongo::stack_trace_detail::(anonymous namespace)::printStackTraceImpl(mongo::stack_trace_detail::(anonymous namespace)::Options const&, mongo::StackTraceSink*) [clone .constprop.362]","s+":"1F4"},{"a":"561CB9EA71B9","b":"561CB5094000","o":"4E131B9","s":"_ZN5mongo15printStackTraceEv","C":"mongo::printStackTrace()","s+":"29"},{"a":"561CB9EA1507","b":"561CB5094000","o":"4E0D507","s":"_ZN5mongo12_GLOBAL__N_111myTerminateEv","C":"mongo::(anonymous namespace)::myTerminate()","s+":"D7"},{"a":"561CBA02BFA6","b":"561CB5094000","o":"4F97FA6","s":"_ZN10__cxxabiv111__terminateEPFvvE","C":"__cxxabiv1::__terminate(void (*)())","s+":"6"},{"a":"561CBA0C0929","b":"561CB5094000","o":"502C929","s":"__cxa_call_terminate","s+":"39"},{"a":"561CBA02B995","b":"561CB5094000","o":"4F97995","s":"__gxx_personality_v0","s+":"275"},{"a":"7FD973BE8C64","b":"7FD973BD2000","o":"16C64","s":"_Unwind_GetTextRelBase","s+":"1EF4"},{"a":"7FD973BE9321","b":"7FD973BD2000","o":"17321","s":"_Unwind_RaiseException","s+":"311"},{"a":"561CBA02C107","b":"561CB5094000","o":"4F98107","s":"__cxa_throw","s+":"37"},{"a":"561CB6F83554","b":"561CB5094000","o":"1EEF554","s":"_ZN5mongo13error_details23throwExceptionForStatusERKNS_6StatusE","C":"mongo::error_details::throwExceptionForStatus(mongo::Status const&)","s+":"2036"},{"a":"561CB6F98800","b":"561CB5094000","o":"1F04800","s":"_ZN5mongo21uassertedWithLocationERKNS_6StatusEPKcj","C":"mongo::uassertedWithLocation(mongo::Status const&, char const*, unsigned int)","s+":"2F8"},{"a":"561CB6A30A8A","b":"561CB5094000","o":"199CA8A","s":"_ZN5mongo14FTDCController6doLoopEv.cold.495","C":"mongo::FTDCController::doLoop() [clone 
.cold.495]","s+":"A6"},{"a":"561CB747F0FC","b":"561CB5094000","o":"23EB0FC","s":"_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_14FTDCController5startEvEUlvE0_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","C":"std::thread::_State_impl<std::thread::_Invoker<std::tuple<mongo::stdx::thread::thread

Can anyone help us with this? Thank you!

Hi @Lars_Dittrich and welcome to MongoDB community forums!!

Could you share some information about the deployment so that I can assist you further?

  1. From the log, it seems you are trying to connect to the mongod process using Python. Can you share the code snippet showing how you connect?
  2. Are you able to connect from outside the application, using the shell or Compass?
  3. Which MongoDB Community version are you using?

After the client has connected to mongod, does the connection end abruptly again, or does it only fail once a request is sent from the client?

I would suggest making sure that the mongod process has the right permissions and enough free disk space to write to the respective directory. Please see the documentation on Configuration File Options for further details.

Regards
Aasawari

Hi Aasawari,

OK, here are some more details about our environment. We have multiple Python-based services (pymongo) that make thousands of requests per day. Until this error happens, everything works fine. There is nothing special in the snippet: a simple connect, no security enabled.

mongo = pymongo.MongoClient(config['MONGODB']['HOST'], 27017)
  1. After the error happens, the mongo service is dead. We cannot connect with any client, and the MongoDB Compass app can't connect either.

  2. Mongo Community Version is 6.0.5

  3. After a reboot everything works again for some days, until this error returns.

Do you have any suggestions for how I can force MongoDB to write this interim file buffer, so that I can test the permissions?

Thank you!

Hi @Lars_Dittrich and thank you for sharing the above details.

A temporary workaround, as part of the troubleshooting process, would be to turn off FTDC (full-time diagnostic data capture).

Please note that this is not a recommended procedure and does not guarantee a solution, since the crash may be a symptom of another underlying issue.

However, if you are still seeing an error even after turning off FTDC, could you provide more details about your deployment? For example: what hardware you are using, your CPU and RAM size, whether you are using a container architecture, whether the disks are local or accessed over the network (e.g. NFS), any errors in any logs (not just the mongod logs), and any other details that may pinpoint the underlying issue.

Regards
Aasawari

Hi Aasawari, thank you for your response. We had already tried exactly this, and it seems to work.
So, for everyone with the same problem: make sure that MongoDB has permission to write to this folder:

/var/lib/mongodb/diagnostic.data
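
One way to test this without waiting for the next crash is a small write probe. This is only a rough sketch: `check_ftdc_dir` and the 100 MB free-space threshold are made-up names and values for illustration, not anything from MongoDB itself. It only tells you something useful if it runs as the same OS user as mongod (on Ubuntu packages that is usually `mongodb`, but check your own setup).

```python
import os
import shutil
import tempfile


def check_ftdc_dir(path="/var/lib/mongodb/diagnostic.data",
                   min_free_bytes=100 * 1024 * 1024):
    """Probe whether `path` exists, accepts a real write, and has free space.

    `min_free_bytes` is an arbitrary illustrative threshold, not a MongoDB value.
    """
    result = {"exists": os.path.isdir(path), "writable": False,
              "free_bytes": None, "enough_space": False}
    if not result["exists"]:
        return result

    # Actually create and write a temporary file; it is deleted on close.
    try:
        with tempfile.NamedTemporaryFile(dir=path, prefix="write-probe-") as fh:
            fh.write(b"x")
            fh.flush()
        result["writable"] = True
    except OSError:
        result["writable"] = False

    # Free space on the filesystem that holds the directory.
    free = shutil.disk_usage(path).free
    result["free_bytes"] = free
    result["enough_space"] = free >= min_free_bytes
    return result
```

Saved to a file with a `print(check_ftdc_dir())` line at the end, it could be run with something like `sudo -u mongodb python3 check_ftdc_dir.py`, assuming the service user is called `mongodb`.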

If this does not help, place this in the mongodb.conf:

setParameter:
    diagnosticDataCollectionEnabled: false
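
After restarting mongod, you can check from Python whether the setting was picked up by asking the server with the `getParameter` admin command. A minimal sketch: `ftdc_enabled` is a hypothetical helper name, and it assumes it is handed a client object with an `admin.command(...)` method, as pymongo's `MongoClient` provides.

```python
def ftdc_enabled(client):
    """Return True if the server reports that FTDC collection is enabled.

    `client` is expected to behave like pymongo.MongoClient
    (i.e. expose `client.admin.command(...)`).
    """
    reply = client.admin.command(
        {"getParameter": 1, "diagnosticDataCollectionEnabled": 1}
    )
    return bool(reply["diagnosticDataCollectionEnabled"])
```

With the connection from earlier in this thread, that would be `ftdc_enabled(pymongo.MongoClient(config['MONGODB']['HOST'], 27017))`, which should return `False` once the config change is active.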

Hi Lars, when you say “rebooting the system resolves the error”, does this mean that simply restarting the service does not resolve the error?

Also, did the crashes seem to happen during periods of low activity?

We are trying to debug a random crash as well. Your forum post seems to come up on my google searches for log message entries.

Hi Aasawari, is there a JIRA ticket open that you could point me to about this type of temporary solution? I’m trying to learn more about the circumstances of why that may be a solution to see if our circumstances may match.