MongoDB crashes with signal 3 (quit)

MongoDB server suddenly crashes with Got signal: 3 (Quit).

Stack trace looks like this:

2021-04-06T13:21:09.988+0000 I NETWORK  [listener] connection accepted from 10.45.40.185:32792 #2579476 (580 connections now open)
2021-04-06T13:21:09.989+0000 I NETWORK  [conn2579476] received client metadata from 10.45.40.185:32792 conn2579476: { driver: { name: "PyMongo", version: "3.7.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.154-99.181.amzn1.x86_64" }, platform: "CPython 3.6.9.final.0" }
2021-04-06T13:21:09.990+0000 I NETWORK  [listener] connection accepted from 10.45.40.185:32794 #2579477 (581 connections now open)
2021-04-06T13:21:09.990+0000 I NETWORK  [conn2579477] received client metadata from 10.45.40.185:32794 conn2579477: { driver: { name: "PyMongo", version: "3.7.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.154-99.181.amzn1.x86_64" }, platform: "CPython 3.6.9.final.0" }
2021-04-06T13:21:15.832+0000 F -        [conn2575456] Got signal: 3 (Quit).
0x562ab49ddda1 0x562ab49dcfb9 0x562ab49dd49d 0x7f155a62a600 0x7f155a629b7d 0x562ab4417e5a 0x562ab4417f18 0x562ab424c11e 0x562ab4255b49 0x562ab4262a5f 0x562ab426635c 0x562ab42668e8 0x562ab307d223 0x562ab307da2d 0x562ab3081101 0x562ab4230df5 0x562ab4934e24 0x7f155a622e75 0x7f155a34b8fd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"562AB25B7000","o":"2426DA1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"562AB25B7000","o":"2425FB9"},{"b":"562AB25B7000","o":"242649D"},{"b":"7F155A61B000","o":"F600"},{"b":"7F155A61B000","o":"EB7D","s":"recvmsg"},{"b":"562AB25B7000","o":"1E60E5A","s":"_ZN4asio6detail10socket_ops4recvEiP5iovecmiRSt10error_code"},{"b":"562AB25B7000","o":"1E60F18","s":"_ZN4asio6detail10socket_ops9sync_recvEihP5iovecmibRSt10error_code"},{"b":"562AB25B7000","o":"1C9511E","s":"_ZN4asio6detail20read_buffer_sequenceINS_19basic_stream_socketINS_7generic15stream_protocolEEENS_17mutable_buffers_1EPKNS_14mutable_bufferENS0_14transfer_all_tEEEmRT_RKT0_RKT1_T2_RSt10error_code"},{"b":"562AB25B7000","o":"1C9EB49","s":"_ZN5mongo9transport18TransportLayerASIO11ASIOSession17opportunisticReadIN4asio19basic_stream_socketINS4_7generic15stream_protocolEEENS4_17mutable_buffers_1EEENS_14future_details6FutureIvEERT_RKT0_RKSt10shared_ptrINS0_5BatonEE"},{"b":"562AB25B7000","o":"1CABA5F","s":"_ZN5mongo9transport18TransportLayerASIO11ASIOSession4readIN4asio17mutable_buffers_1EEENS_14future_details6FutureIvEERKT_RKSt10shared_ptrINS0_5BatonEE"},{"b":"562AB25B7000","o":"1CAF35C","s":"_ZN5mongo9transport18TransportLayerASIO11ASIOSession17sourceMessageImplERKSt10shared_ptrINS0_5BatonEE"},{"b":"562AB25B7000","o":"1CAF8E8","s":"_ZN5mongo9transport18TransportLayerASIO11ASIOSession13sourceMessageEv"},{"b":"562AB25B7000","o":"AC6223","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE"},{"b":"562AB25B7000","o":"AC6A2D","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"562AB25B7000","o":"ACA101"},{"b":"562AB25B7000","o":"1C79DF5"},{"b":"562AB25B7000","o":"237DE24"},{"b":"7F155A61B000","o":"7E75"},{"b":"7F155A24D000","o":"FE8FD","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.0.9", "gitVersion" : "fc525e2d9b0e4bceff5c2201457e564362909765", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.14.154-99.181.amzn1.x86_64", "version" : 
"#1 SMP Sat Nov 16 01:38:34 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "562AB25B7000", "elfType" : 3, "buildId" : "1608990BA9F24FFB0C9133E50C74957A69393AE7" }, { "b" : "7FFECAFE8000", "elfType" : 3, "buildId" : "644D60907530E0AC3CB1910CD1CECD19BFB27BBD" }, { "b" : "7F155BA44000", "path" : "/usr/lib64/libcurl.so.4", "elfType" : 3, "buildId" : "CC3772AD47FA099DFDA2B50861CCD92FA719D101" }, { "b" : "7F155B82B000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "9CBEE9AA7ED85AD5BE053B483993D677420A765E" }, { "b" : "7F155B3CC000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "3270D2720328EEC2846C4B0D993582A0F657F54B" }, { "b" : "7F155B15B000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "183215EA0DA6EE9C80A1E3A3319EC2905D1BF6E0" }, { "b" : "7F155AF57000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "4663D1734EAE35F43F257D29615C1AFF5E060AE0" }, { "b" : "7F155AD4F000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "C07056C6DA664000A4DAAF8960AB182A8602E910" }, { "b" : "7F155AA4D000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "08C69C7E15BA7B4E199D2FDC1DC29B1CC1996BC1" }, { "b" : "7F155A837000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A03C9A80E995ED5F43077AB754A258FA0E34C3CD" }, { "b" : "7F155A61B000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "383B229C0E6E99B4E3BA6FC8B8C096C103226984" }, { "b" : "7F155A24D000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8BDBE5043577FC2EA218FAFD7EDF175D219698FB" }, { "b" : "7F155BCCB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "405C4E6374AAAB00F3A7F7986679078870DC2460" }, { "b" : "7F155A027000", "path" : "/usr/lib64/libnghttp2.so.14", "elfType" : 3, "buildId" : "903C20D899C962C2E93B006E3BB7172C83D8ACF4" }, { "b" : "7F1559DD9000", "path" : "/usr/lib64/libidn2.so.0", "elfType" : 3, "buildId" : "8B0B0729CFCBDFC58A731E716A5CFE88EFFD45A2" }, { "b" : "7F1559BB1000", "path" 
: "/usr/lib64/libssh2.so.1", "elfType" : 3, "buildId" : "E03CF776B39054AC3B2EA2AB15B161A858B5732C" }, { "b" : "7F155993C000", "path" : "/usr/lib64/libpsl.so.0", "elfType" : 3, "buildId" : "09BFE69665CFEEC18F81D8C4A971DCA29310186C" }, { "b" : "7F15596EF000", "path" : "/usr/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "FE25985243C2977094769887043CD7CE965DEDAD" }, { "b" : "7F1559406000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "CB869BC8EA16FDF97808C539A9C213E2F4ED73CE" }, { "b" : "7F15591D3000", "path" : "/usr/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BCC1AEAE6B693FAB99579E8D18B116AC9555D17F" }, { "b" : "7F1558FD0000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "AB007F5DF96C66E515542598F5BE1429ED63D86F" }, { "b" : "7F1558D7D000", "path" : "/lib64/libldap-2.4.so.2", "elfType" : 3, "buildId" : "76EEFC9EBC6A58F6C21768893861BF4EFBA28B82" }, { "b" : "7F1558B6E000", "path" : "/lib64/liblber-2.4.so.2", "elfType" : 3, "buildId" : "79DD9D561E8287839B88B031A4171D4BAE2D2576" }, { "b" : "7F1558958000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F1558642000", "path" : "/usr/lib64/libunistring.so.0", "elfType" : 3, "buildId" : "2B090A6860553944846E3C227B6AD12F279B304F" }, { "b" : "7F15582CC000", "path" : "/usr/lib64/libicuuc.so.50", "elfType" : 3, "buildId" : "3207ED4AD484C205F537B6B9C52665390816FE2B" }, { "b" : "7F15580BC000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1447F994433DA2A94377D03DA49A5E78BEA2AD65" }, { "b" : "7F1557EB9000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F1557C9E000", "path" : "/usr/lib64/libsasl2.so.2", "elfType" : 3, "buildId" : "354560FFC93703E5A80EEC8C66DF9E59DA335001" }, { "b" : "7F1557A46000", "path" : "/usr/lib64/libssl3.so", "elfType" : 3, "buildId" : "7693FEC8196F8ADB894C80EDF5AC0822128FC7BF" }, { "b" : 
"7F155781F000", "path" : "/usr/lib64/libsmime3.so", "elfType" : 3, "buildId" : "C779ABB5959D9C27C37DCF8E61A057D104B6F671" }, { "b" : "7F15574F8000", "path" : "/usr/lib64/libnss3.so", "elfType" : 3, "buildId" : "E43EC69F6E0BE4B9CF0678F021E519FEEB92A369" }, { "b" : "7F15572C8000", "path" : "/usr/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "CDB980E3F163A54FC153EC747FBDA659222AD61B" }, { "b" : "7F15570C4000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "57C3901BDBF9C1F6150DCE3A269EBC701CF4A948" }, { "b" : "7F1556EBF000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "E92FA782A5BB19F0AFB6C83D35F176233AEBA151" }, { "b" : "7F1556C81000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "BA485B89AE011611C28A3F96AFEE5FC6B9F15B7C" }, { "b" : "7F15556AF000", "path" : "/usr/lib64/libicudata.so.50", "elfType" : 3, "buildId" : "D42D574AC100115C507E48AFC346DCD5546B825A" }, { "b" : "7F155532A000", "path" : "/usr/lib64/libstdm.so.6", "elfType" : 3, "buildId" : "8791DDD49348603CD50B74652C5B25354D8FD06E" }, { "b" : "7F1555109000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" }, { "b" : "7F1554ED2000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "8DEE27472DF04C068D3FB7D5EBD80B5829B92EC3" }, { "b" : "7F1554CD0000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "C93088FEDB7ADACD950BDBE9786D807AB9B949B2" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x562ab49ddda1]
mongod(+0x2425FB9) [0x562ab49dcfb9]
mongod(+0x242649D) [0x562ab49dd49d]
libpthread.so.0(+0xF600) [0x7f155a62a600]
libpthread.so.0(recvmsg+0x2D) [0x7f155a629b7d]
mongod(_ZN4asio6detail10socket_ops4recvEiP5iovecmiRSt10error_code+0x6A) [0x562ab4417e5a]
mongod(_ZN4asio6detail10socket_ops9sync_recvEihP5iovecmibRSt10error_code+0x68) [0x562ab4417f18]
mongod(_ZN4asio6detail20read_buffer_sequenceINS_19basic_stream_socketINS_7generic15stream_protocolEEENS_17mutable_buffers_1EPKNS_14mutable_bufferENS0_14transfer_all_tEEEmRT_RKT0_RKT1_T2_RSt10error_code+0x8E) [0x562ab424c11e]
mongod(_ZN5mongo9transport18TransportLayerASIO11ASIOSession17opportunisticReadIN4asio19basic_stream_socketINS4_7generic15stream_protocolEEENS4_17mutable_buffers_1EEENS_14future_details6FutureIvEERT_RKT0_RKSt10shared_ptrINS0_5BatonEE+0x99) [0x562ab4255b49]
mongod(_ZN5mongo9transport18TransportLayerASIO11ASIOSession4readIN4asio17mutable_buffers_1EEENS_14future_details6FutureIvEERKT_RKSt10shared_ptrINS0_5BatonEE+0x13F) [0x562ab4262a5f]
mongod(_ZN5mongo9transport18TransportLayerASIO11ASIOSession17sourceMessageImplERKSt10shared_ptrINS0_5BatonEE+0x9C) [0x562ab426635c]
mongod(_ZN5mongo9transport18TransportLayerASIO11ASIOSession13sourceMessageEv+0x48) [0x562ab42668e8]
mongod(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x493) [0x562ab307d223]
mongod(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x11D) [0x562ab307da2d]
mongod(+0xACA101) [0x562ab3081101]
mongod(+0x1C79DF5) [0x562ab4230df5]
mongod(+0x237DE24) [0x562ab4934e24]
libpthread.so.0(+0x7E75) [0x7f155a622e75]
libc.so.6(clone+0x6D) [0x7f155a34b8fd]
-----  END BACKTRACE  -----

Can someone provide any information on why this crash happens?
The crash happens once in a while, typically when the load on the node is high.

Welcome to the MongoDB Community @Abhishek_Sinha1!

Please share some more details about your deployment:

  • specific version of MongoDB server used
  • O/S version
  • type of deployment (standalone, replica set, or sharded cluster)

Does high load also correlate with low free memory or increased swap usage?

One likely possibility is the Linux Out-Of-Memory (OOM) process killer looking to free up RAM. If this is the culprit, there should be some evidence in your system logs, e.g.: dmesg | egrep -i "killed process".
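The same check can be scripted when the kernel log has been saved to a file or collected from several hosts. The following is a minimal sketch, not an official tool: the function name find_oom_kills is made up for illustration, and the pattern assumes the kernel's usual "Killed process <pid> (<name>)" wording.

```python
import re

# Assumed wording of the kernel's OOM-kill message: "Killed process <pid> (<name>)".
OOM_PATTERN = re.compile(r"killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(log_lines):
    """Return (pid, process_name) for every OOM-kill line in the given lines."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

It accepts any iterable of lines, so the output of dmesg can be piped in and passed as sys.stdin. An empty result means no OOM kill was recorded in the lines that were scanned.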

Regards,
Stennie

@Stennie_X
MongoDB version: 4.0.9
OS: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.154-99.181.amzn1.x86_64" }
deployment: standalone

High load does not relate to low memory.
Also, a couple of days back, it crashed without any load at all. There is no OOM scenario. The system had 150GB+ free memory.

@Stennie_X can you share any pointers to debug further?

Hi @Abhishek_Sinha1

I haven’t been able to find any reported issues with Signal 3 so far, and from your description it doesn’t seem like there is any pattern to the crashes at all.

However, I would like to suggest some things that may improve the situation:

  1. MongoDB 4.0.9 was released in April 2019 (2 years ago today). Is it possible for you to upgrade to the latest version in the 4.0 series (currently 4.0.24) and see if this still occurs? You might be hitting an issue that was resolved in newer versions.
  2. If you are running mongod alongside any other processes, is it possible for you to run only mongod on that server?
  3. Are you running any third-party security software? There have been known cases in the past where security software interferes with and corrupts MongoDB memory, causing difficult-to-debug crashes.
  4. Please follow the recommendations in the Production Notes regarding supported OS and their settings. Notably, I think you’re using Amazon Linux 1, which is no longer supported. Is it possible to use Amazon Linux 2 instead?

It will be helpful if you can provide any pattern to the crashes (e.g. what operation PyMongo is doing before a crash, how many connections the server handles before it crashes, the time of day of the crashes, etc.), as it’s difficult to pin down the cause of an issue with no known pattern. However, if there is no pattern, typically it’s a hardware issue with random memory corruption or similar.

Best regards,
Kevin

Hi @kevinadi

Thanks for your suggestions. I will see if these can be incorporated.

Regarding the pattern, the crash happens when the load on MongoDB is high (around 500 connections). This is during the morning hours (GMT) when we run a lot of jobs on the server. The PyMongo operations are mostly find and update queries.

We enabled verbose logging as well on MongoDB.
I am attaching a few more images of the logs that we captured.

Hi @Abhishek_Sinha1

Please don’t post screenshots if possible, as they are hard to read and not searchable.

Regarding the pattern, the crash happens when the load on MongoDB is high (around 500 connections).

Typically this implies an OS-enforced limit of some kind. As MongoDB does not kill itself even under high load (but will obediently push through no matter how long an operation takes), I would look for any OS-level settings that limit disk/CPU/memory usage as mentioned in the Production Notes, and double-check that all limits are set to the recommended levels.

One question I neglected to ask: how did you install MongoDB? Are you following the instructions at Install MongoDB Community Edition on Amazon Linux, using Docker, or some other method? If you’re using Docker or a similar method, additional limits may be enforced by the container host in addition to the OS.

Best regards,
Kevin

@kevinadi

MongoDB is installed by downloading tgz binaries. It is a standalone server.

I just noticed this in the log:
2021-04-20T11:56:50.402+0000 I CONTROL [initandlisten] build environment:
2021-04-20T11:56:50.402+0000 I CONTROL [initandlisten] distmod: rhel62
2021-04-20T11:56:50.402+0000 I CONTROL [initandlisten] distarch: x86_64
2021-04-20T11:56:50.402+0000 I CONTROL [initandlisten] target_arch: x86_64

The distmod is rhel62. Could this be causing the issue? Is there any Amazon Linux specific dependency? I believe Amazon Linux is based on RHEL behind the scenes.

Hi @Abhishek_Sinha1

If the mongod process can run without issues unless it’s under a high load, I don’t think the problem is caused by any dependency issues. If it were, I would expect the process to have trouble even starting up.

I would encourage you to compare the values in your deployment against the Production Notes for any discrepancies, and also to use a more recent MongoDB version and a supported OS to minimize the risk of issues.

Best regards,
Kevin

@kevinadi

What I meant was that the production server is on Amazon Linux 1, whereas we have used the CentOS 6.2 MongoDB package.
Since the stack trace had some frames from libpthread and C library functions, I thought that there could be some incompatibility between the two that is causing these random failures.

Is there any way to decode the backtrace which I posted in the first comment in order to understand the issue better?
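For reference, the JSON document between the BEGIN BACKTRACE and END BACKTRACE markers can be read mechanically. Each frame carries a module base address "b" and an offset "o", and "processInfo.somap" lists the loaded modules by base address. The sketch below is illustrative only: resolve_frames is a hypothetical helper, and the "<no path>" label for modules without a recorded path is an assumption.

```python
import json


def resolve_frames(backtrace_json):
    """Map each frame to (absolute address, module path, mangled symbol or None)."""
    doc = json.loads(backtrace_json)
    # "somap" lists each loaded module with its base address "b" as a hex string.
    # Entries without a "path" (such as the main executable) get a placeholder.
    modules = {m["b"]: m.get("path", "<no path>") for m in doc["processInfo"]["somap"]}
    frames = []
    for frame in doc["backtrace"]:
        # Absolute address = module base + offset, both hexadecimal.
        address = int(frame["b"], 16) + int(frame["o"], 16)
        frames.append((hex(address), modules.get(frame["b"], "?"), frame.get("s")))
    return frames
```

For the first frame above, base 562AB25B7000 plus offset 2426DA1 gives 0x562ab49ddda1, which matches the first address printed before the backtrace. The mangled names can be passed to a demangler such as c++filt; _ZN5mongo15printStackTraceERSo, for example, is mongo::printStackTrace(std::ostream&). Read this way, the trace appears to show a connection thread waiting in recvmsg for its client's next message when the signal handler ran, rather than a fault inside libpthread or libc.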

Hi @Abhishek_Sinha1

I don’t believe there was a crash, actually. Wasn’t mongod killed by Signal 3 (Quit) in all cases?

Unless I’m missing something, I think the more productive way is to investigate how and where that Signal 3 is coming from.

Best regards,
Kevin

Yes, it has always been signal 3.

@kevinadi

Any pointers to debug the signal 3 request?
We have gone through the dmesg, kern, and audit logs but did not find any record of anything sending signal 3.

2021-04-06T13:21:15.832+0000 F - [conn2575456] Got signal: 3 (Quit)

Here, signal 3 is associated with a connection id and this connection is from PyMongo. Is there a possibility of the client sending a signal on any particular query?
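As background for this question: on Linux, signal 3 is SIGQUIT, and a Unix signal is raised by a process on the same host (or by a terminal's quit key), not carried over a network connection. The kernel may deliver a process-directed signal to any thread that has not blocked it, so the connection thread named in the log line may be incidental. The sketch below does not instrument mongod; it only illustrates, in a standalone Python process, that the kernel records the sender's PID and UID with the signal. The function names are made up for illustration, and signal.sigtimedwait is available on Linux but not on every Unix.

```python
import os
import signal


def wait_for_sigquit(timeout_seconds):
    """Block SIGQUIT, wait for one to arrive, and report who sent it."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})
    info = signal.sigtimedwait({signal.SIGQUIT}, timeout_seconds)
    if info is None:
        return None  # nothing arrived within the timeout
    return {
        "signal": signal.Signals(info.si_signo).name,
        "sender_pid": info.si_pid,
        "sender_uid": info.si_uid,
    }


def demo():
    """Send SIGQUIT to this process and show that the sender is identifiable."""
    # Block first: the default action for an unblocked SIGQUIT is to terminate.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGQUIT})
    os.kill(os.getpid(), signal.SIGQUIT)
    return wait_for_sigquit(timeout_seconds=2)
```

Because mongod installs its own handler, finding the sender for a running server would need system-level tracing instead, for example an audit rule on the kill system call.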

Also, we are running MongoDB on NUMA. Could that be a possibility?

Hi @Abhishek_Sinha1

Here, signal 3 is associated with a connection id and this connection is from PyMongo. Is there a possibility of the client sending a signal on any particular query?

I don’t think any official drivers have a “server kill-switch”, since that would be quite dangerous and prone to abuse. If a driver during normal operation can bring down a server by sending kill signals by itself (without you instructing it to do so), please file a ticket in the relevant driver’s JIRA project.

Also, we are running MongoDB on NUMA. Could that be a possibility?

About NUMA, it’s mentioned in the Production Notes: MongoDB and NUMA Hardware.

Having said that, I think those are red herrings. I believe we established that:

  1. The server got a Signal 3 (Quit) during high load. It went down after receiving this signal, and it was not a hard crash.
  2. This doesn’t happen during a non-heavy load.
  3. The Production Notes recommended settings are not yet set up.
  4. The MongoDB server is an old MongoDB version (2 years old) and is running on an unsupported OS (Amazon Linux 1).

Please correct me if I misunderstand anything.

If my understanding is correct, the only issue is tracing how the server got that Signal 3, and from where. I would reiterate my earlier suggestion to check for OS limits (e.g. ulimit), and see if they match or exceed what’s mentioned in the Production Notes. If everything is in order and you can rule out the OS as the culprit, it might be worth upgrading to the latest MongoDB version to see if this still occurs. If all else fails, it might be possible that the app is sending it for some reason.
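One way to check the limits that actually apply to the running mongod (rather than to your login shell) is to read /proc/<pid>/limits for its process ID. Here is a minimal sketch of parsing that file and flagging soft limits below a chosen minimum. The helper names are hypothetical, and the minimum values in the usage note are placeholders to be replaced with the figures from the Production Notes.

```python
import re


def parse_proc_limits(text):
    """Parse the contents of /proc/<pid>/limits into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        columns = re.split(r"\s{2,}", line.strip())
        if len(columns) >= 3:
            limits[columns[0]] = (columns[1], columns[2])
    return limits


def below_minimum(limits, minimums):
    """Return the limit names whose soft value is lower than the required minimum."""
    low = []
    for name, required in minimums.items():
        # A limit missing from the file is treated as 0, so it is always flagged.
        soft = limits.get(name, ("0", "0"))[0]
        if soft != "unlimited" and int(soft) < required:
            low.append(name)
    return low
```

For example, below_minimum(parse_proc_limits(text), {"Max open files": 64000, "Max processes": 64000}) returns the names of any limits that fall short, where text is the content of the limits file and 64000 is only a placeholder.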

Best regards,
Kevin
