Mongo Out of Memory Crash

Hi,

Apologies if this has already been posted in another thread, but I could not find one relevant to the issue we’re experiencing.

We are getting random mongod crashes with out-of-memory errors. They typically happen while we are running mongodump, but not exclusively.

When looking in the logs we see messages such as:

2021-05-17T10:00:40.980+0100 F  -        [RS] out of memory.
 0x55e809404a91 0x55e809403fd7 0x55e8093f9441 0x55e808b67505 0x55e808b60a5a 0x55e807c82bff 0x55e807c82bff 0x55e808b3b48a 0x55e807c82bff 0x55e808b3b574 0x55e807c82bff 0x55e808b4b703 0x55e808b500c3 0x55e808dd38a4 0x55e808dd3b35 0x55e808ddb7be 0x55e808b379ea 0x55e808b0f954 0x55e808b0fb4f 0x55e80952a98f 0x7f5a7cb92ea5 0x7f5a7c8bb8dd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55E806B4F000","o":"28B5A91","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55E806B4F000","o":"28B4FD7","s":"_ZN5mongo29reportOutOfMemoryErrorAndExitEv"},{"b":"55E806B4F000","o":"28AA441","s":"_ZN5mongo11mongoMallocEm"},{"b":"55E806B4F000","o":"2018505","s":"_ZN5mongo24MessageCompressorManager17decompressMessageERKNS_7MessageEPh"},{"b":"55E806B4F000","o":"2011A5A"},{"b":"55E806B4F000","o":"1133BFF","s":"_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv"},{"b":"55E806B4F000","o":"1133BFF","s":"_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv"},{"b":"55E806B4F000","o":"1FEC48A","s":"_ZZN5mongo15unique_functionIFvPNS_14future_details15SharedStateBaseEEE8makeImplIZNS1_10FutureImplINS1_8FakeVoidEE16makeContinuationINS_7MessageEZZNOS9_4thenIZZNS_9transport18TransportLayerASIO11ASIOSession17sourceMessageImplERKSt10shared_ptrINS_5BatonEEENUlvE_clEvEUlvE_EEDaOT_ENKUlvE1_clEvEUlPNS1_15SharedStateImplIS8_EEPNSQ_ISB_EEE_EENS7_ISN_EEOT0_EUlS3_E_EEDaSO_EN12SpecificImpl4callEOS3_"},{"b":"55E806B4F000","o":"1133BFF","s":"_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv"},{"b":"55E806B4F000","o":"1FEC574","s":"_ZZN5mongo15unique_functionIFvPNS_14future_details15SharedStateBaseEEE8makeImplIZNS1_10FutureImplImE16makeContinuationIvZZNOS8_4thenIZNOS8_11ignoreValueEvEUlOT_E_EEDaSC_ENKUlvE1_clEvEUlPNS1_15SharedStateImplImEEPNSF_INS1_8FakeVoidEEEE_EENS7_ISB_EEOT0_EUlS3_E_EEDaSC_EN12SpecificImpl4callEOS3_"},{"b":"55E806B4F000","o":"1133BFF","s":"_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv"},{"b":"55E806B4F000","o":"1FFC703","s":"_ZN5mongo9transport18use_future_details18AsyncHandlerHelperIJSt10error_codemEE8completeIJmEEEvPNS_7PromiseImEES3_DpOT_"},{"b":"55E806B4F000","o":"20010C3","s":"_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_7generic15stream_protocolEEES2_PKNS_14mutable_bufferENS0_14transfer_all_tEN5mongo9transport18use_future_details12
AsyncHandlerIJSt10error_codemEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSG_m"},{"b":"55E806B4F000","o":"22848A4","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"55E806B4F000","o":"2284B35","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"55E806B4F000","o":"228C7BE","s":"_ZN4asio10io_context3runEv"},{"b":"55E806B4F000","o":"1FE89EA","s":"_ZN5mongo9transport18TransportLayerASIO11ASIOReactor3runEv"},{"b":"55E806B4F000","o":"1FC0954","s":"_ZN5mongo8executor18NetworkInterfaceTL4_runEv"},{"b":"55E806B4F000","o":"1FC0B4F"},{"b":"55E806B4F000","o":"29DB98F"},{"b":"7F5A7CB8B000","o":"7EA5"},{"b":"7F5A7C7BD000","o":"FE8DD","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.2.6", "gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-1127.13.1.el7.x86_64", "version" : "#1 SMP Tue Jun 23 15:46:38 UTC 2020", "machine" : "x86_64" }, "somap" : [ { "b" : "55E806B4F000", "elfType" : 3, "buildId" : "238E5E6EA391338B335592BC2F7F233CB031F743" }, { "b" : "7FFF51AA9000", "elfType" : 3, "buildId" : "B1962B2DF9C13102C04B311EBA759764C1B8E9D8" }, { "b" : "7F5A7DFBA000", "path" : "/lib64/libcurl.so.4", "elfType" : 3, "buildId" : "56C5F10F267E857EB448E1E823B84552B0D16976" }, { "b" : "7F5A7DDA0000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "C3B2DD93CD59A17EA97148EC98C2667ADB9987A3" }, { "b" : "7F5A7D93D000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "4CF1939F660008CFA869D8364651F31AACD2C1C4" }, { "b" : "7F5A7D6CB000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "3B305C3BA17FE394862E749763F2956C9C890C2E" }, { "b" : "7F5A7D4C7000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "F2C36986E11A291A0D4BCB3A81632B24AE2359EA" }, { "b" : "7F5A7D2BF000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : 
"CCD4BE566DD5A8FC7FA62B224C14B698F51B0D0D" }, { "b" : "7F5A7CFBD000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "085D924F5D23B9F15A8AD28B7231EE93C09E13F1" }, { "b" : "7F5A7CDA7000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "DAC0179F4555AEFEC9E97476201802FD20C03EC5" }, { "b" : "7F5A7CB8B000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "2B482B3BAE79DEF4E5BC9791BC6BBDAE0E93E359" }, { "b" : "7F5A7C7BD000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "D78066A9C36F5FD63E2F6AC851AE3515C4C9792A" }, { "b" : "7F5A7E224000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "27FFD1FBC69569C776E666474EED723395E6D727" }, { "b" : "7F5A7C58A000", "path" : "/lib64/libidn.so.11", "elfType" : 3, "buildId" : "2B77BBEFFF65E94F3E0B71A4E89BEB68C4B476C5" }, { "b" : "7F5A7C35D000", "path" : "/lib64/libssh2.so.1", "elfType" : 3, "buildId" : "1AF123CADB2F2910E89CBD540A06D3B33692F95E" }, { "b" : "7F5A7C104000", "path" : "/lib64/libssl3.so", "elfType" : 3, "buildId" : "B6321C434B5C7386B144B925CEE2798D269FDDF5" }, { "b" : "7F5A7BEDC000", "path" : "/lib64/libsmime3.so", "elfType" : 3, "buildId" : "BDA454441F59F41D2DA36E13CEA1FC4CE95B2BBB" }, { "b" : "7F5A7BBAD000", "path" : "/lib64/libnss3.so", "elfType" : 3, "buildId" : "DC3B36B530F506DE4FC1A6612D7DF44D4A3DDCDB" }, { "b" : "7F5A7B97D000", "path" : "/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "32C8FB6C2768FFE41E0A15CBF2089A4202CA2290" }, { "b" : "7F5A7B779000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "325B8CE57A776DE0B24B362A7E0C90E903B1A4B8" }, { "b" : "7F5A7B574000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "0460FF10A3C63749113D380C40E10DFCF066C76E" }, { "b" : "7F5A7B336000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "8840B019EDB66B0CFBD2F77EF196440F7928106E" }, { "b" : "7F5A7B0E9000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "C3914975092B29D330453950350E254AA562D642" }, { "b" : 
"7F5A7AE00000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "872EAC9F0CE30D5C4C37ECCCE3C586296D4FA1F0" }, { "b" : "7F5A7ABCD000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "7856E751772E8538A33113BA62145A8B23314093" }, { "b" : "7F5A7A9C9000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "E4C7298B74FEEADC4DDE40CDD8C4D6B85FE09ADE" }, { "b" : "7F5A7A7BA000", "path" : "/lib64/liblber-2.4.so.2", "elfType" : 3, "buildId" : "3192C56CD451E18EB9F29CB045432BA9C738DD29" }, { "b" : "7F5A7A565000", "path" : "/lib64/libldap-2.4.so.2", "elfType" : 3, "buildId" : "F1FADDDE0D21D5F4E2DCADEDD3B85B6E7AAC9883" }, { "b" : "7F5A7A34F000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7F5A7A13F000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "AED31F16223CE52AE079AB1ED4C09AC4C98F86B8" }, { "b" : "7F5A79F3B000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F5A79D1E000", "path" : "/lib64/libsasl2.so.3", "elfType" : 3, "buildId" : "E2F2017F821DD1B9D307DA1A9B8014F2941AEB7B" }, { "b" : "7F5A79AF7000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "903A0BD0BFB4FEE8C284F41BEB9773DED94CBC52" }, { "b" : "7F5A798C0000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "164A07A654E3B6AA09A43BF9ACE3728AB02BD0D7" }, { "b" : "7F5A7965E000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" }, { "b" : "7F5A7945B000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "197680DAE6538245CB99723E57447C4EF2E98362" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55e809404a91]
 mongod(_ZN5mongo29reportOutOfMemoryErrorAndExitEv+0x87) [0x55e809403fd7]
 mongod(_ZN5mongo11mongoMallocEm+0x21) [0x55e8093f9441]
 mongod(_ZN5mongo24MessageCompressorManager17decompressMessageERKNS_7MessageEPh+0x145) [0x55e808b67505]
 mongod(+0x2011A5A) [0x55e808b60a5a]
 mongod(_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv+0x19F) [0x55e807c82bff]
 mongod(_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv+0x19F) [0x55e807c82bff]
 mongod(_ZZN5mongo15unique_functionIFvPNS_14future_details15SharedStateBaseEEE8makeImplIZNS1_10FutureImplINS1_8FakeVoidEE16makeContinuationINS_7MessageEZZNOS9_4thenIZZNS_9transport18TransportLayerASIO11ASIOSession17sourceMessageImplERKSt10shared_ptrINS_5BatonEEENUlvE_clEvEUlvE_EEDaOT_ENKUlvE1_clEvEUlPNS1_15SharedStateImplIS8_EEPNSQ_ISB_EEE_EENS7_ISN_EEOT0_EUlS3_E_EEDaSO_EN12SpecificImpl4callEOS3_+0xCA) [0x55e808b3b48a]
 mongod(_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv+0x19F) [0x55e807c82bff]
 mongod(_ZZN5mongo15unique_functionIFvPNS_14future_details15SharedStateBaseEEE8makeImplIZNS1_10FutureImplImE16makeContinuationIvZZNOS8_4thenIZNOS8_11ignoreValueEvEUlOT_E_EEDaSC_ENKUlvE1_clEvEUlPNS1_15SharedStateImplImEEPNSF_INS1_8FakeVoidEEEE_EENS7_ISB_EEOT0_EUlS3_E_EEDaSC_EN12SpecificImpl4callEOS3_+0x74) [0x55e808b3b574]
 mongod(_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv+0x19F) [0x55e807c82bff]
 mongod(_ZN5mongo9transport18use_future_details18AsyncHandlerHelperIJSt10error_codemEE8completeIJmEEEvPNS_7PromiseImEES3_DpOT_+0xE3) [0x55e808b4b703]
 mongod(_ZN4asio6detail23reactive_socket_recv_opINS_17mutable_buffers_1ENS0_7read_opINS_19basic_stream_socketINS_7generic15stream_protocolEEES2_PKNS_14mutable_bufferENS0_14transfer_all_tEN5mongo9transport18use_future_details12AsyncHandlerIJSt10error_codemEEEEEE11do_completeEPvPNS0_19scheduler_operationERKSG_m+0x113) [0x55e808b500c3]
 mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code+0x3B4) [0x55e808dd38a4]
 mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0x115) [0x55e808dd3b35]
 mongod(_ZN4asio10io_context3runEv+0x3E) [0x55e808ddb7be]
 mongod(_ZN5mongo9transport18TransportLayerASIO11ASIOReactor3runEv+0x2A) [0x55e808b379ea]
 mongod(_ZN5mongo8executor18NetworkInterfaceTL4_runEv+0x44) [0x55e808b0f954]
 mongod(+0x1FC0B4F) [0x55e808b0fb4f]
 mongod(+0x29DB98F) [0x55e80952a98f]
 libpthread.so.0(+0x7EA5) [0x7f5a7cb92ea5]
 libc.so.6(clone+0x6D) [0x7f5a7c8bb8dd]
-----  END BACKTRACE  -----

This is on a server with 16GB of memory, with the WiredTiger cache size set to 6GB in the config file:

  wiredTiger:
    engineConfig:
      cacheSizeGB: 6.0
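
For a sense of the memory budget, here is a rough Python sketch. It assumes the documented WiredTiger default (the larger of 50% of (RAM - 1GB) and 256MB) and compares it with our configured value. The function name is made up for illustration, and the numbers ignore everything mongod allocates outside the cache (connections, network buffers, and so on).

```python
def default_wiredtiger_cache_gb(total_ram_gb: float) -> float:
    """Documented default: the larger of 50% of (RAM - 1 GB) and 256 MB."""
    return max(0.5 * (total_ram_gb - 1.0), 0.25)


TOTAL_RAM_GB = 16.0         # from the post: 16GB server
CONFIGURED_CACHE_GB = 6.0   # from the post: cacheSizeGB: 6.0

default_gb = default_wiredtiger_cache_gb(TOTAL_RAM_GB)
outside_cache_gb = TOTAL_RAM_GB - CONFIGURED_CACHE_GB

print(f"default cache size would be: {default_gb} GB")
print(f"configured cache size:       {CONFIGURED_CACHE_GB} GB")
print(f"RAM left outside the cache:  {outside_cache_gb} GB")
```

So the configured cache is already below the default, and on paper there should be around 10GB left for everything else on the box.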

I’ve checked the ulimits of the running mongod process:

cat /proc/<pid>/limits

And I get:

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             64000                64000                processes
Max open files            64000                64000                files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       63432                63432                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
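
To double-check that none of the memory-related limits are capped, the output can be parsed with a small Python sketch like the one below. The helper names and the choice of which rows count as "memory-related" are my own assumptions, and the sample text is a few rows copied from the output above.

```python
import re

# Rows treated as memory-related here (an assumption, adjust as needed).
MEMORY_LIMITS = (
    "Max data size",
    "Max resident set",
    "Max locked memory",
    "Max address space",
)

SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max nice priority         0                    0
"""


def parse_limits(text):
    """Map each limit name to its (soft, hard) values, skipping the header row."""
    limits = {}
    for line in text.splitlines()[1:]:
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def capped_memory_limits(limits):
    """Return the memory-related limits whose soft value is not 'unlimited'."""
    return [
        name
        for name in MEMORY_LIMITS
        if name in limits and limits[name][0] != "unlimited"
    ]


print(capped_memory_limits(parse_limits(SAMPLE)))
```

For the real process, the text would come from reading /proc/<pid>/limits instead of SAMPLE.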

I’ve also checked that vm.overcommit_memory is set to 2.
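
As far as I understand it, with vm.overcommit_memory set to 2 the kernel refuses allocations that would push Committed_AS past CommitLimit, where CommitLimit is roughly swap plus vm.overcommit_ratio percent of RAM (the ratio defaults to 50). The Python sketch below shows how the remaining headroom could be read from /proc/meminfo. The helper names are made up and the sample numbers are hypothetical, not taken from our server.

```python
# Hypothetical /proc/meminfo excerpt; replace with open("/proc/meminfo").read()
# on the actual host.
SAMPLE_MEMINFO = """\
MemTotal:       16266012 kB
SwapTotal:       2097148 kB
CommitLimit:    10230152 kB
Committed_AS:    9876543 kB
"""


def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields to their integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(meminfo):
    """kB that can still be committed before allocations start to fail in mode 2."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


info = parse_meminfo(SAMPLE_MEMINFO)
print(f"CommitLimit:  {info['CommitLimit']} kB")
print(f"Committed_AS: {info['Committed_AS']} kB")
print(f"Headroom:     {commit_headroom_kb(info)} kB")
```

If the headroom is small around the time of a crash, malloc could fail even though free memory is still available, which might be worth ruling out.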

Would anyone have any thoughts on what else to check or troubleshoot here?

Thanks.