Hello
We have been testing with MongoDB replica sets and seeing MongoDB nodes crashing randomly even when there are no queries being executed. Since we are using supervisord to manage mongod processes, the mongod process gets restarted. Sometimes the restarted mongod comes up and get connected to other replica set nodes. Other times, it does not accept any connections and has to be stopped/killed and restarted. We have seen these crashes in Primary-Secondary and Primary-Secondary-Secondary replica sets. No crashes are seen in single node replica set.
So far what we have found is that hostname resolution method matters. When hostnames are in /etc/hosts file, no crash is seen. When we use a custom resolver, the crash occurs. The custom resolver is a simple python script that reads a file and returns ip for the host. It is configured via /etc/nsswitch.conf. At the moment DNS resolution is not tested.
I have logs available for the crashed servers which I can share. For now, I am pasting one of the backtrace of crash that is logged in the mongod.log file. The backtraces do not all look same to me but I may be wrong. Any help in resolving this issue is greatly appreciated. Please let me know what other information should I can further provide.
Thanks
Arshad
MongoDB version: 4.4.9, CentOS 7, glibc-2.17-323.el7_9.x86_64
mongod.conf
net:
bindIp: 192.168.1.1, 127.0.0.1
port: 27017
tls:
CAFile: /var/lib/mongo/mongodbCA.crt
allowConnectionsWithoutCertificates: true
allowInvalidCertificates: true
allowInvalidHostnames: true
certificateKeyFile: /var/lib/mongo/mongodb.pem
clusterFile: /var/lib/mongo/mongodb.pem
mode: requireTLS
unixDomainSocket:
filePermissions: '0777'
pathPrefix: /var/lib/mongo
processManagement:
fork: false
pidFilePath: /var/lib/mongo/mongod.pid
timeZoneInfo: /usr/share/zoneinfo
replication:
replSetName: kiteworks
security:
authorization: enabled
clusterAuthMode: x509
javascriptEnabled: false
storage:
dbPath: /var/lib/mongo
directoryPerDB: true
journal:
enabled: true
systemLog:
destination: file
logAppend: true
logRotate: reopen
path: /var/lib/mongo/mongod.log
'backtrace': [{'a': '557F0C6E83AA',
'b': '557F0991F000',
'o': '2DC93AA',
's': '_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606',
's+': '1EA'},
{'a': '557F0C6E9E39',
'b': '557F0991F000',
'o': '2DCAE39',
's': '_ZN5mongo15printStackTraceEv',
's+': '29'},
{'a': '557F0C6E7006',
'b': '557F0991F000',
'o': '2DC8006',
's': '_ZN5mongo12_GLOBAL__N_111myTerminateEv',
's+': 'A6'},
{'a': '557F0C877216',
'b': '557F0991F000',
'o': '2F58216',
's': '_ZN10__cxxabiv111__terminateEPFvvE',
's+': '6'},
{'a': '557F0C90B2C9',
'b': '557F0991F000',
'o': '2FEC2C9',
's': '__cxa_call_terminate',
's+': '39'},
{'a': '557F0C876C05',
'b': '557F0991F000',
'o': '2F57C05',
's': '__gxx_personality_v0',
's+': '275'},
{'a': '7F69EB5B18E3',
'b': '7F69EB5A2000',
'o': 'F8E3',
's': '_Unwind_GetTextRelBase',
's+': '23E3'},
{'a': '7F69EB5B1E17',
'b': '7F69EB5A2000',
'o': 'FE17',
's': '_Unwind_Resume',
's+': '57'},
{'a': '557F0A7DB9CD',
'b': '557F0991F000',
'o': 'EBC9CD',
's': '_ZN5mongo10ThreadPool13_consumeTasksEv.cold.574',
's+': '53'},
{'a': '557F0C217C21',
'b': '557F0991F000',
'o': '28F8C21',
's': '_ZN5mongo10ThreadPool17_workerThreadBodyEPS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE',
's+': 'E1'},
{'a': '557F0C217F50',
'b': '557F0991F000',
'o': '28F8F50',
's': '_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC4IZNS3_10ThreadPool25_startWorkerThread_inlockEvEUlvE2_JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv',
's+': '60'},
{'a': '557F0C892D2F',
'b': '557F0991F000',
'o': '2F73D2F',
's': 'execute_native_thread_routine',
's+': 'F'},
{'a': '7F69EB38DEA5',
'b': '7F69EB386000',
'o': '7EA5',
's': 'start_thread',
's+': 'C5'},
{'a': '7F69EB0B69FD',
'b': '7F69EAFB8000',
'o': 'FE9FD',
's': 'clone',
's+': '6D'}],
'processInfo': {'compiledModules': [],
'gitVersion': 'b4048e19814bfebac717cf5a880076aa69aba481',
'mongodbVersion': '4.4.9',
'somap': [{'b': '557F0991F000',
'buildId': '375E0455B64A8CCBA2B20814F34100164730166F',
'elfType': 3},
{'b': '7F69EB5A2000',
'buildId': 'EDF51350C7F71496149D064AA8B1441F786DF88A',
'elfType': 3,
'path': '/usr/lib64/libgcc_s.so.1'},
{'b': '7F69EB386000',
'buildId': 'E10CC8F2B932FC3DAEDA22F8DAC5EBB969524E5B',
'elfType': 3,
'path': '/usr/lib64/libpthread.so.0'},
{'b': '7F69EAFB8000',
'buildId': '7CD4A08C18C60E1E2EA1AEBB88C9379BD7289D38',
'elfType': 3,
'path': '/usr/lib64/libc.so.6'}],
'uname': {'machine': 'x86_64',
'release': '3.10.0-1160.59.1.el7.x86_64',
'sysname': 'Linux',
'version': '#1 SMP Wed Feb 23 16:47:03 UTC 2022'}}}