Hi Guys,
We are running about 32k connections on MongoDB and are seeing the error below in the log file. Following suggestions from the Mongo community, we checked pid_max and threads-max; both are already reasonably high. However, a large number of connections stay open: lsof shows many sockets stuck in CLOSE_WAIT that are never closed, as shown below.
- CentOS 8.1 (4.18.0-193.14.2.el8_2.x86_64)
- MongoDB 3.6.17
cat /proc/sys/kernel/pid_max
4194304
cat /proc/sys/kernel/threads-max
94465
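For completeness, we can also look at the per-process limits and the current thread/descriptor usage of mongod itself. A quick sketch (PID 8955 is assumed from the lsof output below; adjust for your system):

# Per-process limits for mongod (PID 8955 from the lsof output below)
grep -E 'processes|open files' /proc/8955/limits
# Current number of threads in the mongod process
ls /proc/8955/task | wc -l
# Current number of open file descriptors, sockets included
ls /proc/8955/fd | wc -l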
mongod 8955 root *366u IPv4 120213606 0t0 TCP testmanager:33445->node03:49816 (CLOSE_WAIT)
mongod 8955 root *367u IPv4 120213789 0t0 TCP testmanager:33445->node03:49860 (CLOSE_WAIT)
mongod 8955 root *368u IPv4 120402126 0t0 TCP testmanager:33445->node03:49864 (CLOSE_WAIT)
mongod 8955 root *369u IPv4 120437763 0t0 TCP testmanager:33445->node03:49866 (CLOSE_WAIT)
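To get a feel for how many sockets are stuck and which peers hold them, a rough check along these lines may help (port 33445 is mongod's local port as seen in the lsof output above; adjust as needed):

# Count CLOSE_WAIT sockets grouped by remote peer
netstat -ant | awk '$6 == "CLOSE_WAIT" {print $5}' | sort | uniq -c | sort -rn
# Same check with ss, restricted to mongod's local port (-H suppresses the header)
ss -Htan state close-wait '( sport = :33445 )' | wc -l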
After some time the socket descriptors reach the maximum limit, and mongod throws an error on thread creation:
2021-02-24T22:50:04.692+0000 I - [listener] pthread_create failed: Resource temporarily unavailable
2021-02-24T22:50:04.692+0000 W EXECUTOR [conn480782] Terminating session due to error: InternalError: failed to create service entry worker thread
2021-02-24T22:50:05.589+0000 I - [listener] pthread_create failed: Resource temporarily unavailable
2021-02-24T22:50:05.589+0000 W EXECUTOR [conn480783] Terminating session due to error: InternalError: failed to create service entry worker thread
https://jira.mongodb.org/browse/SERVER-17687
The observation below is copied from the above Jira ticket:
If the issue is not the system-wide limit on the number of threads then the resource exhaustion is somewhere else. You’ll need to investigate what resource is being exhausted (memory and number of file descriptors / sockets are the usual suspects) or simply lower the number of threads. If you’re not using connection pooling you’re probably running out of sockets (netstat -a | grep TIME_WAIT may help).
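In our case it is CLOSE_WAIT rather than TIME_WAIT that accumulates, which as far as I understand means the clients have already closed their end of the connection and mongod has not yet closed its own. A rough way to watch the per-state counts over time and confirm which state is actually growing:

# Snapshot TCP connection counts by state every 30 seconds
while true; do
    date
    netstat -ant | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -rn
    sleep 30
done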
Based on this analysis, socket descriptors are being exhausted and mongod thread creation is failing as a result. Any suggestions on why the sockets are not being closed, or any workaround for this?