Mongos connection full issue

Hi,

Recently we hit a mongos issue: after a mongos restart, the connections quickly fill up, so we can no longer log in.

tail -10 /var/log/mongodb/20000/mongodb.log
2022-03-08T11:11:36.857+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.858+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.859+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.859+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.862+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.862+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.863+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.863+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.864+0800 I NETWORK  [listener] connection refused because too many open connections: 20001
2022-03-08T11:11:36.867+0800 I NETWORK  [listener] connection refused because too many open connections: 20001

The mongos max connections setting is configured to 20000, but when I check with netstat there are no such network connections, so it seems the mongos is hung.
mongos version: 3.6.1
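
For anyone triaging a similar log, here is a minimal sketch (not part of the original post) that counts the refusal lines per second and reports the highest connection count the listener printed. The regular expression assumes the exact 3.6-style line layout shown in the excerpt above; `summarize_refusals` and `sample` are illustrative names.

```python
import re
from collections import Counter

# Matches the refusal lines from the log excerpt; the trailing number is the
# connection count the listener reported when it refused the socket.
REFUSED = re.compile(
    r"^(?P<ts>\S+) I NETWORK\s+\[listener\] connection refused because "
    r"too many open connections: (?P<count>\d+)$"
)


def summarize_refusals(lines):
    """Return (refusals per second, highest reported connection count)."""
    per_second = Counter()
    peak = 0
    for line in lines:
        m = REFUSED.match(line.strip())
        if not m:
            continue
        # Timestamps look like 2022-03-08T11:11:36.857+0800; keep up to seconds.
        per_second[m.group("ts")[:19]] += 1
        peak = max(peak, int(m.group("count")))
    return per_second, peak


# Three lines copied from the excerpt, plus one unrelated line that is skipped.
sample = [
    "2022-03-08T11:11:36.857+0800 I NETWORK  [listener] connection refused because too many open connections: 20001",
    "2022-03-08T11:11:36.858+0800 I NETWORK  [listener] connection refused because too many open connections: 20001",
    "2022-03-08T11:11:36.859+0800 I NETWORK  [listener] connection refused because too many open connections: 20001",
    "some unrelated log line",
]

per_second, peak = summarize_refusals(sample)
print(per_second, peak)
```

On a real system the lines would come from the log file itself, e.g. `summarize_refusals(open("/var/log/mongodb/20000/mongodb.log"))`.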

I printed the pstack output for the mongos process; all threads look like the ones below:

Thread 20100 (Thread 0x7f90a9f2b700 (LWP 439829)):
#0  0x00007f92104c7334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f92104c25d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f92104c24a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f921229af62 in mongo::CatalogCache::_getDatabase(mongo::OperationContext*, mongo::StringData) ()
#4  0x00007f921229d5df in mongo::CatalogCache::getCollectionRoutingInfo(mongo::OperationContext*, mongo::NamespaceString const&) ()
#5  0x00007f921205f7b6 in mongo::ClusterFind::runQuery(mongo::OperationContext*, mongo::CanonicalQuery const&, mongo::ReadPreferenceSetting const&, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >*, mongo::BSONObj*) ()
#6  0x00007f9211fdbc2d in mongo::(anonymous namespace)::ClusterFindCmd::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#7  0x00007f9212450f46 in mongo::BasicCommand::enhancedRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
#8  0x00007f921244be5f in mongo::Command::publicRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
#9  0x00007f9212039828 in mongo::(anonymous namespace)::runCommand ()
#10 0x00007f921203a2e3 in mongo::Strategy::clientCommand(mongo::OperationContext*, mongo::Message const&)::{lambda()#1}::operator()() const ()
#11 0x00007f921203a9c9 in mongo::Strategy::clientCommand(mongo::OperationContext*, mongo::Message const&) ()
#12 0x00007f9211f5c211 in mongo::ServiceEntryPointMongos::handleRequest(mongo::OperationContext*, mongo::Message const&) ()
#13 0x00007f9211f7889a in mongo::ServiceStateMachine::_processMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#14 0x00007f9211f74437 in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#15 0x00007f9211f77681 in std::_Function_handler<void ()(), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::ServiceStateMachine::Ownership)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#16 0x00007f92123cab62 in mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()()>, mongo::transport::ServiceExecutor::ScheduleFlags) ()
#17 0x00007f9211f732a0 in mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::ServiceStateMachine::Ownership) ()
#18 0x00007f9211f757e2 in mongo::ServiceStateMachine::_sourceCallback(mongo::Status) ()
#19 0x00007f9211f760db in mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#20 0x00007f9211f744bd in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#21 0x00007f9211f77681 in std::_Function_handler<void ()(), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::ServiceStateMachine::Ownership)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#22 0x00007f92123cb0c5 in std::_Function_handler<void ()(), mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()()>, mongo::transport::ServiceExecutor::ScheduleFlags)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#23 0x00007f9212987644 in mongo::(anonymous namespace)::runFunc(void*) ()
#24 0x00007f92104c0aa1 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f921020dbbd in clone () from /lib64/libc.so.6
Thread 20099 (Thread 0x7f909f784700 (LWP 439830)):
#0  0x00007f92104c7334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f92104c25d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f92104c24a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f921229af62 in mongo::CatalogCache::_getDatabase(mongo::OperationContext*, mongo::StringData) ()
#4  0x00007f921229d5df in mongo::CatalogCache::getCollectionRoutingInfo(mongo::OperationContext*, mongo::NamespaceString const&) ()
#5  0x00007f921205f7b6 in mongo::ClusterFind::runQuery(mongo::OperationContext*, mongo::CanonicalQuery const&, mongo::ReadPreferenceSetting const&, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >*, mongo::BSONObj*) ()
#6  0x00007f9211fdbc2d in mongo::(anonymous namespace)::ClusterFindCmd::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#7  0x00007f9212450f46 in mongo::BasicCommand::enhancedRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
#8  0x00007f921244be5f in mongo::Command::publicRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
#9  0x00007f9212039828 in mongo::(anonymous namespace)::runCommand ()
#10 0x00007f921203a2e3 in mongo::Strategy::clientCommand(mongo::OperationContext*, mongo::Message const&)::{lambda()#1}::operator()() const ()
#11 0x00007f921203a9c9 in mongo::Strategy::clientCommand(mongo::OperationContext*, mongo::Message const&) ()
#12 0x00007f9211f5c211 in mongo::ServiceEntryPointMongos::handleRequest(mongo::OperationContext*, mongo::Message const&) ()
#13 0x00007f9211f7889a in mongo::ServiceStateMachine::_processMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#14 0x00007f9211f74437 in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#15 0x00007f9211f77681 in std::_Function_handler<void ()(), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::ServiceStateMachine::Ownership)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#16 0x00007f92123cab62 in mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()()>, mongo::transport::ServiceExecutor::ScheduleFlags) ()
#17 0x00007f9211f732a0 in mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::ServiceStateMachine::Ownership) ()
#18 0x00007f9211f757e2 in mongo::ServiceStateMachine::_sourceCallback(mongo::Status) ()
#19 0x00007f9211f760db in mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard) ()
#20 0x00007f9211f744bd in mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard) ()
#21 0x00007f9211f77681 in std::_Function_handler<void ()(), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::ServiceStateMachine::Ownership)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#22 0x00007f92123cb0c5 in std::_Function_handler<void ()(), mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()()>, mongo::transport::ServiceExecutor::ScheduleFlags)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#23 0x00007f9212987644 in mongo::(anonymous namespace)::runFunc(void*) ()
#24 0x00007f92104c0aa1 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f921020dbbd in clone () from /lib64/libc.so.6
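
With roughly 20000 threads in the dump, reading it by eye is impractical. The sketch below (an illustration, not something from the original post) groups threads by their innermost `mongo::` frame, which is one way to check the claim that every thread is waiting in the same place. It assumes the `Thread N (Thread 0x... (LWP n)):` and `#k 0x... in symbol (...)` layout shown above; `first_mongo_frame_per_thread` and `SAMPLE` are hypothetical names, and `SAMPLE` is abbreviated to the first five frames of the two threads posted.

```python
import re
from collections import Counter

THREAD = re.compile(r"^Thread \d+ \(Thread 0x[0-9a-f]+ \(LWP (\d+)\)\):")
FRAME = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (.+?) \(")


def first_mongo_frame_per_thread(text):
    """Map each thread's LWP id to its innermost frame starting with 'mongo::'."""
    result = {}
    lwp = None
    for line in text.splitlines():
        t = THREAD.match(line)
        if t:
            lwp = t.group(1)
            continue
        f = FRAME.match(line)
        # Only record the first (innermost) mongo:: frame seen for each thread.
        if f and lwp is not None and lwp not in result:
            symbol = f.group(1)
            if symbol.startswith("mongo::"):
                result[lwp] = symbol
    return result


# Abbreviated copy of the two threads from the post (frames #0 to #4 only).
SAMPLE = """\
Thread 20100 (Thread 0x7f90a9f2b700 (LWP 439829)):
#0  0x00007f92104c7334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f92104c25d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f92104c24a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f921229af62 in mongo::CatalogCache::_getDatabase(mongo::OperationContext*, mongo::StringData) ()
#4  0x00007f921229d5df in mongo::CatalogCache::getCollectionRoutingInfo(mongo::OperationContext*, mongo::NamespaceString const&) ()
Thread 20099 (Thread 0x7f909f784700 (LWP 439830)):
#0  0x00007f92104c7334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f92104c25d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f92104c24a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f921229af62 in mongo::CatalogCache::_getDatabase(mongo::OperationContext*, mongo::StringData) ()
#4  0x00007f921229d5df in mongo::CatalogCache::getCollectionRoutingInfo(mongo::OperationContext*, mongo::NamespaceString const&) ()
"""

blocked = first_mongo_frame_per_thread(SAMPLE)
for symbol, count in Counter(blocked.values()).most_common():
    print(count, symbol)
```

If a full dump shows nearly all threads under the same `CatalogCache::_getDatabase` frame, the interesting thread is the one that is not in that group, since it may be the one holding the mutex.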

I guess I have hit a bug or something similar. Could someone help diagnose this? Thanks in advance!

Hi @hunter_huang

By “recently”, do you mean that these “too many open connections” messages started appearing only lately? Please provide details on your deployment, e.g. is this a standalone, replica set, or sharded cluster deployment, along with details such as how many nodes are in each replica set, how many shards there are, and any other relevant info that could help.

the mongos max connection is configured to 20000

Please provide some details about where and how you set this configuration.

mongos version: 3.6.1

This version was released way back in Dec 2017, and the 3.6 series has been out of support since April 2021. I would encourage you to consider upgrading to a newer, supported version of MongoDB.

As a starting point, typically this message appears when the system has hit its open files limit setting. Please see UNIX ulimit Settings for instructions on how to increase this limit. Please also see the Production Notes to ensure that all settings are optimal.

Best regards
Kevin

Hi kevinadi,

Thanks a lot for your response. The mongos nodes previously ran with a max connection limit of 10000. As client connections went up, I changed maxIncomingConnections from 10000 to 20000 and restarted the mongos; after that, the issue occurred.
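
As a side note for readers, one way to see how close a mongos is to its connection limit is the `connections` sub-document of `serverStatus` (`db.serverStatus().connections` in the mongo shell), which reports `current` and `available`. The sketch below is only an illustration: `connection_headroom` is a hypothetical helper, the sample numbers are invented, and it assumes that `current + available` approximates the effective limit. It also has to be run while the mongos still accepts a login.

```python
def connection_headroom(connections):
    """Summarize a serverStatus().connections sub-document.

    Assumes 'current' plus 'available' approximates the effective limit.
    """
    current = connections["current"]
    available = connections["available"]
    limit = current + available
    used_fraction = current / limit if limit else 0.0
    return {"limit": limit, "used_fraction": used_fraction}


# Invented sample values, not taken from the cluster discussed in this thread.
sample_connections = {"current": 15000, "available": 5000, "totalCreated": 123456}

headroom = connection_headroom(sample_connections)
print(headroom)
```

Sampling this periodically before a restart shows whether the count climbs steadily (client load) or jumps to the limit at once (stuck threads holding connections).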

The cluster is a sharded cluster with 3 shards.

Below is the mongos config file:
[image: mongos config file]

ulimit -a output:
ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 514806
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 5242880
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 524288
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
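
One caveat worth checking: `ulimit -a` reports the limits of the shell it is run in, and a mongos started by an init script or service manager may run with different ones. On Linux the limits of a running process can be read from `/proc/<pid>/limits`. The sketch below parses the "Max open files" row of that file; `parse_open_files_limit` is a hypothetical helper and `SAMPLE_LIMITS` is made-up text in the usual layout of that file, not output from the system in this thread.

```python
def _to_int(value):
    """Convert a limit column to int, keeping 'unlimited' as None."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for the 'Max open files' row, or None if absent."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # The columns after the name are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return _to_int(soft), _to_int(hard)
    return None


# Made-up example in the layout of /proc/<pid>/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
Max processes             524288               524288               processes
"""

limits = parse_open_files_limit(SAMPLE_LIMITS)
print(limits)
```

For a live process, the input would be something like `open(f"/proc/{pid}/limits").read()`, where `pid` is the mongos process id.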