Mongos instance has been stuck when query data

info

os version: 3.10.0-123.9.3.el7.x86_64 #1 SMP Thu Nov 6 15:06:03 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

mongos BuildInfo:

mongos> db.serverBuildInfo()
{
	"version" : "3.2.18-1-ge8d4c69",
	"gitVersion" : "e8d4c699706448df0137def23532b5bc015f772d",
	"modules" : [ ],
	"allocator" : "tcmalloc",
	"javascriptEngine" : "mozjs",
	"sysInfo" : "deprecated",
	"versionArray" : [
		3,
		2,
		19,
		-100
	],
	"openssl" : {
		"running" : "disabled",
		"compiled" : "disabled"
	},
	"buildEnvironment" : {
		"distmod" : "",
		"distarch" : "x86_64",
		"cc" : "gcc: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)",
		"ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -Wno-missing-braces -fno-builtin-memcmp",
		"cxx" : "g++: g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)",
		"cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -Wno-maybe-uninitialized -std=c++11",
		"linkflags" : "-fPIC -pthread -Wl,-z,now -rdynamic -fuse-ld=gold -Wl,-z,noexecstack -Wl,--warn-execstack",
		"target_arch" : "x86_64",
		"target_os" : "linux"
	},
	"bits" : 64,
	"debug" : false,
	"maxBsonObjectSize" : 16777216,
	"storageEngines" : [ ],
	"ok" : 1
}

mongod BuildInfo:

rs1:PRIMARY> db.serverBuildInfo();
{
	"version" : "3.2.18-1-ge8d4c69",
	"gitVersion" : "e8d4c699706448df0137def23532b5bc015f772d",
	"modules" : [ ],
	"allocator" : "tcmalloc",
	"javascriptEngine" : "mozjs",
	"sysInfo" : "deprecated",
	"versionArray" : [
		3,
		2,
		19,
		-100
	],
	"openssl" : {
		"running" : "OpenSSL 1.0.1e-fips 11 Feb 2013",
		"compiled" : "OpenSSL 1.0.1e-fips 11 Feb 2013"
	},
	"buildEnvironment" : {
		"distmod" : "rhel70",
		"distarch" : "x86_64",
		"cc" : "/opt/mongodbtoolchain/bin/gcc: gcc (GCC) 4.8.2",
		"ccflags" : "-fno-omit-frame-pointer -fPIC -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 -Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-but-set-variable -Wno-missing-braces -fno-builtin-memcmp",
		"cxx" : "/opt/mongodbtoolchain/bin/g++: g++ (GCC) 4.8.2",
		"cxxflags" : "-Wnon-virtual-dtor -Woverloaded-virtual -Wno-maybe-uninitialized -std=c++11",
		"linkflags" : "-fPIC -pthread -Wl,-z,now -rdynamic -fuse-ld=gold -Wl,-z,noexecstack -Wl,--warn-execstack",
		"target_arch" : "x86_64",
		"target_os" : "linux"
	},
	"bits" : 64,
	"debug" : false,
	"maxBsonObjectSize" : 16777216,
	"storageEngines" : [
		"devnull",
		"ephemeralForTest",
		"mmapv1",
		"wiredTiger"
	],
	"ok" : 1
}
rs1:PRIMARY>

issue

Mongos runs after a while. The query data from this instance will get stuck, but in the meantime, when I create a new join from this instance query again, the first few queries in this new join will succeed until the last query gets stuck.

example

# mongosh console 1
mongos> use testdb;

# the first time query.
mongos> db.package.findOne();
{
	"_id" : "7654ebaece69442dbd041d6fafefe17c",
	"name": "some test values....."
}
# the second time query.
mongos> db.package.findOne();
{
	"_id" : "7654ebaece69442dbd041d6fafefe17c",
	"name": "some test values....."
}
# the third time query.
mongos> db.package.findOne();


# this query from this connection  it has been stuck.

this query from this connection it has been stuck.
in the same time, I create a new connecion from same mongos instance.

# mongosh console 2
mongos> use testdb;
mongos> db.package.findOne();
{
	"_id" : "7654ebaece69442dbd041d6fafefe17c",
	"name": "some test values....."
}
mongos> db.package.findOne();


# this query from this connection  it has been stuck.

mongos log

the first time query.

2022-02-11T03:19:57.391+0800 D SHARDING [conn10852] command: testdb.$cmd { find: "package", filter: {}, limit: 1.0, singleBatch: true } ntoreturn: 1 options: 0
2022-02-11T03:19:57.391+0800 D EXECUTOR [conn10852] Scheduling remote command request: RemoteCommand 124970 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }
2022-02-11T03:19:57.391+0800 D ASIO     [conn10852] startCommand: RemoteCommand 124970 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }

2022-02-11T03:19:57.391+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Connecting to comm.mongo.master.1:27018


2022-02-11T03:19:57.392+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Starting asynchronous command 124971 on host comm.mongo.master.1:27018
2022-02-11T03:19:57.393+0800 D NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-2-0] sasl client input:
2022-02-11T03:19:57.394+0800 D NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-2-0] sasl client output: xxxxxx
2022-02-11T03:19:57.394+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Starting asynchronous command 124971 on host comm.mongo.master.1:27018
2022-02-11T03:19:57.396+0800 D NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-2-0] sasl client input: xxxxxxx=
2022-02-11T03:19:57.396+0800 D NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-2-0] sasl client output: xxxxxx=
2022-02-11T03:19:57.396+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Starting asynchronous command 124971 on host comm.mongo.master.1:27018
2022-02-11T03:19:57.397+0800 D NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-2-0] sasl client input: xxxxxx
2022-02-11T03:19:57.397+0800 D NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-2-0] sasl client output:
2022-02-11T03:19:57.397+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Starting asynchronous command 124971 on host comm.mongo.master.1:27018
2022-02-11T03:19:57.398+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Starting asynchronous command 124971 on host comm.mongo.master.1:27018
2022-02-11T03:19:57.398+0800 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Successfully connected to comm.mongo.master.1:27018, took 7ms (1 connections now open to comm.mongo.master.1:27018)
2022-02-11T03:19:57.398+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Request 124971 finished with response: {}
2022-02-11T03:19:57.398+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Initiating asynchronous command: RemoteCommand 124970 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }
2022-02-11T03:19:57.398+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Starting asynchronous command 124970 on host comm.mongo.master.1:27018
2022-02-11T03:19:57.399+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Request 124970 finished with response: { waitedMS: 0, cursor: { firstBatch: [ { "_id" : "7654ebaece69442dbd041d6fafefe17c", "name": "some test values....." } ], id: 0, ns: ""
testdb.package" }, ok: 1.0 }
2022-02-11T03:19:57.399+0800 D EXECUTOR [NetworkInterfaceASIO-TaskExecutorPool-2-0] Received remote response: RemoteResponse --  cmd:{ waitedMS: 0, cursor: { firstBatch: [ { "_id" : "7654ebaece69442dbd041d6fafefe17c", "name": "some test values....." } ], idd
: 0, ns: "testdb.package" }, ok: 1.0 }
2022-02-11T03:19:57.400+0800 D SHARDING [conn10852] Request::process end ns: testdb.$cmd msg id: 12 op: 2004 attempt: 0
2022-02-11T03:19:57.401+0800 D SHARDING [conn10852] Request::process begin ns: admin.$cmd msg id: 13 op: 2004 attempt: 0
2022-02-11T03:19:57.401+0800 D SHARDING [conn10852] command: admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 } ntoreturn: 1 options: 0
2022-02-11T03:19:57.401+0800 D SHARDING [conn10852] Request::process end ns: admin.$cmd msg id: 13 op: 2004 attempt: 0

the second time query.

2022-02-11T03:20:11.894+0800 D SHARDING [conn10852] Request::process begin ns: testdb.$cmd msg id: 14 op: 2004 attempt: 0
2022-02-11T03:20:11.894+0800 D SHARDING [conn10852] command: testdb.$cmd { find: "package", filter: {}, limit: 1.0, singleBatch: true } ntoreturn: 1 options: 0
2022-02-11T03:20:11.894+0800 D EXECUTOR [conn10852] Scheduling remote command request: RemoteCommand 125151 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }
2022-02-11T03:20:11.894+0800 D ASIO     [conn10852] startCommand: RemoteCommand 125151 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }


2022-02-11T03:20:11.894+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-0-0] Initiating asynchronous command: RemoteCommand 125151 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }
2022-02-11T03:20:11.894+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-0-0] Starting asynchronous command 125151 on host comm.mongo.master.1:27018


2022-02-11T03:20:11.895+0800 D ASIO     [NetworkInterfaceASIO-TaskExecutorPool-0-0] Request 125151 finished with response: { waitedMS: 0, cursor: { firstBatch: [ { "_id" : "7654ebaece69442dbd041d6fafefe17c", "name": "some test values....." }, ok: 1.0 }
2022-02-11T03:20:11.895+0800 D EXECUTOR [NetworkInterfaceASIO-TaskExecutorPool-0-0] Received remote response: RemoteResponse --  cmd:{ waitedMS: 0, cursor: { firstBatch: [ { "_id" : "7654ebaece69442dbd041d6fafefe17c", "name": "some test values....." } ], idd
: 0, ns: "testdb.package" }, ok: 1.0 }
2022-02-11T03:20:11.895+0800 D SHARDING [conn10852] Request::process end ns: testdb.$cmd msg id: 14 op: 2004 attempt: 0

2022-02-11T03:20:11.896+0800 D SHARDING [conn10852] Request::process begin ns: admin.$cmd msg id: 15 op: 2004 attempt: 0
2022-02-11T03:20:11.896+0800 D SHARDING [conn10852] command: admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 } ntoreturn: 1 options: 0
22-02-11T03:20:11.896+0800 D SHARDING [conn10852] Request::process end ns: admin.$cmd msg id: 15 op: 2004 attempt: 0

the third time query.

2022-02-11T03:20:22.323+0800 D SHARDING [conn10852] Request::process begin ns: testdb.$cmd msg id: 16 op: 2004 attempt: 0
2022-02-11T03:20:22.323+0800 D SHARDING [conn10852] command: testdb.$cmd { find: "package", filter: {}, limit: 1.0, singleBatch: true } ntoreturn: 1 options: 0
2022-02-11T03:20:22.323+0800 D EXECUTOR [conn10852] Scheduling remote command request: RemoteCommand 125201 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }
2022-02-11T03:20:22.323+0800 D ASIO     [conn10852] startCommand: RemoteCommand 125201 -- target:comm.mongo.master.1:27018 db:testdb cmd:{ find: "package", limit: 1, singleBatch: true, shardVersion: [ Timestamp 0|0, ObjectId('000000000000000000000000') ] }

A few days ago I recompiled Mongos and added a line of code at:

# grep -Hn 'debug print' src/mongo/executor/connection_pool.cpp
src/mongo/executor/connection_pool.cpp:385:    log() << "debug print: getConnection: done fulfillRequests, _readyPool size: " << _readyPool.size() << " _processingPool: " << _processingPool.size();

A log when the problem occurs again:

.....
2022-02-14T15:25:54.825+0800 I ASIO     [conn64087] debug print: getConnection: done fulfillRequests, _readyPool size: 0 _processingPool: 10

Welcome to the MongoDB Community Forums @twinklegomail!

MongoDB 3.2 was first released in Dec 2015 and reached End of Life (EOL) almost 3 1/2 years ago (Sept 2018).

Given the large range of changes to the server codebase since 2015 as well as your custom build, it is likely you are either running into an issue that has since been resolved or that is unique to changes in your build.

Please upgrade to a supported version of MongoDB server (currently 4.0 or later, although 4.0 will be EOL in April 2022 so a newer release would be recommended). If your issue is reproducible in an official build of a supported version of MongoDB server, you could report further details as a bug in the SERVER project in JIRA.

If you are unable to upgrade from your unsupported custom build of MongoDB 3.2, I expect you or your team will have to investigate and resolve this issue.

Regards,
Stennie

1 Like

a year later, I found out why, it’s a linux kernel bug, see:

running mongos os: CentOS Linux release 7.6.1810 (Core)
uname -a : Linux 3.10.0-123.9.3.el7.x86_64 #1 SMP Thu Nov 6 15:06:03 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I hope it helps you.

how to fix:
upgrade your kernel version.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.