How to set maxPoolSize via mongod.conf?

Rob_De_Langhe · September 13, 2020, 9:31pm

Hi,

I want to raise the value of “maxPoolSize” from its default of 100 to roughly 200.
How can I do this via “mongod.conf” ?

I see in the logs of each of my 108 shard servers (36 shards, 3 servers each) that they nearly continuously set up connections to each other and end them once they have been idle for some time, with the number of open connections hovering around some limit of 100:

2020-09-13T23:27:09.269+0200 I  NETWORK  [listener] connection accepted from 10.100.22.100:9808 #2135 (97 connections now open)
2020-09-13T23:27:09.269+0200 I  NETWORK  [conn2135] received client metadata from 10.100.22.100:9808 conn2135: { driver: { name: "NetworkInterfaceTL", version: "4.4.0" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2020-09-13T23:27:09.270+0200 I  NETWORK  [conn2135] end connection 10.100.22.100:9808 (96 connections now open)
2020-09-13T23:27:09.412+0200 I  NETWORK  [listener] connection accepted from 10.100.22.100:9830 #2136 (97 connections now open)

Since this is happening nearly continuously on all of the shard servers, I suspect this is slowing down my cluster.
I found that “maxPoolSize” has a default value of 100. Is this the parameter that I should adjust to a higher value, so that all 108 shard servers can keep their connection active, even when idle for some seconds ?
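To put numbers on the churn in the excerpt above, a small script along these lines could tally accepted and ended connections per peer. This is only a sketch: it assumes the plain-text NETWORK line format quoted in this post, and `summarize` is a hypothetical helper name, not part of any MongoDB tooling.

```python
import re
from collections import Counter

# Patterns for the plain-text NETWORK lines quoted in the post.
ACCEPTED = re.compile(
    r"connection accepted from (\S+):\d+ #\d+ \((\d+) connections? now open\)"
)
ENDED = re.compile(r"end connection (\S+):\d+ \((\d+) connections? now open\)")


def summarize(lines):
    """Count accepted/ended connections per peer IP and track the peak open count."""
    accepted, ended = Counter(), Counter()
    peak_open = 0
    for line in lines:
        m = ACCEPTED.search(line)
        if m:
            accepted[m.group(1)] += 1
        else:
            m = ENDED.search(line)
            if m:
                ended[m.group(1)] += 1
        if m:
            # Both patterns capture the "N connections now open" figure as group 2.
            peak_open = max(peak_open, int(m.group(2)))
    return accepted, ended, peak_open


sample = [
    "2020-09-13T23:27:09.269+0200 I  NETWORK  [listener] connection accepted from 10.100.22.100:9808 #2135 (97 connections now open)",
    "2020-09-13T23:27:09.270+0200 I  NETWORK  [conn2135] end connection 10.100.22.100:9808 (96 connections now open)",
    "2020-09-13T23:27:09.412+0200 I  NETWORK  [listener] connection accepted from 10.100.22.100:9830 #2136 (97 connections now open)",
]
accepted, ended, peak_open = summarize(sample)
print(accepted, ended, peak_open)
```

Run against a full log file instead of `sample`, this would show which peers reconnect most often and how close the open-connection count actually gets to 100.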

Pavel_Duchovny · September 14, 2020, 5:15am

Hi @Rob_De_Langhe,

The maxPoolSize of 100 is a driver-side parameter, intended for your application's connections to MongoDB.

Sharding has its own pool size parameters, which you can browse from here

Please note that the default values are tuned for most use cases, so be cautious when changing them. Test fully in a load-test environment that mimics your production architecture and traffic before deploying to prod.

Best
Pavel

Rob_De_Langhe · September 15, 2020, 7:42pm

hi Pavel,

thx a lot for your feedback !

If I browse to that URL about the “Sharding TaskExecutor PoolMaxConnecting” parameter, it says that it applies only to the router’s “mongos” program.

But we see those zillions of log entries about repeated connections and disconnections between all the shard servers, not to or from any router “mongos” (see the extract in my original post above).

The number of connections between the shard servers seems to hover just below a limit of 100 (1 or 2 fewer than 100):
Since we have 36 shards of 3 servers each, thus 108 shard servers, I assume that each of the shard servers tries to maintain connections to all 107 other shard servers. Since they continuously disconnect from some and reconnect to other shard servers, while staying just below those 100 active connections, I suspect some limit of approximately 100 connections is involved.
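The arithmetic behind that assumption, as a quick sketch. It only restates the figures from this post; whether every shard server really keeps a connection to every other one is the assumption being questioned, not something the logs confirm.

```python
shards = 36
members_per_shard = 3

servers = shards * members_per_shard  # total shard servers in the cluster
peers_per_server = servers - 1        # other servers each one could connect to

print(servers, peers_per_server)
```

If the full-mesh assumption held, each server would need 107 peer connections, which is more than the roughly 97 open connections seen in the logs.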

Pavel_Duchovny · September 16, 2020, 5:18am

Hi @Rob_De_Langhe,

The parameter ShardingTaskExecutorPoolMinSize can be set on a mongod. Its default is 1, so the shards' connection pools will shrink to 1 and grow as needed.

This can result in a large number of connections and disconnections between the shards, but that is expected and should be fine.

I am not sure why you concluded this has any impact on your cluster. If you want, you can test increasing the minimum pool size to keep a larger number of idle connections open, but to be honest I don’t see how that is better.

Best
Pavel
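For anyone who wants to try this, server parameters go under the `setParameter` section of mongod.conf. The sketch below just renders that section as text; `render_set_parameter` is a hypothetical helper, and the value 10 is an arbitrary illustration, not a recommendation.

```python
def render_set_parameter(params):
    """Render a YAML `setParameter` section for a mongod.conf-style file."""
    lines = ["setParameter:"]
    for name, value in params.items():
        # Two-space indentation nests each parameter under setParameter.
        lines.append(f"  {name}: {value}")
    return "\n".join(lines)


# 10 is a hypothetical value chosen only for illustration.
fragment = render_set_parameter({"ShardingTaskExecutorPoolMinSize": 10})
print(fragment)
```

The printed fragment would be merged into the existing mongod.conf, followed by a restart of the mongod, ideally in a test environment first.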

Rob_De_Langhe · January 12, 2021, 9:12am

hi Pavel, thx for the reply, and sorry for not responding sooner: I am still struggling to get our cluster (now 90 shards on v4.4.1) stable. Even when loading a tiny amount of data, several shard servers shut down with a fatal log message like

{"t":{"$date":"2021-01-04T13:45:47.800+01:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Writing fatal message","attr":{"message":"DBException::toString(): NetworkInterfaceExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 31481ms, timeout was set to 20000ms\nActual exception type: mongo::error_details::ExceptionForImpl<(mongo::ErrorCodes::Error)202, mongo::ExceptionForCat<(mongo::ErrorCategory)1>, mongo::ExceptionForCat<(mongo::ErrorCategory)10> >\n"}}

So this wait time (“to get a connection from the pool”) was 31 seconds, which is way above the 20-second maximum, and thus these shard servers shut down, leaving only 2 shard servers running per shard.
Very often, soon afterwards another shard server shuts down the same way, leaving too few shard servers to obtain a majority… That aborts the transactions to that shard, which in turn aborts my application and forces retries until the shard servers have restarted (I had to install a 1-minute “cron” job on each shard server to let them restart quickly).
Clearly this is not a stable setup…
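To track how often and by how much the pool wait exceeds the timeout, the structured (JSON) log entries can be parsed directly. A minimal sketch, assuming the entry format quoted above; the message is shortened here to the part that carries the timings, and `pool_wait` is a hypothetical helper name.

```python
import json
import re

# The fatal entry quoted in the post, with the message shortened to the timing part.
entry = json.loads(
    r'''{"t":{"$date":"2021-01-04T13:45:47.800+01:00"},"s":"F","c":"CONTROL","id":4757800,"ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Writing fatal message","attr":{"message":"DBException::toString(): NetworkInterfaceExceededTimeLimit: Remote command timed out while waiting to get a connection from the pool, took 31481ms, timeout was set to 20000ms\n"}}'''
)

TIMING = re.compile(r"took (\d+)ms, timeout was set to (\d+)ms")


def pool_wait(entry):
    """Return (waited_ms, timeout_ms) from a fatal pool-timeout entry, or None."""
    if entry.get("s") != "F":  # only severity "F" (fatal) entries are of interest
        return None
    m = TIMING.search(entry.get("attr", {}).get("message", ""))
    return (int(m.group(1)), int(m.group(2))) if m else None


waited_ms, timeout_ms = pool_wait(entry)
print(waited_ms - timeout_ms)  # milliseconds waited beyond the timeout
```

Applied line by line to each shard server's log file, this would show whether the overruns cluster around particular times or particular servers.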