Mongos: setting taskExecutorPoolSize larger than 1 causes a significant performance drop in MongoDB 5.0

Environment Setup:
Three mongos instances, each with 8 cores and 16 GB of RAM
Five shards, each running mongod on 8 cores and 16 GB of RAM

Workload:
SYSBENCH
Database: YCSB; Collection: t_0;
Shard key: {field0: 1}; the workload is read-only, with queries by _id

We were surprised to find that with taskExecutorPoolSize set to 8, SYSBENCH reaches only ~1300 qps, but with taskExecutorPoolSize set to 1, throughput improves to ~5400 qps. We then found this jira, but it explains little. Can someone explain why taskExecutorPoolSize makes such a difference?
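
For anyone who wants to reproduce the comparison: taskExecutorPoolSize is a mongos server parameter that is set at startup. Below is a minimal sketch, assuming mongos is started from a YAML config file. Only the relevant fragment is shown; the rest of the file (sharding.configDB, net, and so on) is omitted and depends on your deployment.

```yaml
# Fragment of a mongos config file (assumed layout; other sections omitted).
# Value used for the faster run; change it to 8 to reproduce the slower run.
setParameter:
  taskExecutorPoolSize: 1
```

The same value can also be passed on the command line as `mongos --setParameter taskExecutorPoolSize=1`, alongside the usual options.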

We analyzed the flame graphs (they are too big to upload, so we only show pictures).
With taskExecutorPoolSize = 8, we found a lot of futex syscalls.


With taskExecutorPoolSize = 1, there are fewer lock operations.

This is the flame graph with taskExecutorPoolSize = 1: