My environment:
- Replica set: 1 primary + 2 secondaries + 1 arbiter, MongoDB 4.0.8
- Primary host: 80-core CPU, 252 GB memory; secondary hosts: 24-core CPU, 252 GB memory; all RAID10 SAS disks, ext4 filesystem
- I/O scheduler and read-ahead configured per the official documentation; NUMA disabled, transparent hugepages disabled
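For anyone reproducing this, a minimal sketch of how those OS settings can be double-checked on a typical Linux host (the block device name `sda` is an assumption; substitute your actual data disk):

```shell
#!/bin/sh
# Sketch: verify the OS tuning mentioned above. The sysfs paths are
# standard Linux; "sda" is a placeholder device name.

check() {
    # Print a sysfs file's content if present, else a placeholder.
    if [ -r "$1" ]; then
        printf '%s: %s\n' "$1" "$(cat "$1")"
    else
        printf '%s: (not present on this host)\n' "$1"
    fi
}

check /sys/block/sda/queue/scheduler               # active I/O scheduler in [brackets]
check /sys/block/sda/queue/read_ahead_kb           # read-ahead, in KB
check /sys/kernel/mm/transparent_hugepage/enabled  # should show [never]
check /sys/kernel/mm/transparent_hugepage/defrag   # should show [never]
```

If transparent hugepages were only disabled at boot via config, this confirms the running kernel actually picked the setting up.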
- Database size 135 GB, of which the oplog is 80 GB, so the real data is only about 40 GB
- About 3,200 connections daily; no recent business changes or new launches
- Machine uptime: 1456 days
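To put numbers behind the connection figures during an incident, a sketch using the legacy `mongo` shell (which matches a 4.0.x server; `--host`/auth flags are omitted and would need to be added for your deployment):

```shell
#!/bin/sh
# Sketch: snapshot connection counts and active operations on this node.
# Assumes the legacy "mongo" shell is installed locally; guarded so the
# script still runs where it is not.

if command -v mongo >/dev/null 2>&1; then
    # Current vs. available connections:
    OUT=$(mongo --quiet --eval 'printjson(db.serverStatus().connections)')
    # Count of currently active operations:
    ACTIVE=$(mongo --quiet --eval \
        'print(db.currentOp().inprog.filter(function(op){return op.active;}).length)')
else
    OUT="mongo shell not found on this host"
    ACTIVE="n/a"
fi

echo "$OUT"
echo "active ops: $ACTIVE"
```

Sampling this once a minute would show whether the "active sessions dropped to about 5" recovery correlates with a drop in incoming connections or only in active operations.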
My question:
1. Suddenly, on the primary node, system CPU hit 95% while user CPU was only 5%; mongod.log showed a lot of timed-out operations, and we verified they were all normal business operations.
2. While this was happening, the front-end application reported that it was unable to write data.
3. `top -Hp <mongod pid>` showed about 140 active threads, all conn threads (no mongod internal threads), each consuming about 95% CPU.
4. We repeatedly restarted the database and performed primary-secondary failovers, but the symptom did not disappear. After 2 hours, for reasons unknown, active sessions dropped to about 5 on their own, system CPU fell below 3%, and everything returned to normal.
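When user CPU is low but system CPU is pegged, the time is being spent in the kernel, so the useful evidence is which kernel code paths the threads are in. A minimal sketch of what to capture if it happens again (assumes a Linux host; `pidstat` from the sysstat package and `perf` must be installed separately, and their invocations are printed rather than executed here because `perf top` is interactive):

```shell
#!/bin/sh
# Sketch: evidence to collect the next time %sys spikes on the primary.

PID=$(pidof mongod 2>/dev/null | awk '{print $1}')
[ -n "$PID" ] || PID="<mongod-pid>"   # placeholder if mongod is not local

# Safe to run any time: thread count and context-switch counters.
grep -E '^(Threads|voluntary_ctxt|nonvoluntary_ctxt)' \
    "/proc/$PID/status" 2>/dev/null || true

# Commands to run interactively during the incident:
SUGGEST=$(cat <<EOF
pidstat -u -t -p $PID 1 5   # per-thread %usr vs %system split
perf top -g -p $PID         # which kernel functions the CPU is spinning in
vmstat 1 5                  # system-wide context-switch rate and sy% trend
EOF
)
echo "$SUGGEST"
```

If `perf top` shows the time in spinlock, futex, or memory-compaction symbols, that narrows the cause considerably; without that capture, the 2-hour self-recovery is hard to explain after the fact.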
I want to know what could cause this massive consumption of system CPU.
Thanks.