I have a cluster on Mongo Atlas, and I constantly receive these alerts:
System: CPU (User) % has gone above 95
Ensure no index is missing and scale up. Please navigate to the System CPU metrics page to see usage details.
To check this CPU usage, I reviewed the System CPU, Normalized Process CPU, and Normalized System CPU metrics, but none of them shows usage over 80% (I zoomed in on the metrics for the day on which I received those alerts).
Where can I find information about the 95% overconsumption?
Is there a way to see which query or process is consuming this 95%?
Is it happening often, or is this the only instance?
You may be able to find it by taking the timestamp of each alert you received and checking your metrics around that same time.
I would recommend checking the logs around the timestamp of the alert. A slow query or a similar entry there could be a starting point for the investigation.
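As a rough illustration of that log check, here is a minimal Python sketch. It assumes you have downloaded the mongod log and that it is in the structured JSON format used by MongoDB 4.4 and later, where slow operations are logged with the message "Slow query" and a durationMillis attribute. The function name, the 10-minute window, and the sample lines (namespaces, times, durations) are all made up for the example.

```python
import json
from datetime import datetime, timedelta


def slow_queries_near(log_lines, alert_time, window_minutes=10):
    """Return (timestamp, namespace, durationMillis) tuples for "Slow query"
    log entries within window_minutes of alert_time, slowest first."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        # Assumes timestamps carry an explicit offset such as +00:00
        ts = datetime.fromisoformat(entry["t"]["$date"])
        if abs(ts - alert_time) <= window:
            attr = entry.get("attr", {})
            hits.append((ts, attr.get("ns"), attr.get("durationMillis", 0)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical log lines, shortened to the fields the sketch reads
sample_log = [
    '{"t":{"$date":"2023-03-14T10:02:11.000+00:00"},"s":"I","c":"COMMAND",'
    '"msg":"Slow query","attr":{"ns":"shop.orders","durationMillis":4200}}',
    '{"t":{"$date":"2023-03-14T10:04:30.000+00:00"},"s":"I","c":"COMMAND",'
    '"msg":"Slow query","attr":{"ns":"shop.users","durationMillis":950}}',
    '{"t":{"$date":"2023-03-14T11:30:00.000+00:00"},"s":"I","c":"COMMAND",'
    '"msg":"Slow query","attr":{"ns":"shop.orders","durationMillis":8000}}',
    '{"t":{"$date":"2023-03-14T10:03:00.000+00:00"},"s":"I","c":"NETWORK",'
    '"msg":"Connection accepted","attr":{}}',
    "this line is not JSON",
]

# Alert time taken from the alert email, with its timezone made explicit
alert = datetime.fromisoformat("2023-03-14T10:05:00+00:00")
results = slow_queries_near(sample_log, alert)
for ts, ns, millis in results:
    print(ts.isoformat(), ns, millis)
```

The namespace that shows up at the top of this list around the alert time is a reasonable first candidate to look at with explain() or in the Performance Advisor.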
You can use the MongoDB Atlas Performance Advisor (available only on M10+ clusters and serverless instances). This tool provides detailed analysis and recommendations for improving the performance of your cluster.
In addition to using the Performance Advisor, you can also run the db.currentOp() command in the MongoDB shell to view information about currently running operations. This can help you identify any long-running queries or processes that may be contributing to high CPU usage.
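To show how the db.currentOp() output can be narrowed down, here is a small Python sketch. It relies on the result document having an inprog array whose entries carry fields such as opid, active, secs_running, and ns. The helper name, the 5-second threshold, and the sample data are assumptions for illustration only.

```python
def long_running_ops(current_op_result, min_secs=5):
    """Return active operations that have been running for at least
    min_secs seconds, longest-running first."""
    ops = [
        op
        for op in current_op_result.get("inprog", [])
        if op.get("active") and op.get("secs_running", 0) >= min_secs
    ]
    return sorted(ops, key=lambda op: op["secs_running"], reverse=True)


# Hypothetical currentOp result, trimmed to the fields used above
sample = {
    "inprog": [
        {"opid": 101, "active": True, "secs_running": 42, "op": "query", "ns": "shop.orders"},
        {"opid": 102, "active": True, "secs_running": 1, "op": "update", "ns": "shop.users"},
        {"opid": 103, "active": False, "op": "none", "ns": ""},
        {"opid": 104, "active": True, "secs_running": 7, "op": "command", "ns": "shop.carts"},
    ]
}

suspects = long_running_ops(sample)
for op in suspects:
    print(op["opid"], op["ns"], op["secs_running"])
```

In the shell itself, a filter document such as db.currentOp({ "active": true, "secs_running": { "$gt": 5 } }) should give a similar result directly. Keep in mind that currentOp only shows what is running at the moment you call it, so it is most useful while the CPU is actually high.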
Finally, if you are unable to identify the root cause of the high CPU usage, you may want to consider scaling up your cluster. Adding more resources, such as additional CPU cores or memory, can help alleviate performance issues caused by high CPU usage.
Lastly, I would advise you to bring this up with the Atlas chat support team. They may be able to check whether anything on the Atlas side could have caused these alerts. If you open a chat support request, please provide them with the following:
Cluster link / name which experienced the issue
Time & date including timezone for when it occurred
First of all, thanks for the response. I reviewed the metrics again, and I don’t know if I missed something the first time, but now I can see the metrics and the logs correctly. In the end, we found the query that uses most of the CPU, and it can be optimized, so we will do that.