About the metrics and alerts

I have a cluster on MongoDB Atlas, and I constantly receive these alerts:


System: CPU (User) % has gone above 95

Ensure no index is missing and scale up. Please navigate to the System CPU metrics page to see usage details.

To check this CPU usage, I reviewed the System CPU, Normalized Process CPU, and Normalized System CPU metrics, but none of them shows usage over 80% (I zoomed in on the metrics for the day on which I received those alerts).

Where can I find information about the 95% overconsumption?
Is there a way to see which query or process is consuming this 95%?

Thanks in advance.
Regards,
Víctor.

Hello @Victor_Merino,

Welcome to The MongoDB Community Forums! :wave:

  • What is your deployment type? (M0, M2, M5, etc.)
  • Is this happening often, or was this the only instance?

You may be able to find the spike by noting the timestamp of each alert you received and checking your metrics around the same timestamp.

I would recommend checking the logs around the timestamp of the alert. There could be a slow query or a similar alert that could be a starting point for the investigation.
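As a rough illustration of that log check, here is a minimal sketch that picks slow-query entries out of downloaded log lines near an alert time. It assumes the logs are in the structured JSON format (MongoDB 4.4 and later), where slow operations are logged with `"msg":"Slow query"` and a `durationMillis` attribute. The function name `findSlowQueries` and the parameters `alertTime` and `windowMinutes` are hypothetical, chosen only for this example.

```javascript
// Sketch: find "Slow query" entries within a time window around an alert.
// Assumes structured JSON log lines (MongoDB 4.4+). The names
// findSlowQueries, alertTime, and windowMinutes are hypothetical.
function findSlowQueries(logLines, alertTime, windowMinutes = 10) {
  const center = new Date(alertTime).getTime();
  const windowMs = windowMinutes * 60 * 1000;
  const hits = [];

  for (const line of logLines) {
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (err) {
      continue; // skip lines that are not valid JSON
    }
    // Keep only slow-query entries that carry a timestamp and attributes.
    if (!entry || entry.msg !== "Slow query" || !entry.t || !entry.attr) {
      continue;
    }
    const ts = new Date(entry.t.$date).getTime();
    if (Math.abs(ts - center) > windowMs) {
      continue; // outside the window around the alert
    }
    hits.push({
      time: entry.t.$date,
      namespace: entry.attr.ns,
      durationMillis: entry.attr.durationMillis,
      planSummary: entry.attr.planSummary,
    });
  }

  // Slowest operations first.
  return hits.sort((a, b) => b.durationMillis - a.durationMillis);
}
```

A `planSummary` of `COLLSCAN` on one of the slowest entries usually points to a missing index, which matches the hint in the alert text.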

You can also use the MongoDB Atlas Performance Advisor (only available on M10+ clusters and serverless instances). This tool provides detailed analysis and recommendations for improving the performance of your cluster.

In addition to using the Performance Advisor, you can also run the db.currentOp() command in the MongoDB shell to view information about currently running operations. This can help you identify any long-running queries or processes that may be contributing to high CPU usage.
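A minimal sketch of how the `db.currentOp()` output could be narrowed down to long-running operations is shown here. It relies on the `inprog` array that `db.currentOp()` returns and on the `active`, `secs_running`, `opid`, `op`, `ns`, and `planSummary` fields of each entry. The helper name `summarizeLongRunningOps` and the `minSecs` threshold are hypothetical. Note that `db.currentOp()` only reports operations running at the moment it is called, so it is most useful while the CPU is actually high.

```javascript
// Sketch: reduce db.currentOp() output to active operations that have
// been running for at least minSecs seconds, longest first.
// summarizeLongRunningOps and minSecs are hypothetical names.
function summarizeLongRunningOps(inprog, minSecs = 5) {
  return inprog
    .filter((op) => op.active && (op.secs_running || 0) >= minSecs)
    .sort((a, b) => b.secs_running - a.secs_running)
    .map((op) => ({
      opid: op.opid,
      secsRunning: op.secs_running,
      op: op.op,
      ns: op.ns,
      planSummary: op.planSummary,
    }));
}

// In the MongoDB shell, assuming a connected session (not run here):
//   printjson(summarizeLongRunningOps(db.currentOp().inprog, 5));
```

The same filter can also be passed directly to the shell command, for example `db.currentOp({ active: true, secs_running: { $gte: 5 } })`.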

If you are unable to identify the root cause of the high CPU usage, you may want to consider scaling up your cluster. Adding more resources, such as additional CPU cores or memory, can help alleviate performance issues caused by high CPU usage.

Lastly, I would advise you to bring this up with the Atlas chat support team. They may be able to check whether anything on the Atlas side could have caused these alerts. If you do open a chat support request, please provide them with the following:

  1. Cluster link / name which experienced the issue
  2. Time & date including timezone for when it occurred
  3. Exact error message output
  4. Driver language and version

Regards,
Tarun

Hi, @Tarun_Gaur

First of all, thanks for the response. I reviewed the metrics again, and I don’t know if I missed something the first time, but now I can see the metrics and the logs correctly. In the end, we found the query that uses most of the CPU, and it can be optimized, so we will do that.

Thanks for your time, I really appreciate it.

Regards,
Víctor
