Should I be concerned if I receive monitoring alerts that my Atlas nodes have restarted?

Ben_Morris · January 31, 2023, 3:10pm

We’re using a standard 3-node Atlas replica set in a dedicated cluster (M10, MongoDB 6.0.3, AWS) and have configured the ‘Restarts in last hour is’ alert to fire when the value exceeds 0 for any node.

We’re seeing this alert fire every now and then, and we’re wondering what it means for a node in a dedicated cluster and whether it is something to be concerned about, since I don’t think we have any control over it. Should we disable this rule or increase the restart threshold?

Thanks in advance for any advice.

Satyam · February 7, 2023, 4:42am

Hey @Ben_Morris,

Welcome to the MongoDB Community Forums!

A node restarting is not necessarily a cause for concern. However, you should investigate the cause of the restart to determine whether it is an issue. Take a look at your Project Activity Feed to see if you can determine why the nodes are restarting. You noted this is an M10 cluster, so you should have access to the MongoDB logs; you can also check those to try to determine the cause of the node restart. If you do not have access to the logs, you can consider working with Atlas in-app chat support to diagnose the issue.

It’s always good to keep the alerts active, as they can indicate a potential problem as soon as they occur. You can consider increasing the restart threshold to reduce alert noise after concluding whether the restarts are expected or not.
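To illustrate how the threshold changes what the alert reports, here is a minimal sketch of a "restarts in the last hour" rule. It is not Atlas's implementation; the function names and the timestamps are hypothetical and only show the comparison against the threshold.

```python
from datetime import datetime, timedelta


def restarts_in_last_hour(restart_times, now):
    """Count restarts that fall within the hour ending at `now`."""
    window_start = now - timedelta(hours=1)
    return sum(1 for t in restart_times if window_start < t <= now)


def should_alert(restart_times, now, threshold=0):
    """Fire when the trailing-hour restart count exceeds `threshold`."""
    return restarts_in_last_hour(restart_times, now) > threshold


# Hypothetical restart timestamps for a single node.
now = datetime(2023, 2, 7, 9, 0)
single_restart = [datetime(2023, 2, 7, 8, 30)]
repeated_restarts = [datetime(2023, 2, 7, 8, 10), datetime(2023, 2, 7, 8, 40)]

print(should_alert(single_restart, now, threshold=0))     # True
print(should_alert(single_restart, now, threshold=1))     # False
print(should_alert(repeated_restarts, now, threshold=1))  # True
```

With a threshold of 0, a single restart on a node fires the alert; with a threshold of 1, only repeated restarts within the same hour do.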

Hoping this helps. Please feel free to reach out for anything else as well.

Regards,
Satyam

Ben_Morris · February 7, 2023, 9:08am

Thanks for your detailed reply @Satyam. Having checked the activity feed, I was able to match every alert we were seeing to a MongoDB version auto-update on the nodes. We still wanted to keep the alert, so we’ve raised its threshold to fire on >1 restarts per hour rather than >0. Thanks again for your help.
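For anyone wanting to do the same matching on a larger set of alerts, here is a rough sketch of the idea. It assumes the alert times and the activity feed events have been copied out by hand into lists of (host, timestamp) pairs; the hostnames, timestamps, 15-minute tolerance, and function name are all hypothetical, and it does not call any Atlas API.

```python
from datetime import datetime, timedelta


def unexplained_restarts(alerts, events, tolerance=timedelta(minutes=15)):
    """Return alerts with no event on the same host within `tolerance`."""
    unexplained = []
    for host, alert_time in alerts:
        matched = any(
            event_host == host and abs(alert_time - event_time) <= tolerance
            for event_host, event_time in events
        )
        if not matched:
            unexplained.append((host, alert_time))
    return unexplained


# Hypothetical data: restart alerts and version auto-update events.
alerts = [
    ("node-0", datetime(2023, 1, 20, 2, 5)),
    ("node-1", datetime(2023, 1, 25, 14, 0)),
]
events = [
    ("node-0", datetime(2023, 1, 20, 2, 0)),
]

for host, when in unexplained_restarts(alerts, events):
    print(f"No matching event for restart on {host} at {when}")
```

Any alert left in the result has no nearby event on the same node and is worth a closer look in the logs.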
