I have a MongoDB cluster running on Atlas. Every client connects with an appName set, so we know who is connecting to the cluster and can identify them.
At some points, we get a huge spike of connections that have no app name, so we can’t identify them.
If an alert happens we identify the unknown connections with this script:
I tried analyzing the logs and I see a large number of connections from private IPs. Most of the time these private IPs have an app name set, for example MongoDB CPS Module.
The logs are not straightforward to analyze, so I am not sure which IPs the UNKNOWN connections come from. So far I only count the connections by IP.
Has anybody had a similar problem, or any suggestions on how to tackle it?
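To go one step beyond counting by IP, the log lines can be joined so that each accepted connection is paired with the app name it later reports. The following is a minimal sketch, not the script mentioned above. It assumes the structured JSON log format (MongoDB 4.4+), where "Connection accepted" entries carry id 22943 and "client metadata" entries carry id 51800, both with an `attr.remote` field of the form `ip:port`. Please verify these ids and field names against your own log files; the function name `summarize` is made up for this example.

```python
import json
from collections import Counter

# Assumed log ids from the MongoDB structured log format; check your own logs.
CONNECTION_ACCEPTED = 22943  # "Connection accepted"
CLIENT_METADATA = 51800      # "client metadata"


def remote_ip(remote):
    """Return the address part of an 'ip:port' string."""
    return remote.rpartition(":")[0]


def summarize(lines):
    """Count accepted connections per (ip, appName).

    Connections that never send client metadata with an application
    name are counted under the label "UNKNOWN".
    """
    accepted = []   # one 'ip:port' entry per accepted connection
    app_names = {}  # 'ip:port' -> application name from client metadata
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON log entries
        if not isinstance(entry, dict):
            continue
        attr = entry.get("attr") or {}
        remote = attr.get("remote")
        if not remote:
            continue
        if entry.get("id") == CONNECTION_ACCEPTED:
            accepted.append(remote)
        elif entry.get("id") == CLIENT_METADATA:
            doc = attr.get("doc") or {}
            name = (doc.get("application") or {}).get("name")
            if name:
                app_names[remote] = name

    counts = Counter()
    for remote in accepted:
        # Limitation: if an ip:port pair is reused within the same log,
        # the app name of the later connection is applied to both.
        counts[(remote_ip(remote), app_names.get(remote, "UNKNOWN"))] += 1
    return counts
```

With a downloaded log file, something like `summarize(open("mongodb.log"))` followed by `counts.most_common(20)` would list the IPs that open the most connections without an app name. Atlas log downloads are usually gzip-compressed, so `gzip.open(path, "rt")` may be needed instead of `open`.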
For clarification - How large is the spike in the connections? Additionally, does the spike of connections cause some performance issues?
As you pointed out, some of the connections are named MongoDB CPS Module. I believe this is related to Cloud Provider Snapshots (CPS), the backup offering available on Atlas, although you can confirm this with the Atlas chat support team. Please also see the following note from the Connection Limits and Cluster Tier documentation:
Atlas reserves a small number of connections to each Atlas cluster for supporting Atlas services. Contact Atlas support for more information on Atlas reserved connections.
If you’re sure that none of the private IPs belong to any of your clients, I’d recommend you contact the Atlas in-app chat support team to verify whether these IPs are part of Atlas service(s). It would help to tell the support team how you determined the IPs and appNames (e.g. MongoDB CPS Module), whether the connection spikes cause any issues, and how large the spikes are, so they can verify whether this is expected.
Please let us know if this helps. Feel free to reach out for anything else as well.
For clarification - How large is the spike in the connections? Additionally, does the spike of connections cause some performance issues?
The spikes range anywhere from 1,000 to 4,000 unknown connections. I do see app names such as:
MongoDB CPS Module: 3 connections
mongot initial sync and session refresh: 3 connections
mongot steady state: 6 connections
But the unknown portion is insanely high, even though we set app names everywhere. When I analyze the logs I see a large number of private IPs, and as far as I can tell these can only be MongoDB internal connections, right?
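One way to check that suspicion while a spike is happening is to look at the live connections instead of the logs. The sketch below is only an illustration under assumptions: it expects documents shaped like the output of the `$currentOp` aggregation stage, using its `client` (`ip:port`) and `appName` fields, and groups them by app name and by whether the address is in a private range. The function name `classify` and the "UNKNOWN" label are made up for this example.

```python
import ipaddress
from collections import Counter


def classify(ops):
    """Count currentOp-style documents per (appName, address scope)."""
    counts = Counter()
    for op in ops:
        client = op.get("client")
        if not client:
            continue  # entries without a client address are skipped
        # Strip the port, and the brackets used around IPv6 addresses.
        host = client.rpartition(":")[0].strip("[]")
        try:
            is_private = ipaddress.ip_address(host).is_private
            scope = "private" if is_private else "public"
        except ValueError:
            scope = "unparsed"  # not a literal IP address
        app = op.get("appName") or "UNKNOWN"
        counts[(app, scope)] += 1
    return counts
```

With a driver such as PyMongo, the input could come from running `$currentOp` with `allUsers` and `idleConnections` enabled against the admin database, provided the database user has the privileges for it. A private address alone does not prove a connection is Atlas-internal, since clients connecting through VPC peering or private endpoints also appear with private IPs.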
If you’re sure that none of the private IPs belong to any of your clients, I’d recommend you contact the Atlas in-app chat support team to verify whether these IPs are part of Atlas service(s).
I did that. Unfortunately, support is very slow and didn’t really help much. I can try again.
Support also shared a file listing the unknown connections, and it mainly shows a high number of connections coming from private IPs, but they didn’t give me any explanation. They also told me, more or less, that the chat is not meant for in-depth technical issues.
I hope this helps. I am open to any suggestions, since we had this issue again over the weekend.
Additionally, does the spike of connections cause some performance issues?
Yes. When the connections hit the maximum, our production application loses its connection and prod is down.