Mongo Cluster Primary Unresponsive in Replica Mode

Radhika_KV · July 6, 2021, 3:59pm

We have a production MongoDB cluster set up as a 3-node replica set. It has been running fine for the past 6 months. However, we have started noticing slowness on the primary, followed by the primary becoming completely unresponsive, which makes the whole cluster unresponsive. If we remove the secondaries out of the cluster, then the primary is able to service requests normally.
We noticed a high number of open connections at the time of the slowness/unresponsiveness, so we set maxIdleTimeMS to 3000 for the frequently used services. This helped for about a week, but the problem has now returned.
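
For reference, the idle timeout mentioned above is a connection string option, so it can be set per service in the URI. The following is only a minimal sketch in Python of how such a URI might be assembled. The host names and the replica set name "rs0" are hypothetical placeholders, and the optional maxPoolSize cap is an assumption added for illustration, not something we have configured.

```python
from urllib.parse import urlencode


def build_mongo_uri(hosts, replica_set, max_idle_time_ms=3000, max_pool_size=None):
    """Build a replica set connection string with an idle-connection timeout.

    maxIdleTimeMS asks the driver to close pooled connections that have been
    idle for longer than the given number of milliseconds.
    """
    options = {"replicaSet": replica_set, "maxIdleTimeMS": max_idle_time_ms}
    if max_pool_size is not None:
        # Optional per-client cap on pooled connections (illustrative assumption).
        options["maxPoolSize"] = max_pool_size
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical hosts and replica set name, for illustration only.
uri = build_mongo_uri(["node1:27017", "node2:27017", "node3:27017"], "rs0")
print(uri)
```

Each service would pass the resulting string to its driver. Note that the timeout only affects idle pooled connections on the client side, so it may not help if the connections are busy rather than idle.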
We wipe the data on the secondaries before adding them back to the cluster. The cluster stays fine for about a day, then starts to slow down and within a few hours becomes unresponsive.
Please suggest possible solutions and debugging options.

kevinadi · July 15, 2021, 11:59pm

Hi @Radhika_KV, welcome to the community!

I’d like to ask for further clarification.

If we remove the secondaries out of the cluster, then the primary is able to service requests normally.

I’m not sure I follow. What do you mean by “cluster” in this context? Typically, a “cluster” is a sharded cluster, but I think you’re running a replica set?

Please also post more details:

What is your MongoDB version?
What is the topology, and how do you run the mongod processes (e.g. are there multiple mongod processes running on one server, are you using Docker, what are the command line options)?
What is the output of rs.status() and rs.conf()?
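
If it helps when reading the rs.status() output, here is a rough sketch in Python that condenses a status document into one line per member with its state, health, and approximate lag behind the primary. It assumes the usual members[].name, stateStr, health, and optimeDate fields of the replSetGetStatus output. The sample document and its host names are made up for illustration and are not data from this thread.

```python
from datetime import datetime


def summarize_members(status):
    """Return one summary row per replica set member.

    `status` is a replSetGetStatus document. With PyMongo it could be fetched
    with MongoClient(uri).admin.command("replSetGetStatus") (assumed setup).
    """
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    rows = []
    for m in members:
        lag = None
        if primary is not None and "optimeDate" in m and "optimeDate" in primary:
            # Seconds this member's last applied operation trails the primary's.
            lag = (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        rows.append({
            "name": m["name"],
            "state": m["stateStr"],
            "health": m.get("health"),
            "lag_seconds": lag,
        })
    return rows


# Made-up sample document with hypothetical host names.
sample_status = {
    "set": "rs0",
    "members": [
        {"name": "node1:27017", "stateStr": "PRIMARY", "health": 1,
         "optimeDate": datetime(2021, 7, 6, 15, 59, 0)},
        {"name": "node2:27017", "stateStr": "SECONDARY", "health": 1,
         "optimeDate": datetime(2021, 7, 6, 15, 58, 55)},
        {"name": "node3:27017", "stateStr": "SECONDARY", "health": 1,
         "optimeDate": datetime(2021, 7, 6, 15, 58, 20)},
    ],
}

rows = summarize_members(sample_status)
for row in rows:
    print(row["name"], row["state"], row["health"], row["lag_seconds"])
```

Secondaries whose lag keeps growing, or members stuck in a state other than PRIMARY or SECONDARY, would be worth mentioning when you post the output.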

Best regards
Kevin