Hi!
We are running several clusters on Mongo Atlas. We use version 6 in production, and versions 7 and 8 in testing and staging environments. Knowing that version 6 reaches end of life quite soon, we took a closer look at the testing and staging environments. And sure enough, there seems to be a problem.
We see sporadic “ReplicaSetNoPrimary” errors in various queries and use-cases.
I know that “ReplicaSetNoPrimary” is quite generic, but we never saw these errors before some clusters got automatically updated to version 7.
It’s a Node.js application, but neither the driver version, the Node.js version, the way of deployment, nor any connection options vary. The only difference we could spot is the Mongo version: no problems in version 6, some problems in versions 7 and 8.
Questions:
A) Are there any known issues in version 7 regarding “ReplicaSetNoPrimary”?
B) If yes: How to debug them?
C) If not: What else might explain a sudden rise in “ReplicaSetNoPrimary” errors?
Thanks!
How do you connect? Try adding readPreference=primaryPreferred to your connection string. As far as I remember, in earlier releases the client automatically reconnected when a new primary was elected, but this behavior changed.
Thanks for the hint! Connecting via Mongoose like mongoose.createConnection(uri, options);
Added readPreference: 'primaryPreferred' to the options, but that didn’t help. The error still occurs occasionally.
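For anyone who wants to try the same thing, here is a minimal sketch of the options object. The helper name buildConnectionOptions is hypothetical, and the two timeout values are just the Node.js driver defaults as far as I know, written out so they are visible and easy to tune. Only readPreference comes from this thread.

```javascript
// Hypothetical helper: collects the connection options in one place.
// readPreference is the setting discussed in this thread; the other two
// are standard driver options shown with what I believe are their defaults.
function buildConnectionOptions(overrides = {}) {
  return {
    readPreference: 'primaryPreferred', // fall back to a secondary for reads if no primary is known
    serverSelectionTimeoutMS: 30000,    // how long an operation waits for a suitable server
    heartbeatFrequencyMS: 10000,        // how often the driver checks each replica set member
    ...overrides,                       // caller-supplied values win
  };
}

// Usage with Mongoose (assumes mongoose is installed and uri is defined):
// const mongoose = require('mongoose');
// const conn = mongoose.createConnection(uri, buildConnectionOptions());
```

Note that primaryPreferred only affects reads. Writes still need a primary, so this cannot hide a topology that the driver considers primary-less.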
I also ran a failover test (command atlas clusters failover), and the application passed it on Mongo 8. According to the logs it lost the connection, but it evidently reconnected, because the connection worked again immediately afterwards. The application does not seem to log a successful reconnection, but judging from its state the connection must be working. So I guess the failover test passed.
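Since the application has no log line for a successful reconnection, one option is to log the driver's topology monitoring events. This is only a sketch: attachTopologyLogging is a hypothetical helper, and it assumes the object passed in is the underlying MongoClient (with Mongoose, my understanding is that connection.getClient() returns it once the connection has opened).

```javascript
// Hypothetical helper: logs replica set topology transitions and failed
// heartbeats. `client` is assumed to be a MongoClient (or anything with a
// compatible .on(eventName, handler) method).
function attachTopologyLogging(client, log = console.log) {
  // Fires whenever the driver's view of the deployment changes, for
  // example ReplicaSetWithPrimary -> ReplicaSetNoPrimary and back.
  client.on('topologyDescriptionChanged', (event) => {
    const from = event.previousDescription.type;
    const to = event.newDescription.type;
    if (from !== to) {
      log(`[mongo] topology ${from} -> ${to}`);
    }
  });

  // Fires when a monitoring check against one server fails.
  client.on('serverHeartbeatFailed', (event) => {
    const reason = event.failure && event.failure.message;
    log(`[mongo] heartbeat failed for ${event.connectionId}: ${reason}`);
  });
}

// Usage with Mongoose (assumption: called after the connection has opened):
// attachTopologyLogging(conn.getClient());
```

With this in place, a failover should show up as a transition to ReplicaSetNoPrimary followed shortly by a transition back to ReplicaSetWithPrimary.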
For reference: it seems the problem was CPU idling in GCP, which I had overlooked. In combination with some cron jobs, that evidently caused the issue.
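In case it helps others spot the same thing: a process whose CPU is paused cannot run its timers, so a simple timer-drift check can reveal such pauses. The following is a rough sketch, not what I actually ran. createStallDetector and all of its parameters are hypothetical names, and the threshold values are arbitrary.

```javascript
// Hypothetical stall detector: a timer that should fire every `intervalMs`
// measures how late it actually fired. A large drift suggests the process
// was paused or starved of CPU in between.
function createStallDetector({
  intervalMs = 1000,
  thresholdMs = 2000,
  now = Date.now,          // injectable clock, mainly for testing
  onStall = console.warn,  // called with a message when a stall is detected
} = {}) {
  let last = now();

  // Returns the measured drift in ms if it exceeds the threshold, else 0.
  function check() {
    const current = now();
    const drift = current - last - intervalMs;
    last = current;
    if (drift > thresholdMs) {
      onStall(`[stall] timer fired ~${drift} ms late; process may have been paused`);
      return drift;
    }
    return 0;
  }

  // Starts the periodic check; unref() keeps the timer from holding the process open.
  function start() {
    const timer = setInterval(check, intervalMs);
    timer.unref();
    return timer;
  }

  return { check, start };
}

// Usage: createStallDetector().start();
```

If stall messages line up in time with the “ReplicaSetNoPrimary” errors, that points at the infrastructure rather than at the database.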
If someone from the Mongo dev team reads this: it would be nice to change “ReplicaSetNoPrimary” into something more meaningful, like: “Some generic connection error occurred. Most likely something in your infrastructure.”