I have a k3s cluster with six nodes on which I deployed the MongoDB Enterprise Kubernetes Operator. The Operator is working fine and generally behaves as expected.
My problem is that MongoDB Ops Manager reports the primary member of my replica set as unavailable:
As you can see, the MongoDB members are working as expected with all the features enabled. The “Metrics” tab also shows all the metrics I am interested in, which implies that the MongoDB Automation Agents are working correctly too. So I conclude that all components, that is mongod and the MongoDB agent, are healthy, which is confirmed by the logs of the MongoDB Operator:
Well, I restarted the deployment and that solved the problem.
But I don’t think this is a long term solution. Especially in enterprise environments. If anybody out there is familiar with this situation, please let me know. I appreciate any hint.
I don’t think this is a long term solution. Especially in enterprise environments.
I agree this is not an ideal situation in a mission-critical enterprise environment. Frequently, these kinds of issues are caused by the environment, and specialized one-on-one support is usually needed to resolve them. Since Ops Manager is part of the Enterprise Advanced subscription, you will have access to support and can contact them when this type of issue surfaces.
If you’re evaluating Ops Manager and would like to know more, please feel free to send a DM to me so I can connect you to the right people.
Thanks for replying to my issue. Do you have any hints on which environmental factors might cause this? We want to understand our environment deeply, so we’d like to do some analysis ourselves to better explain what the reason might be.
We’d appreciate any hint. Each is of great value for us.
That’s impossible to say without exact knowledge of the infrastructure and deployment methods. However, in a very, very general sense, I would say it can be caused by the Ops Manager installation itself (i.e. it was not installed properly), or perhaps by network issues. My first suggestion is to check with support, since they’ll have more experience troubleshooting Ops Manager deployments. If you can provide them with the patterns you observe when these issues occur, that would be one of the first steps as well.
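If you want to collect such observed patterns yourself, here is a minimal sketch in Python of a timestamped TCP reachability probe. It is only an illustration, not an official MongoDB tool. The host names in TARGETS are hypothetical placeholders, and the ports (8080 for Ops Manager, 27017 for mongod) are assumed defaults that may differ in your deployment. It only tells you whether a TCP connection could be opened at a given time, which may help correlate the "unavailable" status with network problems.

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical endpoints: replace with your own service names and ports.
TARGETS = [
    ("ops-manager-svc.mongodb.svc.cluster.local", 8080),
    ("my-replica-set-0.my-replica-set-svc.mongodb.svc.cluster.local", 27017),
]


def check_tcp(host, port, timeout=3.0):
    """Try one TCP connection and return (reachable, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.1f} ms"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts.
        return False, f"{type(exc).__name__}: {exc}"


def probe_once(targets):
    """Probe every target once and return one timestamped log line each."""
    lines = []
    for host, port in targets:
        ok, detail = check_tcp(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        lines.append(f"{stamp} {status} {host}:{port} {detail}")
    return lines


def run(targets, rounds, interval):
    """Print probe results for a fixed number of rounds."""
    for _ in range(rounds):
        for line in probe_once(targets):
            print(line, flush=True)
        time.sleep(interval)
```

Calling run(TARGETS, rounds=120, interval=30) from a pod inside the cluster would give roughly an hour of log lines. You could then compare the timestamps of any FAIL entries with the times at which Ops Manager showed the member as unavailable, and pass that on to support.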