I’m running several MongoDB installations with the MongoDB Enterprise Operator version 1.32.0 on OpenShift.
I’m seeing an intermittent issue with certificate renewal.
There is a similar post about it (MongoDB failing after certificates are renewed), but in my case that solution is already applied and does not solve the issue.
Could you tell me anything about the following issue? Is it a bug, or am I doing something wrong?
Issue description
When a certificate (appdb-om-with-https-db-cert) is renewed, mongo-enterprise-operator launches a renewal process to update two secrets (appdb-om-with-https-db-cert-pem and om-with-https-db-config), but the last part of the process fails.
As a result, the cluster is left with an incorrect configuration in the om-with-https-db-config secret, which points to the old certificate (which no longer exists) instead of the newly created one.
Because of this, after some time all mongo pods go into the CrashLoopBackOff state.
Steps to reproduce:
Because the error is intermittent, the only way to reproduce it is to force certificate renewal several times until the issue occurs. There are two ways to do that:
- Change the certificate duration and wait
  - Edit the certificate definition and set duration to the lowest allowed value (1h)
  - Wait through a few renewals until some or all pods in the mongodb namespace are in the CrashLoopBackOff state
- Force a manual renewal
  - Make sure you are logged in to OpenShift and have mongodb set as your project
    oc login
    oc project mongodb
  - Run the cert-manager manual renewal multiple times
    cmctl renew appdb-om-with-https-db-cert
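The manual renewal step can be scripted so it is easier to repeat until the failure shows up. This is only a minimal sketch: it assumes the Certificate lives in the mongodb namespace and that cmctl is on the PATH. The function name renew_repeatedly, the repeat count, and the pause are illustrative choices, not part of any tool.

```shell
#!/bin/sh
# Sketch: trigger repeated renewals of one cert-manager Certificate.
# Assumptions: namespace "mongodb", cmctl installed, already logged in with oc.

renew_repeatedly() {
  cert="$1"    # Certificate name, e.g. appdb-om-with-https-db-cert
  count="$2"   # number of renewals to trigger
  pause="$3"   # seconds to wait between renewals
  i=1
  while [ "$i" -le "$count" ]; do
    echo "renewal $i of $count"
    cmctl renew "$cert" --namespace mongodb || return 1
    sleep "$pause"
    i=$((i + 1))
  done
}

# Example (commented out so that sourcing this file has no side effects):
# renew_repeatedly appdb-om-with-https-db-cert 10 120
# oc get pods -n mongodb
```

The pause should be long enough for the operator to finish processing one renewal before the next is triggered; the value above is a guess and may need tuning.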
My current workaround:
- Verify the current certificate hash
- Check whether the config is pointing to the wrong certificate
- Modify the config secret so that it points to the current certificate
- Restart the mongo-enterprise-operator by deleting the pod (it will be recreated by the OpenShift ReplicaSet).
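The first two checks of the workaround could look roughly like the sketch below. It rests on assumptions the post does not confirm: that the data keys of the appdb-om-with-https-db-cert-pem secret are named after certificate hashes, and that the decoded contents of om-with-https-db-config mention the hash of the certificate in use. The helper names and the hash placeholder are hypothetical.

```shell
#!/bin/sh
# Sketch of the "verify hash" and "check config" steps.
# Assumptions: namespace "mongodb"; secret layout as described in the lead-in.

NS=mongodb

# List the data keys of a secret, one per line.
secret_keys() {
  oc get secret "$1" -n "$NS" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Print the base64-decoded values of a secret.
secret_values() {
  oc get secret "$1" -n "$NS" \
    -o go-template='{{range $k, $v := .data}}{{$v | base64decode}}{{"\n"}}{{end}}'
}

# Succeeds if the config secret mentions the given hash anywhere.
config_mentions_hash() {
  secret_values om-with-https-db-config | grep -qF -- "$1"
}

# Example (replace the placeholder with a hash taken from the first command):
# secret_keys appdb-om-with-https-db-cert-pem
# config_mentions_hash "<current-hash>" || echo "config points to a stale certificate"
```

If the check fails, the remaining steps are still manual: edit the config secret, then delete the operator pod with oc delete pod so that it is recreated.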