View and analyze performance metrics
The Atlas Kubernetes Operator (AKO) binary exposes the standard controller-runtime metrics at http://localhost:8080/metrics. These metrics include the following:
Total number of reconciliation errors and successful reconciles per controller.
Length of reconcile queues per controller.
Reconciliation latency.
Standard resource metrics such as CPU, memory, and file descriptor usage.
Go runtime metrics such as the number of goroutines and garbage collection (GC) duration.
To learn more, see Controller Metrics.
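As a minimal sketch of reading this endpoint without Prometheus, the following Go program sums the samples of one metric for one controller. It assumes the metric names controller_runtime_reconcile_total and controller_runtime_reconcile_errors_total from the example query in the runbook below, uses a deliberately simplified parser of the Prometheus text exposition format, and runs against a made-up sample. The helpers fetchMetrics and sumMetric are hypothetical and are not part of AKO.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
)

// fetchMetrics reads the raw exposition text from a metrics endpoint, for
// example http://localhost:8080/metrics. main does not call it, so the
// example also runs without a live operator.
func fetchMetrics(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// sumMetric adds up every sample of the metric called name whose labels
// include controller="<controller>". Simplified parser: it assumes label
// values contain no commas or braces and ignores optional timestamps.
func sumMetric(exposition, name, controller string) (float64, error) {
	want := fmt.Sprintf(`controller=%q`, controller)
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and # HELP / # TYPE comments
		}
		lb := strings.IndexByte(line, '{')
		rb := strings.IndexByte(line, '}')
		if lb < 0 || rb < lb || line[:lb] != name {
			continue // unlabeled sample or a different metric
		}
		matched := false
		for _, pair := range strings.Split(line[lb+1:rb], ",") {
			if strings.TrimSpace(pair) == want {
				matched = true
				break
			}
		}
		if !matched {
			continue
		}
		fields := strings.Fields(line[rb+1:])
		if len(fields) == 0 {
			return 0, fmt.Errorf("sample without value: %q", line)
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	// Made-up excerpt in the text exposition format. Against a running
	// operator, pass the output of fetchMetrics instead.
	sample := `# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="AtlasProject",result="success"} 40
controller_runtime_reconcile_total{controller="AtlasProject",result="error"} 10
controller_runtime_reconcile_errors_total{controller="AtlasProject"} 10
`
	for _, name := range []string{
		"controller_runtime_reconcile_total",
		"controller_runtime_reconcile_errors_total",
	} {
		v, err := sumMetric(sample, name, "AtlasProject")
		if err != nil {
			fmt.Println("parse error:", err)
			return
		}
		fmt.Printf("%s = %.0f\n", name, v)
	}
}
```

Summing across the result label gives the total number of reconciles for the controller, which is the denominator used in the error-rate query.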
SRE Runbook
Resource Stuck in Reconciliation
Problem: Resource stuck in reconciliation
This problem occurs when a resource does not reach the Ready state. This runbook uses the AtlasProject resource as an example, but the problem can occur with every Atlas Kubernetes Operator resource type.
Symptoms
The resource is not in a Ready state.
The controller shows a high reconciliation error rate.
To monitor the error rate, you can write a Prometheus query that calculates the percentage of AtlasProject reconciliations that returned an error over the last minute. This metric helps you monitor the health and stability of the AtlasProject controller: a high or rising error percentage indicates problems in the reconciliation process.
Example Query
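To calculate the error rate, use the following Prometheus query. The sum by (controller) aggregation drops the result label that controller_runtime_reconcile_total carries, so that both sides of the division have matching label sets:
100 * sum by (controller) (rate(controller_runtime_reconcile_errors_total{controller="AtlasProject"}[1m])) / sum by (controller) (rate(controller_runtime_reconcile_total{controller="AtlasProject"}[1m]))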
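The query divides two per-second rates computed over the same one-minute window, so the window length cancels and the result is 100 times the increase of the error counter divided by the increase of the total reconcile counter. The following Go sketch reproduces that arithmetic for two hypothetical snapshots of the counters taken one window apart. The Snapshot type and errorRatePercent function are illustrative assumptions, and unlike PromQL's rate() the sketch neither extrapolates nor corrects for counter resets.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot holds both counters for one controller at one instant,
// for example from a single scrape of the metrics endpoint.
type Snapshot struct {
	Errors     float64 // controller_runtime_reconcile_errors_total
	Reconciles float64 // controller_runtime_reconcile_total, all results
}

// errorRatePercent mirrors 100 * rate(errors[w]) / rate(total[w]) for two
// snapshots taken one window w apart. Both rates divide by w, so only the
// counter deltas matter.
func errorRatePercent(prev, cur Snapshot) (float64, error) {
	dErr := cur.Errors - prev.Errors
	dTotal := cur.Reconciles - prev.Reconciles
	if dErr < 0 || dTotal < 0 {
		// Counters only decrease when the process restarts.
		return 0, errors.New("counter reset detected; take a new baseline")
	}
	if dTotal == 0 {
		return 0, errors.New("no reconciles in the window")
	}
	return 100 * dErr / dTotal, nil
}

func main() {
	prev := Snapshot{Errors: 10, Reconciles: 50}
	cur := Snapshot{Errors: 16, Reconciles: 74}
	pct, err := errorRatePercent(prev, cur)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("error rate: %.1f%%\n", pct) // prints "error rate: 25.0%"
}
```

With these values, 6 errors out of 24 reconciles in the window gives an error rate of 25.0%.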
Status
Check the resource status condition for further details:
status:
  conditions:
    - type: Ready
      status: "False"
      reason: ...
Action Items
Verify Resource Status:
Check the status condition message for more detailed information.
If the AtlasProject is not ready, proceed with the next troubleshooting steps.
Check Connection Secret:
Ensure that the connection secret referenced by spec.connectionSecretRef.name is labeled with atlas.mongodb.com/type=credentials.
Investigate Logs:
Review the logs of the AtlasProject controller for errors or failed reconciliation attempts.
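The connection secret check can be sketched in Go as follows. The Secret type is a hypothetical, trimmed-down view of Kubernetes Secret metadata rather than the Kubernetes client type, and the secret names are made up; the sketch only looks up the secret named by spec.connectionSecretRef.name and compares its atlas.mongodb.com/type label with credentials.

```go
package main

import "fmt"

// Secret is a hypothetical, trimmed-down view of a Secret's metadata.
type Secret struct {
	Name   string
	Labels map[string]string
}

const (
	typeLabel       = "atlas.mongodb.com/type"
	credentialsType = "credentials"
)

// checkConnectionSecret finds the secret named refName (the value of
// spec.connectionSecretRef.name) and verifies its type label.
func checkConnectionSecret(secrets []Secret, refName string) error {
	for _, s := range secrets {
		if s.Name != refName {
			continue
		}
		// Indexing a nil map is safe and yields "".
		if s.Labels[typeLabel] != credentialsType {
			return fmt.Errorf("secret %q is missing label %s=%s", refName, typeLabel, credentialsType)
		}
		return nil
	}
	return fmt.Errorf("secret %q not found", refName)
}

func main() {
	secrets := []Secret{
		{Name: "my-atlas-creds", Labels: map[string]string{typeLabel: credentialsType}},
		{Name: "unlabeled-creds"},
	}
	for _, name := range []string{"my-atlas-creds", "unlabeled-creds", "missing"} {
		if err := checkConnectionSecret(secrets, name); err != nil {
			fmt.Println("FAIL:", err)
		} else {
			fmt.Println("OK:", name)
		}
	}
}
```

If the label is missing, running kubectl label secret with the secret name and atlas.mongodb.com/type=credentials (plus -n for the secret's namespace) adds it.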