Sharded cluster deployment pods in unhealthy state forever

I'm deploying a sharded cluster using the following config:

apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: mongo-shard
spec:
  shardCount: 2
  mongodsPerShardCount: 3
  mongosCount: 1
  configServerCount: 3
  version: "4.2.2-ent"
  opsManager:
    configMapRef:
      name: mongodb-project
  credentials: mongo-api-keys
  type: ShardedCluster
  persistent: true

But the status of my deployment is:
$ kubectl get pods -n mongodb

$ kubectl describe pod/mongo-shard-0-0 -n mongodb
Type Reason Age From Message


Warning Unhealthy 2m1s (x8190 over 10h) kubelet Readiness probe failed:

$ kubectl describe pod/mongo-shard-mongos-0 -n mongodb
Events:
Type Reason Age From Message


Warning Unhealthy 3m8s (x8180 over 10h) kubelet Readiness probe failed:

Someone please help :pray:

Hi @krishna_shedbalkar and welcome to the MongoDB Community forum!!

As mentioned in the Kubernetes documentation:

Sometimes, applications are temporarily unable to serve traffic. For example, an application might need to load large data or configuration files during startup, or depend on external services after startup. In such cases, you don’t want to kill the application, but you don’t want to send it requests either. Kubernetes provides readiness probes to detect and mitigate these situations. A pod with containers reporting that they are not ready does not receive traffic through Kubernetes Services.

The readiness probe failure might be resolved by increasing the readiness probe timeout configured for the pods in your deployment YAML files.
Could you increase it to a higher value and check whether the pods come up and become ready?
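Before changing anything, it may help to see what the probe is currently set to. The helper below is a minimal sketch: `show_readiness_probe` is a hypothetical name, and the pod and namespace are taken from your `kubectl describe` commands. It prints the readiness probe of each container in a pod, including fields such as `timeoutSeconds`, `periodSeconds` and `failureThreshold`. The exact output format depends on your kubectl version.

```shell
# Print the readiness probe configured on every container of a pod.
# Usage: show_readiness_probe <pod> <namespace>
# (hypothetical helper name; it only wraps a standard "kubectl get" call)
show_readiness_probe() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.readinessProbe}{"\n"}{end}'
}
```

For example, `show_readiness_probe mongo-shard-0-0 mongodb` would show the settings for the first shard member.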

However, to understand the issue in detail, could you help me with some information regarding the deployment:

  1. Do you see any error messages in the pod logs that would help identify the issue?
  2. Are you following any script or documentation for the deployment? If yes, could you share the link?
  3. Did this issue start abruptly, or was there some change in the deployment or service files?
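For the first question, something like the following could be used to gather the logs and events for one pod. This is only a sketch: `collect_pod_diagnostics` is a hypothetical helper name, and the `--tail=200` limit is an arbitrary choice to keep the output short enough to paste here.

```shell
# Gather recent logs and events for a single pod.
# Usage: collect_pod_diagnostics <pod> <namespace>
# (hypothetical helper name; the tail length of 200 lines is arbitrary)
collect_pod_diagnostics() {
  pod="$1"
  ns="$2"
  echo "== logs: $pod =="
  kubectl logs "$pod" -n "$ns" --tail=200
  echo "== events: $pod =="
  # Only events whose involved object is this pod
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

Running it for both `mongo-shard-0-0` and `mongo-shard-mongos-0` in the `mongodb` namespace should cover the two pods shown in your output.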

Lastly, I recommend checking the resource utilisation of the pods to ensure that they have enough resources allocated. If the pods are running out of memory or CPU, they may not be able to respond to readiness probes.
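One possible way to check this, sketched under the assumption that metrics-server is installed in the cluster (otherwise `kubectl top` will return an error), is to compare live usage with the configured requests and limits. `show_pod_resources` is a hypothetical helper name.

```shell
# Compare live usage with configured requests/limits for one pod.
# Usage: show_pod_resources <pod> <namespace>
# (hypothetical helper name; "kubectl top" needs metrics-server)
show_pod_resources() {
  # Current CPU and memory usage per container
  kubectl top pod "$1" -n "$2" --containers
  # Requests and limits declared in the pod spec
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'
}
```

If usage sits close to the limits, raising them would be worth trying before tuning the probe.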

Regards
Aasawari