Previously stable MongoDB on AWS EKS 1.21 is now unstable/crashing

Hi all,
First, a warning: big-time noob here…
I previously had an AWS EKS cluster (1.21) that ran stable for a year, with MongoDB deployed using YAML manifests, including separate PV and PVC definitions (thankfully).
I went on holiday, came back, and my MongoDB had crashed. I figured the easiest fix was to delete the pod and have it spin itself up again; that did not help. I noticed that two of my other database platforms went unstable at the same time… not sure if someone had their fingers on the system. I eventually got the other two stable by scaling the node group down to 0 and back up to 6 (required by my Redis deployment), and by upgrading the EKS control plane to 1.22.x along with the nodes.
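For the record, the node-group scaling was roughly this (assuming an eksctl-managed node group; cluster and node-group names below are placeholders):

# scale the DB node group down to 0 and back up to 6
eksctl scale nodegroup --cluster <cluster-name> --name <nodegroup-name> --nodes 0
eksctl scale nodegroup --cluster <cluster-name> --name <nodegroup-name> --nodes 6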

The MongoDB pod will come up, be accessible, and then crash… and it loops like that until it lands in CrashLoopBackOff.
I tried to upload a crash.log file (output from kubectl logs -n mongo …), but new members are not allowed to upload files.
I need some assistance getting it stable again while keeping my data.
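(If it helps anyone reproduce: to get the log from the crashed container rather than the freshly restarted one, kubectl supports a --previous flag:)

kubectl logs -n mongo pod/mongo-878b794fc-lg2k8 --previous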

G

The image is specified simply as mongo, so I'm guessing it's whatever the latest tag is. As I can't get onto the container, I can't say more about the exact version at the moment.
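Side note: pinning the image tag in the Deployment would at least stop it silently picking up a new major version on a pod restart, e.g. (the tag here is just an example):

      containers:
      - image: mongo:5.0.14   # pin a known-good tag instead of the implicit :latest
        name: mongo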

kubectl logs -f -n mongo pod/mongo-878b794fc-lg2k8

{"t":{"$date":"2023-01-10T13:19:10.651+00:00"},"s":"I",  "c":"NETWORK",  "id":4915701, "ctx":"main","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true}}}
{"t":{"$date":"2023-01-10T13:19:10.653+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2023-01-10T13:19:10.663+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2023-01-10T13:19:10.667+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","namespace":"config.tenantMigrationDonors"}}
{"t":{"$date":"2023-01-10T13:19:10.667+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","namespace":"config.tenantMigrationRecipients"}}
{"t":{"$date":"2023-01-10T13:19:10.667+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"ShardSplitDonorService","namespace":"config.tenantSplitDonors"}}
{"t":{"$date":"2023-01-10T13:19:10.667+00:00"},"s":"I",  "c":"CONTROL",  "id":5945603, "ctx":"main","msg":"Multi threading initialized"}
{"t":{"$date":"2023-01-10T13:19:10.668+00:00"},"s":"I",  "c":"CONTROL",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"mongo-878b794fc-lg2k8"}}
{"t":{"$date":"2023-01-10T13:19:10.668+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"6.0.3","gitVersion":"f803681c3ae19817d31958965850193de067c516","openSSLVersion":"OpenSSL 1.1.1f  31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2004","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2023-01-10T13:19:10.668+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"20.04"}}}
{"t":{"$date":"2023-01-10T13:19:10.668+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"},"security":{"authorization":"enabled"},"storage":{"dbPath":"/data/db"}}}}
{"t":{"$date":"2023-01-10T13:19:10.669+00:00"},"s":"I",  "c":"STORAGE",  "id":22270,   "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/db","storageEngine":"wiredTiger"}}
{"t":{"$date":"2023-01-10T13:19:10.669+00:00"},"s":"I",  "c":"STORAGE",  "id":22297,   "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]}
{"t":{"$date":"2023-01-10T13:19:10.669+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=3415M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,remove=true,path=journal,compressor=snappy),builtin_extension_config=(zstd=(compression_level=6)),file_manager=(close_idle_time=600,close_scan_interval=10,close_handle_minimum=2000),statistics_log=(wait=0),json_output=(error,message),verbose=[recovery_progress:1,checkpoint_progress:1,compact_progress:1,backup:0,checkpoint:0,compact:0,evict:0,history_store:0,recovery:0,rts:0,salvage:0,tiered:0,timestamp:0,transaction:0,verify:0,log:0],"}}
{"t":{"$date":"2023-01-10T13:19:11.756+00:00"},"s":"I",  "c":"STORAGE",  "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":1087}}
{"t":{"$date":"2023-01-10T13:19:11.756+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2023-01-10T13:19:11.766+00:00"},"s":"W",  "c":"CONTROL",  "id":5123300, "ctx":"initandlisten","msg":"vm.max_map_count is too low","attr":{"currentValue":524288,"recommendedMinimum":1677720,"maxConns":838860},"tags":["startupWarnings"]}
{"t":{"$date":"2023-01-10T13:19:11.769+00:00"},"s":"I",  "c":"NETWORK",  "id":4915702, "ctx":"initandlisten","msg":"Updated wire specification","attr":{"oldSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true},"newSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":13,"maxWireVersion":17},"outgoing":{"minWireVersion":13,"maxWireVersion":17},"isInternalClient":true}}}
{"t":{"$date":"2023-01-10T13:19:11.769+00:00"},"s":"I",  "c":"REPL",     "id":5853300, "ctx":"initandlisten","msg":"current featureCompatibilityVersion value","attr":{"featureCompatibilityVersion":"5.0","context":"startup"}}
{"t":{"$date":"2023-01-10T13:19:11.769+00:00"},"s":"I",  "c":"STORAGE",  "id":5071100, "ctx":"initandlisten","msg":"Clearing temp directory"}
{"t":{"$date":"2023-01-10T13:19:11.773+00:00"},"s":"I",  "c":"CONTROL",  "id":20536,   "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
{"t":{"$date":"2023-01-10T13:19:11.773+00:00"},"s":"I",  "c":"FTDC",     "id":20625,   "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/data/db/diagnostic.data"}}
{"t":{"$date":"2023-01-10T13:19:11.777+00:00"},"s":"I",  "c":"REPL",     "id":6015317, "ctx":"initandlisten","msg":"Setting new configuration state","attr":{"newState":"ConfigReplicationDisabled","oldState":"ConfigPreStart"}}
{"t":{"$date":"2023-01-10T13:19:11.777+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
{"t":{"$date":"2023-01-10T13:19:11.778+00:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27017.sock"}}
{"t":{"$date":"2023-01-10T13:19:11.779+00:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"0.0.0.0"}}
{"t":{"$date":"2023-01-10T13:19:11.779+00:00"},"s":"I",  "c":"NETWORK",  "id":23016,   "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}```

As I can’t attach files, I'll post the deployment manifests inline here…

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mongo
  name: mongo
  namespace: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo
  strategy: {}
  template:
    metadata:
      labels:
        app: mongo
    spec:
      nodeSelector: 
        tier: db
      containers:
      - image: mongo
        name: mongo
        args: ["--dbpath","/data/db"]
        livenessProbe:
          exec:
            command:
              - mongo
              - --disableImplicitSessions
              - --eval
              - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 6
        readinessProbe:
          exec:
            command:
              - mongo
              - --disableImplicitSessions
              - --eval
              - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 6
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-creds
              key: username
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-creds
              key: password
        volumeMounts:
        - name: mongodb
          mountPath: "/data/db"
          
      volumes:
      - name: mongodb
        persistentVolumeClaim:
          claimName: mongo-pv-claim


# https://devopscube.com/deploy-mongodb-kubernetes/
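One thing I'm now suspicious of: the liveness/readiness probes above call the legacy mongo shell, which the official 6.0 images no longer ship (it was replaced by mongosh). On a 6.0 image both probes would fail every time, and the kubelet would keep killing the container once the failure threshold is hit. A probe using mongosh instead would look something like this (a sketch, untested):

        livenessProbe:
          exec:
            command:
              - mongosh
              - --quiet
              - --eval
              - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 6

(the readinessProbe would change the same way)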

mongo client

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mongo-client
  name: mongo-client
  namespace: mongo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongo-client
  template:
    metadata:
      labels:
        app: mongo-client
    spec:
      containers:
      - image: mongo
        name: mongo-client
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          value: 'dummy'
        - name: MONGO_INITDB_ROOT_PASSWORD
          value: 'dummy'

# kubectl exec -it pod/mongo-7db655d776-bgdkm -n mongo -- bash
# mongosh --host mongo-nodeport-svc --port 27017 -u <user> -p <password>
#
# use trustreg
# db.txncounts.find()

# kubectl port-forward service/mongo-nodeport-svc 27017:27017 -n mongo

# show dbs
# show collections
# https://www.tutorialsteacher.com/mongodb/mongodb-shell-commands
# https://www.bmc.com/blogs/mongo-shell-basic-commands/
# https://www.digitalocean.com/community/tutorials/how-to-use-the-mongodb-shell

# Gui tool
# https://studio3t.com/

#test> use trustreg
#trustreg> show collections
#trustreg> db.txncounts.count()
#DeprecationWarning: Collection.count() is deprecated. Use countDocuments or estimatedDocumentCount.
#1002
#trustreg> db.txncounts.remove({})
#{ acknowledged: true, deletedCount: 1002 }
#trustreg> db.txncounts.count()
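As an aside, per that deprecation warning the non-deprecated equivalents are:

#trustreg> db.txncounts.countDocuments({})
#trustreg> db.txncounts.estimatedDocumentCount()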

node port service

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mongo
  name: mongo-nodeport-svc
  namespace: mongo
spec:
  ports:
  - port: 27017
    protocol: TCP
    targetPort: 27017
    nodePort: 32000
  selector:
    app: mongo
  type: NodePort
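For completeness: with type NodePort the database is also reachable on any worker node's IP at port 32000 (the node IP is a placeholder here):

# mongosh --host <node-ip> --port 32000 -u <user> -p <password>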

pv claim

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pv-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: zone-af-south-1
  labels:
    type: aws-pvc
    app: mongo
  namespace: mongo
spec:
  storageClassName: "zone-af-south-1"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi  # Min for PVC on AWS / EKS
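Since keeping the data is the priority, it's also worth making sure the bound PV won't be deleted if this PVC is ever removed. The reclaim policy can be checked and, if necessary, switched to Retain (PV name is a placeholder):

# kubectl get pv
# kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'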

As said previously, it cycles between running and crashing; during the last “running” stage I was able to capture the below.
I exec'd into the container, as you can see, and then attempted to log into the Mongo database. This worked reliably before.

kubectl exec -it pod/mongo-878b794fc-lg2k8 -n mongo bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
root@mongo-878b794fc-lg2k8:/#
root@mongo-878b794fc-lg2k8:/#
root@mongo-878b794fc-lg2k8:/# mongosh --host mongo-nodeport-svc --port 27017 -u adminuser -p password123
Current Mongosh Log ID:	63bd67dbf136a2be55dba94a
Connecting to:		mongodb://<credentials>@mongo-nodeport-svc:27017/?directConnection=true&appName=mongosh+1.6.1
MongoNetworkError: connect ECONNREFUSED 172.20.122.42:27017
root@mongo-878b794fc-lg2k8:/#

So changing the host to “mongosh --host 127.0.0.1 --port 27017 -u adminuser -p password123”

allowed me to connect, and I can run queries and my data is still there, but then it crashes again…
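That pattern (ECONNREFUSED via the service name but a working connection via 127.0.0.1) fits a failing readiness probe: once readiness fails, the pod is removed from the service's endpoints, so the service has nothing to route to. Easy to check:

kubectl get endpoints -n mongo mongo-nodeport-svc
# an empty ENDPOINTS column means the readiness probe is failing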

G

some more…

kubectl exec -it pod/mongo-878b794fc-lg2k8 -n mongo -- bash
root@mongo-878b794fc-lg2k8:/# mongosh --host 127.0.0.1 --port 27017 -u adminuser -p password123
Current Mongosh Log ID:	63bd753e990f9728a3578379
Connecting to:		mongodb://<credentials>@127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.6.1
Using MongoDB:		6.0.3
Using Mongosh:		1.6.1

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

------
   The server generated these startup warnings when booting
   2023-01-10T14:23:40.555+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
   2023-01-10T14:23:41.634+00:00: vm.max_map_count is too low

… so it seems I got it stable… for now…

Rolled back to 5.0.14.

Seems that 6.0.1 and 6.0.3 need vm.max_map_count increased.
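If anyone wants to stay on 6.0 rather than rolling back, the usual trick for raising vm.max_map_count on the node from inside Kubernetes is a privileged init container (a sketch, untested; the value is the recommended minimum from the startup warning above):

      initContainers:
      - name: sysctl
        image: busybox
        # vm.max_map_count is not namespaced, so this changes it on the host node
        command: ["sysctl", "-w", "vm.max_map_count=1677720"]
        securityContext:
          privileged: true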

G
