Replica set in Kubernetes

Hi,

We have a 3-node replica set in Kubernetes. A couple of days ago one of the nodes went down, and so far we have not been able to bring it back up.

We reviewed the database log, the available filesystem space, and the status of the pod; the latter returns the error “CrashLoopBackOff”.

The database log shows no error that could have caused the node to crash, and the filesystem has enough free space.

We’ve been doing some research but still haven’t found anything useful. The solution we’re considering is deleting the pod so it gets recreated.

Would this be correct?
Is there anything else we could check to determine the cause of the error?

Regards.

Hi @Bet_TS
Welcome to the community forum!!

There are various possible reasons why a pod could end up in this status.
Can you please share the output of:
kubectl describe pod <pod-name> -n <namespace>

Is the pod deployed as a StatefulSet or as a Deployment?

Does the pod come up immediately after it has been deleted?

Deleting the pod would not be recommended, as the pod should come up on its own without having to delete it explicitly to force a restart.

Let us know the output of the above kubectl command and we can assist you better.

Thanks
Aasawari

Hi @Aasawari ,

I'm sharing the describe output for the pod with status “CrashLoopBackOff”:
out_pod_describe.txt (3.1 KB)

Is the pod deployed as a StatefulSet or as a Deployment?

It is a StatefulSet deployment.

Does the pod come up immediately after it has been deleted?

Yes.

Thanks for the support.
Regards.

Hi @Bet_TS

Thank you for sharing the above document.

The “back-off restarting failed container” error can occur for a number of reasons, among them:

  1. Mounting volume problems
  2. Container exited after starting
  3. Resource issues
  4. Timeout issues (liveness/readiness probes)

Apart from these, the sidecar image could also be a culprit.
However, please share the mongod log files and any relevant Docker/Kubernetes log files for the failing pod so we can better understand the situation.
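
For reference, something along these lines should capture the Kubernetes-side information (this is only a sketch; the pod, namespace, and container names are placeholders):

# Logs of the current and of the previously crashed container instance
kubectl logs <pod-name> -n <namespace> > pod_current.log
kubectl logs <pod-name> -n <namespace> --previous > pod_previous.log

# Recent events for the pod (probe failures, OOM kills, volume mount errors, etc.)
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

# If the pod runs more than one container (e.g. a sidecar), add -c <container-name>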

Regards
Aasawari

Hi @Aasawari ,

I'm sharing the log of the replica set member that we have not been able to start:
mongo-2.log (22.1 KB)

Regarding the Docker/Kubernetes logs, we are still reviewing them; however, we would like to know which logs would be useful to you.

  1. Mounting volume problems

The “Mounting volume problems” item caught our attention. If that were the cause, why aren't the other nodes showing the same error, given that it is a persistent volume?

Regards

Hi @Bet_TS

Can you please share the logs in .txt format? I am not able to access the logs shared in the above format.

Thanks
Aasawari

Hi @Aasawari

I'm sharing the log again:

2022-05-13T18:03:29.518+0000 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2022-05-13T18:03:29.624+0000 W  ASIO     [main] No TransportLayer configured during NetworkInterface startup
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/path 64-bit host=mongo-2
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] db version v4.2.7
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] git version: 51d9fe12b5d19720e72dcd7db0f2f17dd9a19212
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1  11 Sep 2018
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] allocator: tcmalloc
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] modules: none
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] build environment:
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten]     distmod: ubuntu1804
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten]     distarch: x86_64
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten]     target_arch: x86_64
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] 4096 MB of memory available to the process out of 31538 MB total system memory
2022-05-13T18:03:29.625+0000 I  CONTROL  [initandlisten] options: { net: { bindIp: "*" }, replication: { replSet: "db" } }
2022-05-13T18:03:29.634+0000 W  STORAGE  [initandlisten] Detected unclean shutdown - /path/mongod.lock is not empty.
2022-05-13T18:03:29.636+0000 I  STORAGE  [initandlisten] Detected data files in /path created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2022-05-13T18:03:29.636+0000 W  STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2022-05-13T18:03:29.637+0000 I  STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1536M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],
2022-05-13T18:03:42.173+0000 I  STORAGE  [initandlisten] WiredTiger message [1652465022:173115][1:0x7f9b74405b00], txn-recover: Recovering log 9444 through 9445
2022-05-13T18:03:43.110+0000 I  STORAGE  [initandlisten] WiredTiger message [1652465023:110679][1:0x7f9b74405b00], txn-recover: Recovering log 9445 through 9445
2022-05-13T18:03:44.047+0000 I  STORAGE  [initandlisten] WiredTiger message [1652465024:47729][1:0x7f9b74405b00], txn-recover: Main recovery loop: starting at 9444/256 to 9445/256
2022-05-13T18:03:44.055+0000 I  STORAGE  [initandlisten] WiredTiger message [1652465024:55551][1:0x7f9b74405b00], txn-recover: Recovering log 9444 through 9445
2022-05-13T18:03:44.192+0000 I  STORAGE  [initandlisten] WiredTiger message [1652465024:192749][1:0x7f9b74405b00], file:collection-10--8124160588626690814.wt, txn-recover: Recovering log 9445 through 9445
2022-05-13T18:03:44.244+0000 I  STORAGE  [initandlisten] WiredTiger message [1652465024:244156][1:0x7f9b74405b00], file:collection-10--8124160588626690814.wt, txn-recover: Set global recovery timestamp: (1649415891, 1)
2022-05-13T18:03:44.267+0000 I  RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1649415891, 1)
2022-05-13T18:03:44.288+0000 I  STORAGE  [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2022-05-13T18:03:44.288+0000 I  STORAGE  [initandlisten] The size storer reports that the oplog contains 6751366 records totaling to 6973228487 bytes
2022-05-13T18:03:44.288+0000 I  STORAGE  [initandlisten] Sampling the oplog to determine where to place markers for truncation
2022-05-13T18:03:44.295+0000 I  STORAGE  [initandlisten] Sampling from the oplog between Oct  8 23:00:37:1 and Apr  8 11:11:16:1 to determine where to place markers for truncation
2022-05-13T18:03:44.295+0000 I  STORAGE  [initandlisten] Taking 260 samples and assuming that each section of oplog contains approximately 259295 records totaling to 267815917 bytes
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Oct 26 15:38:56:74
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Nov  2 16:25:53:3
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Nov 10 15:34:18:85
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Nov 30 15:21:50:38
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Nov 30 15:25:21:465
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Dec 19 08:36:04:6
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Dec 28 16:36:19:73
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Dec 31 18:08:42:11
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Dec 31 18:15:10:91
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Feb 11 20:19:52:68
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Feb 11 20:24:39:55
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Feb 24 22:44:37:2
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Mar 10 16:15:09:850
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Mar 10 16:15:26:2114
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr 22 03:15:26:1
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr 28 22:36:22:4517
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr 28 22:36:36:10
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime May 28 20:52:38:23836
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime May 28 20:53:00:84
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Jun 18 16:07:41:10544
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Jun 18 16:08:09:133
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Sep  2 17:12:37:12546
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Oct 13 09:54:27:2
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Oct 13 10:23:03:18
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Nov 30 20:50:24:1
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr  7 21:00:57:1
2022-05-13T18:03:51.990+0000 I  STORAGE  [initandlisten] WiredTiger record store oplog processing took 7701ms
2022-05-13T18:03:52.009+0000 I  STORAGE  [initandlisten] Timestamp monitor starting
2022-05-13T18:03:52.011+0000 I  CONTROL  [initandlisten] 
2022-05-13T18:03:52.011+0000 I  CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2022-05-13T18:03:52.011+0000 I  CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2022-05-13T18:03:52.011+0000 I  CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2022-05-13T18:03:52.011+0000 I  CONTROL  [initandlisten] 
2022-05-13T18:03:52.019+0000 I  CONTROL  [initandlisten] 
2022-05-13T18:03:52.019+0000 I  CONTROL  [initandlisten] ** WARNING: You are running on a NUMA machine.
2022-05-13T18:03:52.019+0000 I  CONTROL  [initandlisten] **          We suggest launching mongod like this to avoid performance problems:
2022-05-13T18:03:52.019+0000 I  CONTROL  [initandlisten] **              numactl --interleave=all mongod [other options]
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] 
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] **        We suggest setting it to 'never'
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] 
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] **        We suggest setting it to 'never'
2022-05-13T18:03:52.020+0000 I  CONTROL  [initandlisten] 
2022-05-13T18:03:52.093+0000 I  SHARDING [initandlisten] Marking collection local.system.replset as collection version: <unsharded>
2022-05-13T18:03:52.116+0000 I  STORAGE  [initandlisten] Flow Control is enabled on this deployment.
2022-05-13T18:03:52.126+0000 I  SHARDING [initandlisten] Marking collection admin.system.roles as collection version: <unsharded>
2022-05-13T18:03:52.127+0000 I  SHARDING [initandlisten] Marking collection admin.system.version as collection version: <unsharded>
2022-05-13T18:03:52.127+0000 I  SHARDING [initandlisten] Marking collection local.startup_log as collection version: <unsharded>
2022-05-13T18:03:52.128+0000 I  FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/path/diagnostic.data'
2022-05-13T18:03:52.148+0000 I  SHARDING [initandlisten] Marking collection local.replset.minvalid as collection version: <unsharded>
2022-05-13T18:03:52.148+0000 I  SHARDING [initandlisten] Marking collection local.replset.election as collection version: <unsharded>
2022-05-13T18:03:52.160+0000 I  REPL     [initandlisten] Rollback ID is 1
2022-05-13T18:03:52.167+0000 I  REPL     [initandlisten] Recovering from stable timestamp: Timestamp(1649415891, 1) (top of oplog: { ts: Timestamp(1649416276, 1), t: 30 }, appliedThrough: { ts: Timestamp(0, 0), t: -1 }, TruncateAfter: Timestamp(0, 0))
2022-05-13T18:03:52.167+0000 I  REPL     [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1649415891, 1)
2022-05-13T18:03:52.167+0000 I  REPL     [initandlisten] Replaying stored operations from Timestamp(1649415891, 1) (inclusive) to Timestamp(1649416276, 1) (inclusive).
2022-05-13T18:03:52.167+0000 I  SHARDING [initandlisten] Marking collection local.oplog.rs as collection version: <unsharded>
2022-05-13T18:03:52.175+0000 I  REPL     [initandlisten] Applied 3 operations in 1 batches. Last operation applied with optime: { ts: Timestamp(1649416276, 1), t: 30 }
2022-05-13T18:03:52.176+0000 I  SHARDING [initandlisten] Marking collection config.transactions as collection version: <unsharded>
2022-05-13T18:03:52.182+0000 I  SHARDING [LogicalSessionCacheReap] Marking collection config.system.sessions as collection version: <unsharded>
2022-05-13T18:03:52.182+0000 I  NETWORK  [listener] Listening on /tmp/mongodb-27017.sock
2022-05-13T18:03:52.182+0000 I  NETWORK  [listener] Listening on 0.0.0.0
2022-05-13T18:03:52.182+0000 I  NETWORK  [listener] waiting for connections on port 27017
2022-05-13T18:03:52.183+0000 I  CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2022-05-13T18:03:52.234+0000 I  CONTROL  [LogicalSessionCacheReap] Failed to reap transaction table: NotYetInitialized: Replication has not yet been configured
2022-05-13T18:03:52.266+0000 I  REPL     [replexec-0] New replica set config in use: { _id: "db", version: 178016725, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongo-0:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo-2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5f7f9a0a07508370da2f5327') } }
2022-05-13T18:03:52.266+0000 I  REPL     [replexec-0] This node is mongo-2:27017 in the config
2022-05-13T18:03:52.266+0000 I  REPL     [replexec-0] transition to STARTUP2 from STARTUP
2022-05-13T18:03:52.266+0000 I  REPL     [replexec-0] Starting replication storage threads
2022-05-13T18:03:52.286+0000 I  CONNPOOL [Replication] Connecting to mongo-1:27017
2022-05-13T18:03:52.286+0000 I  CONNPOOL [Replication] Connecting to mongo-0:27017
2022-05-13T18:03:52.286+0000 I  REPL     [replexec-0] transition to RECOVERING from STARTUP2
2022-05-13T18:03:52.286+0000 I  REPL     [replexec-0] Starting replication fetcher thread
2022-05-13T18:03:52.287+0000 I  REPL     [replexec-0] Starting replication applier thread
2022-05-13T18:03:52.287+0000 I  REPL     [replexec-0] Starting replication reporter thread
2022-05-13T18:03:52.287+0000 I  REPL     [rsSync-0] Starting oplog application
2022-05-13T18:03:52.287+0000 I  REPL     [rsBackgroundSync] waiting for 4 pings from other members before syncing
2022-05-13T18:03:52.288+0000 I  REPL     [rsSync-0] transition to SECONDARY from RECOVERING
2022-05-13T18:03:52.288+0000 I  REPL     [rsSync-0] Resetting sync source to empty, which was :27017
2022-05-13T18:03:52.301+0000 I  REPL     [replexec-0] Cannot find self in new replica set configuration; I must be removed; NodeNotFound: No host described in new configuration 178016788 for replica set db maps to this node
2022-05-13T18:03:52.302+0000 I  REPL     [replexec-0] New replica set config in use: { _id: "db", version: 178016788, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongo-0:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5f7f9a0a07508370da2f5327') } }
2022-05-13T18:03:52.302+0000 I  REPL     [replexec-0] This node is not a member of the config
2022-05-13T18:03:52.302+0000 I  REPL     [replexec-0] transition to REMOVED from SECONDARY
2022-05-13T18:03:52.463+0000 I  NETWORK  [listener] connection accepted from 192.168.247.168:33884 #7 (1 connection now open)
2022-05-13T18:03:52.464+0000 I  NETWORK  [conn7] received client metadata from 192.168.247.168:33884 conn7: { driver: { name: "NetworkInterfaceTL", version: "4.2.7" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-13T18:03:52.569+0000 I  NETWORK  [listener] connection accepted from 192.168.247.168:33886 #8 (2 connections now open)
2022-05-13T18:03:52.570+0000 I  NETWORK  [conn8] received client metadata from 192.168.247.168:33886 conn8: { driver: { name: "NetworkInterfaceTL", version: "4.2.7" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-13T18:03:52.608+0000 I  NETWORK  [listener] connection accepted from 192.168.247.169:39288 #9 (3 connections now open)
2022-05-13T18:03:52.609+0000 I  NETWORK  [conn9] received client metadata from 192.168.247.169:39288 conn9: { driver: { name: "NetworkInterfaceTL", version: "4.2.7" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-13T18:03:54.223+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:41000 #10 (4 connections now open)
2022-05-13T18:03:54.225+0000 I  NETWORK  [conn10] received client metadata from 127.0.0.1:41000 conn10: { driver: { name: "nodejs", version: "2.2.36" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "3.10.0-1127.el7.x86_64" }, platform: "Node.js v11.2.0, LE, mongodb-core: 2.1.20" }
2022-05-13T18:03:54.243+0000 I  REPL     [conn10] replSetReconfig admin command received from client; new config: { _id: "db", version: 178016789, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongo-0:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo-2:27017" } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5f7f9a0a07508370da2f5327') } }
2022-05-13T18:03:54.251+0000 I  REPL     [conn10] replSetReconfig config object with 3 members parses ok
2022-05-13T18:03:54.251+0000 I  REPL     [conn10] New replica set config in use: { _id: "db", version: 178058453, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 0, host: "mongo-1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "mongo-0:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "mongo-2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5f7f9a0a07508370da2f5327') } }
2022-05-13T18:03:54.251+0000 I  REPL     [conn10] This node is mongo-2:27017 in the config
2022-05-13T18:03:54.251+0000 I  REPL     [conn10] transition to SECONDARY from REMOVED
2022-05-13T18:03:54.251+0000 I  REPL     [conn10] Resetting sync source to empty, which was :27017
2022-05-13T18:03:54.253+0000 I  REPL     [replexec-2] Member mongo-1:27017 is now in state SECONDARY
2022-05-13T18:03:54.253+0000 I  NETWORK  [conn10] end connection 127.0.0.1:41000 (3 connections now open)
2022-05-13T18:03:54.253+0000 I  REPL     [replexec-0] Member mongo-0:27017 is now in state PRIMARY
2022-05-13T18:03:54.259+0000 I  NETWORK  [listener] connection accepted from 192.168.247.168:33894 #13 (4 connections now open)
2022-05-13T18:03:54.260+0000 I  NETWORK  [conn13] end connection 192.168.247.168:33894 (3 connections now open)
2022-05-13T18:03:54.260+0000 I  NETWORK  [listener] connection accepted from 192.168.247.169:39296 #14 (4 connections now open)
2022-05-13T18:03:54.260+0000 I  NETWORK  [conn14] end connection 192.168.247.169:39296 (3 connections now open)
2022-05-13T18:03:54.530+0000 I  NETWORK  [listener] connection accepted from 192.168.222.195:42162 #15 (4 connections now open)
2022-05-13T18:03:54.531+0000 I  NETWORK  [conn15] received client metadata from 192.168.222.195:42162 conn15: { driver: { name: "mongo-java-driver|sync|spring-boot", version: "4.0.5" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.10.0-1127.el7.x86_64" }, platform: "Java/Oracle Corporation/1.8.0_312-b07" }
2022-05-13T18:03:55.288+0000 I  REPL     [rsBackgroundSync] sync source candidate: mongo-1:27017
2022-05-13T18:03:55.288+0000 I  CONNPOOL [RS] Connecting to mongo-1:27017
2022-05-13T18:03:55.293+0000 I  REPL     [rsBackgroundSync] Changed sync source from empty to mongo-1:27017
2022-05-13T18:03:55.502+0000 I  REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last optime fetched: { ts: Timestamp(1649416276, 1), t: 30 }. source's GTE: { ts: Timestamp(1649417431, 1), t: 31 }
2022-05-13T18:03:55.502+0000 I  REPL     [rsBackgroundSync] Replication commit point: { ts: Timestamp(0, 0), t: -1 }
2022-05-13T18:03:55.502+0000 I  REPL     [rsBackgroundSync] Rollback using 'recoverToStableTimestamp' method.
2022-05-13T18:03:55.502+0000 I  REPL     [rsBackgroundSync] Scheduling rollback (sync source: mongo-1:27017)
2022-05-13T18:03:55.502+0000 I  ROLLBACK [rsBackgroundSync] transition to ROLLBACK
2022-05-13T18:03:55.503+0000 I  REPL     [rsBackgroundSync] State transition ops metrics: { lastStateTransition: "rollback", userOpsKilled: 0, userOpsRunning: 5 }
2022-05-13T18:03:55.503+0000 I  REPL     [rsBackgroundSync] transition to ROLLBACK from SECONDARY
2022-05-13T18:03:55.503+0000 I  NETWORK  [rsBackgroundSync] Skip closing connection for connection # 9
2022-05-13T18:03:55.503+0000 I  NETWORK  [rsBackgroundSync] Skip closing connection for connection # 8
2022-05-13T18:03:55.503+0000 I  NETWORK  [rsBackgroundSync] Skip closing connection for connection # 7
2022-05-13T18:03:55.503+0000 I  NETWORK  [conn15] end connection 192.168.222.195:42162 (3 connections now open)
2022-05-13T18:03:55.503+0000 I  ROLLBACK [rsBackgroundSync] Waiting for all background operations to complete before starting rollback
2022-05-13T18:03:55.503+0000 I  ROLLBACK [rsBackgroundSync] Finished waiting for background operations to complete before rollback
2022-05-13T18:03:55.503+0000 I  ROLLBACK [rsBackgroundSync] finding common point
2022-05-13T18:03:56.399+0000 I  NETWORK  [listener] connection accepted from 192.168.222.196:58692 #18 (4 connections now open)
2022-05-13T18:03:56.400+0000 I  NETWORK  [conn18] received client metadata from 192.168.222.196:58692 conn18: { driver: { name: "mongo-java-driver|sync|spring-boot", version: "4.0.5" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.10.0-1127.el7.x86_64" }, platform: "Java/Oracle Corporation/1.8.0_312-b07" }
2022-05-13T18:03:56.843+0000 I  ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp(1649410978, 1), t: 13 }
2022-05-13T18:03:56.843+0000 F  ROLLBACK [rsBackgroundSync] Common point must be at least stable timestamp, common point: Timestamp(1649410978, 1), stable timestamp: Timestamp(1649415891, 1)
2022-05-13T18:03:56.843+0000 F  -        [rsBackgroundSync] Fatal Assertion 51121 at src/mongo/db/repl/rollback_impl.cpp 969
2022-05-13T18:03:56.843+0000 F  -        [rsBackgroundSync] 

***aborting after fassert() failure


Regards!

Hi @Bet_TS

Thank you for sharing the logs.

From the logs you shared, it appears that the secondary in question has fallen off the oplog, i.e. the oplog on the other two nodes has rolled over, so there is no longer a common point between this node and the others. This necessitates a resync of the member. Please refer to the following documentation to learn more: resync-replica-set-member
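
A rough outline of the “sync by emptying the data directory” approach in a Kubernetes StatefulSet could look like the following. This is only a sketch: the StatefulSet name (mongo), the namespace placeholder, and the use of mongo-0 as the current primary are assumptions based on your log, so please adapt them to your deployment.

# 1. On the primary (mongo-0 per your log), check how far back its oplog reaches;
#    if the failed member's last applied optime is older than that, a resync is required.
kubectl exec -n <namespace> mongo-0 -- mongo --quiet --eval 'rs.printReplicationInfo()'

# 2. Stop the failed member. With a default StatefulSet, scaling from 3 to 2
#    removes the highest-ordinal pod, i.e. mongo-2.
kubectl scale statefulset mongo -n <namespace> --replicas=2

# 3. Empty the member's data directory (/path in your log) on its PersistentVolume,
#    e.g. from a temporary pod that mounts the same PVC. Deleting the pod alone
#    does not clear the data, since the PVC survives pod restarts.

# 4. Start the member again; with an empty dbpath it performs an automatic
#    initial sync from the remaining healthy members.
kubectl scale statefulset mongo -n <namespace> --replicas=3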

P.S.: I notice that you are running an older version (4.2.7, released around 2020). I would recommend upgrading to the latest 4.2.x patch release (released in May 2022).

Since your current MongoDB server version was released almost two years ago, many bug fixes and improvements have been implemented since then. Minor production releases do not contain breaking changes, and upgrading or downgrading within the same release series is straightforward: MongoDB 4.2 Manual: Upgrade to the Latest Revision of MongoDB.
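
On Kubernetes, staying within the same release series usually amounts to rolling the pods onto a newer image. A minimal sketch, assuming the mongod container is defined in a StatefulSet named mongo with a container named mongod (both hypothetical) and using the official image, could be:

kubectl set image statefulset/mongo mongod=mongo:<latest-4.2.x-tag> -n <namespace>
# With the default RollingUpdate strategy, the pods are restarted one at a time
# on the new revision; substitute the latest 4.2.x patch release for the tag above.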

Thanks
Aasawari