Desynchronization and loss of communication in a Mongo replica set

Hello everyone,

I had a problem with my Mongo replica set and I can’t find the cause.
I have a Mongo replica set with 3 nodes.


A few days ago, I saw that mongo2 was down at 11:55 AM and no node took over the primary role.


On the primary node, I checked the logs but didn’t see anything around that time.


Node 1 (secondary role at that moment):

2021-11-06T11:55:38.070+0000 I REPL [replication-2998] Choosing new sync source because the most recent OpTime of our sync source, mongo3:27000, is { ts: Timestamp(1636199703, 9), t: 76 } which is more than 30s behind member mongo2:27000 whose most recent OpTime is { ts: Timestamp(1636199736, 83), t: 76 }
2021-11-06T11:55:38.070+0000 I REPL [replication-2998] Canceling oplog query due to OplogQueryMetadata. We have to choose a new sync source. Current source: mongo3:27000, OpTime { ts: Timestamp(1636199703, 9), t: 76 }, its sync source index:1
2021-11-06T11:55:38.070+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source mongo3:27000 (config version: 18; last applied optime: { ts: Timestamp(1636199703, 9), t: 76 }; sync source index: 1; primary index: 1) is no longer valid
2021-11-06T11:55:38.070+0000 I REPL [rsBackgroundSync] Clearing sync source mongo3:27000 to choose a new one.
2021-11-06T11:55:38.070+0000 I REPL [rsBackgroundSync] sync source candidate: mongo2:27000
2021-11-06T11:55:43.069+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to mongo3:27000: InvalidSyncSource: Sync source was cleared. Was mongo3:27000
2021-11-06T11:56:08.070+0000 I REPL [replication-2998] Blacklisting mongo2:27000 due to error: ‘NetworkInterfaceExceededTimeLimit: timed out’ for 10s until: 2021-11-06T11:56:18.070+0000
2021-11-06T11:56:08.070+0000 I REPL [replication-2998] could not find member to sync from
2021-11-06T11:56:18.071+0000 I REPL [rsBackgroundSync] sync source candidate: mongo2:27000
2021-11-06T11:56:48.071+0000 I REPL [replication-2998] Blacklisting mongo2:27000 due to error: ‘NetworkInterfaceExceededTimeLimit: timed out’ for 10s until:
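
To make the "more than 30s behind" message above easier to read, here is a small sketch that pulls the two `Timestamp(secs, inc)` values out of that first Node 1 log line and converts them to wall-clock time. The helper name `optime_seconds` and the regular expression are my own, not part of any MongoDB tooling, and I am assuming the first number in each `Timestamp` is seconds since the Unix epoch.

```python
import re
from datetime import datetime, timezone

# Matches "Timestamp(<seconds>, <increment>)" as printed in the REPL log lines.
OPTIME_RE = re.compile(r"Timestamp\((\d+),\s*(\d+)\)")


def optime_seconds(text):
    """Return the seconds part of every Timestamp(secs, inc) found in text."""
    return [int(m.group(1)) for m in OPTIME_RE.finditer(text)]


# First log line from Node 1, rejoined onto one line.
line = (
    "Choosing new sync source because the most recent OpTime of our sync "
    "source, mongo3:27000, is { ts: Timestamp(1636199703, 9), t: 76 } which "
    "is more than 30s behind member mongo2:27000 whose most recent OpTime is "
    "{ ts: Timestamp(1636199736, 83), t: 76 }"
)

source_secs, ahead_secs = optime_seconds(line)
lag = ahead_secs - source_secs

# Wall-clock time (UTC) of the last operation mongo3 had applied.
source_time = datetime.fromtimestamp(source_secs, tz=timezone.utc).isoformat()

print(lag)          # 33
print(source_time)  # 2021-11-06T11:55:03+00:00
```

So, if I read it correctly, mongo3 had stopped applying operations at about 11:55:03 UTC, while mongo2 still reported operations up to about 11:55:36.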


Node 3 (secondary role at that moment):

2021-11-06T11:55:13.070+0000 I REPL [replication-2234] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback :: caused by :: timed out. Last fetched optime (with hash): { ts: Timestamp(1636199703, 9), t: 76 }[-3057862250994356911]. Restarts remaining: 1
2021-11-06T11:55:13.070+0000 I REPL [replication-2234] Scheduled new oplog query Fetcher source: mongo2:27000 database: local query: { find: “oplog.rs”, filter: { ts: { $gte: Timestamp(1636199703, 9) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 76, readConcern: { afterClusterTime: Timestamp(1636199703, 9) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: “secondaryPreferred” } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 511965118 – target:mongo2:27000 db:local cmd:{ find: “oplog.rs”, filter: { ts: { $gte: Timestamp(1636199703, 9) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 76, readConcern: { afterClusterTime: Timestamp(1636199703, 9) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2021-11-06T11:55:15.582+0000 I REPL [replication-2227] Error returned from oplog query (no more query restarts left): MaxTimeMSExpired: error in fetcher batch callback :: caused by :: operation exceeded time limit
2021-11-06T11:55:15.582+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: MaxTimeMSExpired: error in fetcher batch callback :: caused by :: operation exceeded time limit
2021-11-06T11:55:15.583+0000 I REPL [rsBackgroundSync] Clearing sync source mongo2:27000 to choose a new one.
2021-11-06T11:55:15.583+0000 I REPL [rsBackgroundSync] sync source candidate: mongo2:27000
2021-11-06T11:55:18.069+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update to mongo2:27000: InvalidSyncSource: Sync source was cleared. Was mongo2:27000
2021-11-06T11:55:45.583+0000 I REPL [replication-2235] Blacklisting mongo2:27000 due to error: ‘NetworkInterfaceExceededTimeLimit: timed out’ for 10s until: 2021-11-06T11:55:55.583+0000
2021-11-06T11:55:45.583+0000 I REPL [replication-2235] could not find member to sync from
2021-11-06T11:55:55.584+0000 I REPL [rsBackgroundSync] sync source candidate: mongo2:27000
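
To narrow down where to look, a rough sketch like the following can filter a mongod log for warnings and sync-source failures. It assumes the plain-text log format shown above (timestamp, severity letter, component, `[context]`, message). The names `parse_line`, `flag_lines` and the keyword list are hypothetical choices of mine, and the sample lines are copied from the Node 3 excerpt with the quote characters normalized.

```python
import re
from datetime import datetime

# timestamp, severity letter, component, [context], message
LINE_RE = re.compile(r"^(\S+) ([A-Z]) (\w+)\s+\[([^\]]+)\] (.*)$")

# Assumed keywords worth flagging; adjust to whatever your logs contain.
KEYWORDS = ("Blacklisting", "timed out", "could not find member")


def parse_line(line):
    """Split one log line into (time, severity, component, context, message)."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # not in the expected format, e.g. a wrapped line
    ts, severity, component, context, message = m.groups()
    when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")
    return when, severity, component, context, message


def flag_lines(lines):
    """Keep warnings/errors and any line whose message contains a keyword."""
    flagged = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        _, severity, _, _, message = entry
        if severity in ("W", "E", "F") or any(k in message for k in KEYWORDS):
            flagged.append(entry)
    return flagged


SAMPLE = [
    "2021-11-06T11:55:15.582+0000 W REPL [rsBackgroundSync] Fetcher stopped querying remote oplog with error: MaxTimeMSExpired: error in fetcher batch callback :: caused by :: operation exceeded time limit",
    "2021-11-06T11:55:15.583+0000 I REPL [rsBackgroundSync] Clearing sync source mongo2:27000 to choose a new one.",
    "2021-11-06T11:55:45.583+0000 I REPL [replication-2235] Blacklisting mongo2:27000 due to error: 'NetworkInterfaceExceededTimeLimit: timed out' for 10s until: 2021-11-06T11:55:55.583+0000",
    "2021-11-06T11:55:45.583+0000 I REPL [replication-2235] could not find member to sync from",
]

flagged = flag_lines(SAMPLE)
for when, severity, component, context, message in flagged:
    print(when.isoformat(), severity, component, context, message[:60])
```

Running the same filter over the logs of all three nodes for the minutes around 11:55 should make it easier to compare what each member saw.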


When I tried to shut down the primary node manually from Mongo Ops Manager, to move the primary role to another node, the changes didn’t deploy. The reported “problem” said that the changes couldn’t be deployed because node X was unreachable.
In the end I resolved the issue by rebooting the nodes one by one.
Where should I look to find out why this node appeared “shutdown”?

Greetings

Hi, if you can’t see anything in the log file on the primary node, does that mean the log file is blank, or that you just can’t access the filesystem where the log file is stored?