Member in different region cannot catch up with oplog

Nivi_Mor · May 12, 2022, 5:38pm

Hi,

I have a MongoDB replica set with three members in us-west-1, plus a fourth member in us-east-1 that is designated for DR.

The member in us-east-1 died and we had to restart the server. Since then, we have been unable to sync it with the primary. We took a snapshot from the primary, copied it to us-east-1, created a volume from it, attached the volume to the DR member and started mongo.

The DR member shows that it’s 3.67 hours behind the primary and is unable to catch up. When I run db.printSlaveReplicationInfo(), the lag keeps increasing until the member is so far behind the primary that it transitions to RECOVERING, and rs.status() reports "could not find member to sync from".
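For anyone who wants to track the lag numerically rather than reading it off the helper's output, here is a minimal sketch of computing it from an rs.status()-shaped document. It assumes the usual `members[].name`, `members[].stateStr` and `members[].optimeDate` fields; the function name `replicationLagSeconds` and the sample hostnames and timestamps are made up for illustration and are not output from the cluster described here.

```javascript
// Sketch: per-member replication lag (in seconds) from an rs.status()-like document.
// In the mongo shell, rs.status() could be passed in directly.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY found in status.members");
  }
  return status.members
    .filter((m) => m !== primary)
    .map((m) => ({
      name: m.name,
      state: m.stateStr,
      // Subtracting two Date objects yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hypothetical sample data (hostnames and times are invented).
const sampleStatus = {
  members: [
    { name: "west-a:27017", stateStr: "PRIMARY", optimeDate: new Date("2022-05-12T17:00:00Z") },
    { name: "east-dr:27017", stateStr: "SECONDARY", optimeDate: new Date("2022-05-12T13:19:48Z") },
  ],
};

console.log(replicationLagSeconds(sampleStatus));
```

Sampling this every minute or so would show whether the lag is shrinking, flat, or growing.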

The question is, what could cause this member not to catch up with the primary? I’m thinking it’s a combination of network latency and a high volume of write operations. I read up a bit on write concern, but it doesn’t seem like it would solve the issue.
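To make that suspicion concrete, here is a rough back-of-the-envelope sketch. It assumes a steady write rate on the primary and a steady rate at which the DR member can fetch and apply oplog entries across regions; both rates, and the function name `hoursToCatchUp`, are hypothetical and would have to be measured on the real cluster.

```javascript
// Sketch: estimate how long a lagging member needs to catch up.
// lagHours:  current lag behind the primary, in hours
// writeRate: oplog generated on the primary (e.g. MB per hour)
// applyRate: oplog the lagging member can fetch and apply (same unit)
function hoursToCatchUp(lagHours, writeRate, applyRate) {
  if (applyRate <= writeRate) {
    // The member applies no faster than the primary writes, so the lag never shrinks.
    return Infinity;
  }
  // The backlog is lagHours * writeRate and drains at (applyRate - writeRate).
  return (lagHours * writeRate) / (applyRate - writeRate);
}

// Made-up rates: 1000 MB/h written, 1500 MB/h applied, 3.67 h behind.
console.log(hoursToCatchUp(3.67, 1000, 1500));
// Made-up rates where the member is slower than the primary.
console.log(hoursToCatchUp(3.67, 1000, 900));
```

If the second case applies, the lag can only grow, which would match what I’m seeing.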

Any ideas how to overcome this issue and sync the DR member successfully?