Priority of "Replication" vs "Fail-over"

Kim_Hakseon · March 6, 2021, 4:23am

The primary shut down because of a problem, so a failover had to take place.
If data was still coming in at that point, does the failover wait until that data has been replicated? Or does the failover happen first, resulting in data loss?

Stennie_X · March 6, 2021, 5:54am

Hi @Kim_Hakseon,

What specific version of MongoDB server are you using, and how did you shut down the primary?

If you used rs.stepDown(), there is a secondaryCatchUpPeriodSecs period during which the primary waits for an eligible secondary to catch up:

The method does not immediately step down the primary. If no electable secondaries are up to date with the primary, the primary waits up to secondaryCatchUpPeriodSecs (by default 10 seconds) for a secondary to catch up. Once an electable secondary is available, the method steps down the primary.
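
As a rough illustration of that waiting rule, here is a toy Python model. It is not MongoDB code: the function name and the idea of describing each electable secondary by "seconds needed to catch up" are assumptions made purely for the sketch.

```python
from typing import List, Optional


def step_down_wait(catch_up_needed_secs: List[float],
                   secondary_catch_up_period_secs: float = 10.0) -> Optional[float]:
    """Toy model of the catch-up wait.

    Returns how long the primary waits before stepping down, or None if
    no electable secondary catches up within the allowed period.
    """
    if not catch_up_needed_secs:
        return None  # no electable secondaries at all
    fastest = min(catch_up_needed_secs)
    if fastest <= secondary_catch_up_period_secs:
        return fastest  # step down as soon as one secondary is caught up
    return None  # nobody caught up in time, so the primary stays primary


print(step_down_wait([0.0, 4.0]))    # a secondary is already up to date
print(step_down_wait([3.5, 12.0]))   # waits for the faster secondary
print(step_down_wait([15.0, 30.0]))  # nobody catches up within 10 seconds
```

The point of the model is only that a planned step-down gives secondaries a bounded window to catch up, which an unplanned shutdown does not.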

However, at some point an election may interrupt some in-flight writes.

If you are using a modern version of MongoDB server (3.6+), compatible drivers support retryable writes to retry operations a single time if they encounter a network error or a replica set without a primary.
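
To make the "retry a single time" behaviour concrete, here is a minimal Python sketch of the idea. With retryable writes enabled, compatible drivers do this internally, so you would not normally write it yourself. `NoPrimaryError`, `run_with_single_retry`, and `FlakyInsert` are hypothetical names for this illustration, not driver APIs.

```python
class NoPrimaryError(Exception):
    """Hypothetical stand-in for a network error or 'no primary' error."""


def run_with_single_retry(operation):
    """Run the operation, retrying exactly once on a NoPrimaryError."""
    try:
        return operation()
    except NoPrimaryError:
        # One retry only; a second failure propagates to the caller.
        return operation()


class FlakyInsert:
    """Simulated write that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int):
        self.failures = failures
        self.attempts = 0

    def __call__(self):
        self.attempts += 1
        if self.attempts <= self.failures:
            raise NoPrimaryError("no primary available")
        return "acknowledged"


insert = FlakyInsert(failures=1)
print(run_with_single_retry(insert), insert.attempts)
```

If the election takes longer than the single retry can cover, the error still reaches the application, which is why exception handling remains necessary.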

Regards,
Stennie

Kim_Hakseon · March 6, 2021, 6:02am

I use MongoDB 4.4.4, and the shutdown happened because the server was broken by an external issue.

Stennie_X · March 8, 2021, 5:10am

Hi @Kim_Hakseon,

Replica set failover should not result in data loss of writes in progress in most scenarios:

If any members of your replica set accepted writes that were not replicated to a new primary, conflicting versions of documents will be exported to a rollback directory for manual reconciliation. See Rollbacks During Replica Set Failover for more detailed information.
If an election happens while writes are in progress, your application can recover using retryable writes. See Retryable Write Operations for the operations that are retryable when issued with an acknowledged write concern (w:1 or higher).
The default write concern for most drivers is w:1, but using a majority write concern in your application will minimise the potential of rollbacks:

The more members that acknowledge a write, the less likely the written data could roll back if the primary fails. However, specifying a high write concern can increase latency as the client must wait until it receives the requested level of write concern acknowledgment.
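
The relationship between acknowledgements and rollback risk can be sketched with a small Python model. This is a simplification under stated assumptions: it counts only voting members, ignores journaling and other details, and the function names are made up for illustration.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that forms a majority."""
    return voting_members // 2 + 1


def could_roll_back(acknowledged_by: int, voting_members: int) -> bool:
    """Toy rule: a write acknowledged by fewer than a majority of members
    could still be rolled back if the primary fails."""
    return acknowledged_by < majority(voting_members)


# Three-member replica set: w:1 versus a majority of acknowledgements.
print(majority(3))
print(could_roll_back(acknowledged_by=1, voting_members=3))
print(could_roll_back(acknowledged_by=2, voting_members=3))
```

In this model a w:1 write on a three-member set is exposed to rollback, while a write acknowledged by two members is not, at the cost of waiting for the second acknowledgement.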

Features like retryable writes and retention of rollback data are enabled by default, but it is possible (although not advisable) to disable those via applicable driver or server configuration changes.

If your cluster has had a failover due to an external issue, I would check if any rollback files have been created on your former primary. It would also be worth reviewing your application code to ensure retryable writes, write concerns, and appropriate exception handling are being used.
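
A quick way to check for rollback files is to look under the former primary's data directory. The sketch below assumes rollback data is written as BSON files somewhere beneath a `rollback` directory inside the member's dbPath; the helper name and the example path are assumptions, so substitute your own dbPath.

```python
from pathlib import Path
from typing import List


def find_rollback_files(dbpath: str) -> List[Path]:
    """Return any .bson files found beneath <dbpath>/rollback, sorted by path."""
    rollback_dir = Path(dbpath) / "rollback"
    if not rollback_dir.is_dir():
        return []  # no rollback directory means nothing was rolled back here
    return sorted(rollback_dir.rglob("*.bson"))


# "/var/lib/mongodb" is only an example dbPath; adjust for your deployment.
for path in find_rollback_files("/var/lib/mongodb"):
    print(path)
```

Any files this turns up can be inspected with bsondump before deciding whether to reapply the documents manually.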

Regards,
Stennie

Stennie_X · March 8, 2021, 5:49am

Hi @Kim_Hakseon,

For an unexpected failover the sequence of events would be something like:

Current primary is unavailable and replica set starts election for new primary.
Applications sending writes get an exception (no current primary) and queue retryable writes (non-retryable writes will get an exception).
New primary gets elected based on the Replication Election Protocol which considers factors like member priority and most recent oplog entry.
Other secondaries resume replication from the new primary and normal writes are now possible.
Applications automatically resend retryable writes and only those which haven’t been applied yet will be executed.
Former primary eventually rejoins as a secondary. Any accepted writes that hadn’t replicated to the current primary are rolled back.
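
The sequence above can be sketched as a toy Python simulation. This is a deliberately simplified model, not the real election protocol: it ignores member priority, votes, and terms, represents each oplog as a list of integer write IDs, and all function and member names are invented for the example.

```python
from typing import Dict, List


def elect_new_primary(oplogs: Dict[str, List[int]], failed: str) -> str:
    """Pick the surviving member with the most recent oplog entry."""
    candidates = {m: log for m, log in oplogs.items() if m != failed}
    return max(candidates, key=lambda m: candidates[m][-1] if candidates[m] else -1)


def rolled_back_entries(former_primary_log: List[int],
                        new_primary_log: List[int]) -> List[int]:
    """Writes the former primary accepted that never reached the new primary."""
    replicated = set(new_primary_log)
    return [entry for entry in former_primary_log if entry not in replicated]


def resend_retryable(pending: List[int], new_primary_log: List[int]) -> List[int]:
    """Apply only the retried writes the new primary has not applied yet."""
    applied = []
    for write_id in pending:
        if write_id not in new_primary_log:
            new_primary_log.append(write_id)
            applied.append(write_id)
    return applied


# Member A is the primary and fails before write 5 has replicated anywhere.
oplogs = {"A": [1, 2, 3, 4, 5], "B": [1, 2, 3, 4], "C": [1, 2, 3]}

new_primary = elect_new_primary(oplogs, failed="A")
rolled_back = rolled_back_entries(oplogs["A"], oplogs[new_primary])
# The application retries write 4 (already applied) and write 6 (new).
applied = resend_retryable([4, 6], oplogs[new_primary])

print(new_primary, rolled_back, applied)
```

In this run the most up-to-date secondary takes over, the one unreplicated write on the former primary is the rollback candidate, and only the retried write that had not yet been applied is executed.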

Does that address your follow-up concern?

Regards,
Stennie
