Starting in version 4.0, MongoDB only supports replica set protocol
version 1 (pv1). pv1 is the default for
all new replica sets created with MongoDB 3.2 or later.
Preservation of Writes
w:1 Writes
With pv1, you can use
catchUpTimeoutMillis to prioritize between faster
failovers and preservation of w:1 writes.
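As a rough illustration of this trade-off, the sketch below builds a modified replica set configuration with a different catch-up timeout. It assumes the setting lives at settings.catchUpTimeoutMillis in the configuration document returned by rs.conf(). The helper name withCatchUpTimeout and the sample values are hypothetical. The helper is plain JavaScript, so it can be tried outside the shell before it is used with rs.reconfig().

```javascript
// Hypothetical helper: returns a copy of a replica set configuration with a
// new catch-up timeout. Assumes the field is settings.catchUpTimeoutMillis.
function withCatchUpTimeout(cfg, millis) {
  if (typeof millis !== "number" || isNaN(millis)) {
    throw new Error("millis must be a number");
  }
  // Copy the top level and the settings subdocument so that the caller's
  // object is not mutated.
  var copy = Object.assign({}, cfg);
  copy.settings = Object.assign({}, cfg.settings || {});
  copy.settings.catchUpTimeoutMillis = millis;
  return copy;
}

// Stand-in configuration document (illustrative values, not a real set).
var sampleCfg = {
  _id: "rs0",
  protocolVersion: 1,
  settings: { electionTimeoutMillis: 10000 }
};

// A shorter catch-up period leans toward faster failover; a longer one
// leans toward preserving w:1 writes.
var shorterCatchUp = withCatchUpTimeout(sampleCfg, 2000);

// In a mongo shell connected to the primary, the same idea would be:
//   cfg = rs.conf();
//   rs.reconfig(withCatchUpTimeout(cfg, 2000));
```

Returning a copy rather than editing the document in place makes it easy to compare the old and new configurations before calling rs.reconfig().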
w: "majority" Writes
pv1 guarantees the preservation of confirmed w:
"majority" writes.
Availability
pv1 is available in MongoDB version 3.2 or later and is the default
for all new replica sets created with version 3.2 or later.
Arbiters
For the following MongoDB versions, pv1 increases the likelihood
of w:1 rollbacks compared to pv0
(no longer supported in MongoDB 4.0+) for replica sets with arbiters:
MongoDB 3.4.1
MongoDB 3.4.0
MongoDB 3.2.11 or earlier
For the other versions of MongoDB that support pv1, pv1 does
not increase the likelihood of w:1 rollbacks for
replica sets with arbiters.
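The version list above can be turned into a small check. The following is a minimal sketch, assuming release versions written as "x.y.z". The function names are hypothetical, and pre-release strings such as release candidates are rejected rather than interpreted.

```javascript
// Parse "major.minor.patch" into an array of three numbers.
function parseVersion(v) {
  var parts = String(v).split(".").map(Number);
  if (parts.length !== 3 || parts.some(isNaN)) {
    throw new Error("expected a version of the form x.y.z, got: " + v);
  }
  return parts;
}

// Hypothetical helper: true if the version is one of those listed above
// (3.2.11 or earlier, 3.4.0, 3.4.1). Versions before 3.2 do not support pv1,
// so they are reported as not affected.
function isAffectedByPv1RollbackIssue(version) {
  var p = parseVersion(version);
  var major = p[0], minor = p[1], patch = p[2];
  if (major !== 3) return false;
  if (minor === 2) return patch <= 11; // 3.2.0 through 3.2.11
  if (minor === 4) return patch <= 1;  // 3.4.0 and 3.4.1
  return false;
}

var affected = ["3.2.11", "3.2.12", "3.4.1", "3.4.2", "3.6.0"].map(isAffectedByPv1RollbackIssue);

// In a mongo shell, the server version could be passed in directly:
//   isAffectedByPv1RollbackIssue(db.version());
```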
Priorities
For the following MongoDB versions, pv1 increases the likelihood
of w:1 rollbacks compared to pv0
(no longer supported in MongoDB 4.0+) for replica sets with different
members[n].priority settings:
MongoDB 3.4.1
MongoDB 3.4.0
MongoDB 3.2.11 or earlier
For the other versions of MongoDB that support pv1, pv1 does
not increase the likelihood of w:1 rollbacks for
replica sets with different members[n].priority settings.
Vetoes
pv1 does not use vetoes. Individual members can vote for or against
a candidate in a particular election, but no single member can
unilaterally veto (abort) an election.
Detection of Simultaneous Primaries
In some circumstances, two nodes in a replica set
may transiently each believe that it is the primary, but at most one
of them will be able to complete writes with { w:
"majority" } write concern. The node that can complete
{ w: "majority" } writes is the current
primary, and the other node is a former primary that has not yet
recognized its demotion, typically due to a network partition.
When this occurs, clients that connect to the former primary may
observe stale data despite having requested read preference
primary, and new writes to the former primary will
eventually roll back.
pv1 uses the concept of term. This allows for a faster
detection of simultaneous primaries and for multiple successful
elections in a short period of time.
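To illustrate how a term can expose a former primary, the sketch below compares status summaries gathered from several members. It is a simplified model, not the server's detection logic. It assumes each summary carries the myState and term fields as reported by rs.status(), with terms already converted to plain numbers. The name field and the function name are hypothetical.

```javascript
var PRIMARY_STATE = 1; // myState value that rs.status() reports for a primary

// Returns the names of members that report PRIMARY but whose term is lower
// than the highest term seen on any member. In this model, such a member is
// a former primary that has not yet recognized its demotion.
function findStalePrimaries(statuses) {
  var maxTerm = statuses.reduce(function (m, s) {
    return Math.max(m, s.term);
  }, -Infinity);
  return statuses
    .filter(function (s) { return s.myState === PRIMARY_STATE && s.term < maxTerm; })
    .map(function (s) { return s.name; });
}

// Stand-in data: "a" was primary in term 3 and was partitioned away, and
// "b" won an election in term 4.
var stale = findStalePrimaries([
  { name: "a", myState: 1, term: 3 },
  { name: "b", myState: 1, term: 4 },
  { name: "c", myState: 2, term: 4 }
]);
```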
Back to Back Elections
pv1 makes a "best-effort" attempt to have the available secondary with
the highest priority call an election. This
could lead to back-to-back elections, because an eligible member with
higher priority than the newly elected primary can call another election.
However, in MongoDB 3.6+ (as well as MongoDB 3.4.2+ and 3.2.12+), for pv1:
Priority elections have been limited to occur only if the higher priority node is within 10 seconds of the current primary.
Arbiters will vote no in elections if they detect a healthy primary of equal or greater priority to the candidate.
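The arbiter rule above can be modeled as a small predicate. This is a sketch of the stated rule only, not of the server's full voting logic. The function name and the shape of its arguments are hypothetical.

```javascript
// Hypothetical model of the arbiter rule. Returns true when the rule requires
// the arbiter to vote no. visiblePrimary is null when the arbiter sees no
// healthy primary; otherwise it is an object with a numeric priority.
function arbiterMustVoteNo(candidatePriority, visiblePrimary) {
  if (visiblePrimary === null) {
    return false; // no healthy primary detected, so this rule does not apply
  }
  // Vote no if the healthy primary's priority is equal to or greater than
  // the candidate's priority.
  return visiblePrimary.priority >= candidatePriority;
}

var samePriority = arbiterMustVoteNo(1, { priority: 1 });   // rule applies
var higherCandidate = arbiterMustVoteNo(2, { priority: 1 }); // rule does not apply
var noPrimary = arbiterMustVoteNo(1, null);                  // rule does not apply
```

A false result means only that this particular rule does not force a no vote; other conditions still decide the actual vote.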
Double Voting
pv1 prevents double voting in one member's call for election. This
is achieved through its use of terms.
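One way to picture how terms prevent double voting is a voter that remembers the last term in which it granted a vote. The sketch below is a simplified, Raft-style model written for illustration. It is not MongoDB's implementation, and the Voter type and its method are hypothetical.

```javascript
// Simplified voter: grants at most one vote per term.
function Voter() {
  this.lastVotedTerm = -1; // no vote granted yet
}

// Returns true if the vote is granted. A request for a term in which this
// voter has already voted (or for an older term) is refused.
Voter.prototype.requestVote = function (candidate, term) {
  if (term <= this.lastVotedTerm) {
    return false;
  }
  this.lastVotedTerm = term;
  return true;
};

var voter = new Voter();
var votes = [
  voter.requestVote("A", 5), // first request in term 5
  voter.requestVote("B", 5), // second request in the same term
  voter.requestVote("B", 6), // a new election in a later term
  voter.requestVote("A", 5)  // a late request for an old term
];
```

Because each election attempt runs in its own term, recording one number per voter is enough in this model to refuse a second vote in the same election.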
Modify Replica Set Protocol Version
Starting in version 4.0, MongoDB only supports replica set protocol
version 1 (pv1).
However, MongoDB 3.2 through MongoDB 3.6 support replica set protocol
version 1 and protocol version 0.
Before changing the protocol version for MongoDB 3.2 through MongoDB
3.6, ensure that at least one oplog entry (generated from the current
protocol version) has replicated from the primary to all secondaries.
To verify this, inspect the optimes.lastCommittedOpTime.t field returned from
rs.status() on each secondary. For example, connect a mongo
shell to each secondary and run:
rs.status().optimes.lastCommittedOpTime.t
If the current replica set protocol version is 0, the t is equal to -1.
If the current replica set protocol version is 1, the t is greater than -1.
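The rule above can be wrapped in a small helper so that the values collected from each secondary are interpreted consistently. This is a minimal sketch. The function names are hypothetical, and it assumes the t values have been converted to plain numbers (the shell may return them as NumberLong).

```javascript
// Map a lastCommittedOpTime.t value to the protocol version it indicates:
// -1 means protocol version 0, anything greater than -1 means version 1.
function protocolVersionFromT(t) {
  var n = Number(t);
  if (isNaN(n) || n < -1) {
    throw new Error("unexpected t value: " + t);
  }
  return n === -1 ? 0 : 1;
}

// True if every collected t value indicates the expected protocol version.
function allSecondariesOnProtocol(tValues, expectedVersion) {
  return tValues.every(function (t) {
    return protocolVersionFromT(t) === expectedVersion;
  });
}

// Stand-in values as they might be collected from three secondaries.
var readyOnPv0 = allSecondariesOnProtocol([-1, -1, -1], 0);
var mixed = allSecondariesOnProtocol([-1, 4, -1], 0);

// In a mongo shell connected to a secondary, a single value could be read as:
//   rs.status().optimes.lastCommittedOpTime.t
```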
Once you have verified that at least one oplog entry (using the current protocol version) has replicated to all the secondaries, you can change the protocol version.
To change the replica set protocol version, reconfigure
(rs.reconfig()) the replica set with the new
protocolVersion. For example, to upgrade to pv1, connect
a mongo shell to the current primary and perform the
following sequence of operations:
cfg = rs.conf();
cfg.protocolVersion=1;
rs.reconfig(cfg);
You can use catchUpTimeoutMillis to prioritize
between faster failovers and preservation of w:1 writes.