Replica sets occasionally run elections when a primary steps down or becomes unavailable. During an election, the replica set cannot accept writes until it successfully elects a new primary, though most reads can continue on secondaries if configured.
Expect this behavior during rare failovers or planned maintenance. However, frequent elections, where the primary changes often during normal operation, cause repeated write interruptions and, in some cases, rollbacks of uncommitted data.
This page describes common causes of frequent elections and their resolutions, to help you diagnose the problem quickly and avoid application write interruptions. If you need additional support after reviewing the following sections, contact Technical Support.
Prerequisite Checks
Healthy clusters see elections only during infrequent, expected events. Elections normally happen during the following scenarios:
Initial setup
Maintenance operations, such as rs.stepDown() or rs.reconfig()
New member additions with rs.add()
Primary unavailability for more than the configured timeout, where the default is 10 seconds
Verify that your deployment experiences frequent elections
outside of these scenarios by running the replSetGetStatus command
or the rs.status() method multiple times over a set period, such as
within an hour or throughout the day. Compare the reported
primary's _id value each time to track if and when the
primary node changes, meaning an election occurred.
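The comparison described above can be sketched as a pair of small helper functions. This is a minimal illustration, not an official tool: the function names primaryIdOf and countPrimaryChanges are hypothetical, and the code assumes each sample is an rs.status() document whose members array carries the _id and stateStr fields.

```javascript
// Returns the _id of the member reporting PRIMARY in an rs.status() document,
// or null if no member is currently primary (for example, mid-election).
function primaryIdOf(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary._id : null;
}

// Counts how many times the primary's _id changed across samples taken in
// chronological order. Samples without a primary are skipped, so an
// in-progress election is not counted as a change by itself.
function countPrimaryChanges(samples) {
  let changes = 0;
  let last = null;
  for (const status of samples) {
    const current = primaryIdOf(status);
    if (current === null) continue;
    if (last !== null && current !== last) changes += 1;
    last = current;
  }
  return changes;
}
```

In mongosh, you could push the result of rs.status() into an array at each check and pass that array to countPrimaryChanges. Because this approach only compares snapshots, it can miss an election that happens between two samples and returns the same member to primary.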
Note
If you see a TOO_MANY_ELECTIONS
alert in Ops Manager, you are
likely experiencing frequent elections.
Check Log Messages
You can also refer to your deployment's log messages
for entries where the component ("c") value is ELECTION.
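Filtering a log file for ELECTION entries can be sketched as follows. This is a rough example under stated assumptions: it expects newline-delimited JSON log lines in which "c" holds the component and "t.$date" holds the timestamp, and the function names electionEntries and countPerHour are hypothetical helpers, not part of any MongoDB tool.

```javascript
// Parses newline-delimited structured log text and returns the entries
// whose component ("c") value is ELECTION.
function electionEntries(logText) {
  const entries = [];
  for (const line of logText.split("\n")) {
    if (line.trim() === "") continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (err) {
      continue; // skip lines that are not valid JSON
    }
    if (entry && entry.c === "ELECTION") entries.push(entry);
  }
  return entries;
}

// Groups entries by UTC hour using the assumed "t.$date" timestamp field.
function countPerHour(entries) {
  const counts = {};
  for (const entry of entries) {
    const ts = entry.t && entry.t.$date;
    if (!ts) continue;
    const hour = new Date(ts).toISOString().slice(0, 13); // "YYYY-MM-DDTHH"
    counts[hour] = (counts[hour] || 0) + 1;
  }
  return counts;
}
```

Grouping by hour makes it easier to see whether election activity clusters in short intervals or is spread across isolated, expected events.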
If your logs display multiple occurrences of the following messages over
short intervals, such as multiple times per hour or day without
planned maintenance, this typically indicates an unhealthy
replica set caused by network instability, unhealthy hosts, or
misconfiguration:
| Message | Description |
|---|---|
| "Starting an election, since we've seen no PRIMARY in election timeout period" | Logged by other members when the primary steps down. |
| "transition to PRIMARY from SECONDARY" | A new primary takes over. |
| "can't see a majority of the set, relinquishing primary" | The previous primary becomes a secondary. |
Common Issues and Resolutions
The following section describes common issues that can cause a replica set to perform unexpected frequent elections. Before you contact support, check if the following pitfalls cause frequent elections in your replica set.
Resource Exhaustion
Intensive queries, large aggregations, and background tasks like index builds or backups can lead to high CPU usage, disk latency or failures, and memory pressure. Resource exhaustion can cause frequent elections because the primary can become unresponsive or unable to process heartbeats on time.
Inefficient Queries
To check if your deployment experiences resource exhaustion due to inefficient queries:
Look for entries in your logs where the component ("c") value is COMMAND.
For each entry, the bytesRead field indicates how many bytes a given command reads. Pay attention to commands with large bytesRead values.
If the timestamps for your inefficient queries occur near the timestamps for multiple elections, inefficient queries are likely causing your frequent elections.
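The steps above can be sketched as a filter over parsed log entries. This is an illustrative sketch only: the helper names, the minBytes threshold, and the windowMs window are hypothetical, and because the exact location of bytesRead inside a log entry is not stated here, the code searches the whole entry for the first numeric field with that name.

```javascript
// Recursively finds the first numeric "bytesRead" value in a log entry,
// or returns null if the entry does not contain one.
function findBytesRead(value) {
  if (value === null || typeof value !== "object") return null;
  for (const [key, child] of Object.entries(value)) {
    if (key === "bytesRead" && typeof child === "number") return child;
    const nested = findBytesRead(child);
    if (nested !== null) return nested;
  }
  return null;
}

// Returns COMMAND entries that read at least minBytes and whose timestamp
// falls within windowMs of any election time (epoch milliseconds).
function heavyCommandsNearElections(entries, electionTimes, minBytes, windowMs) {
  return entries.filter((entry) => {
    if (entry.c !== "COMMAND") return false;
    const bytes = findBytesRead(entry);
    if (bytes === null || bytes < minBytes) return false;
    const ts = entry.t && entry.t.$date;
    if (!ts) return false;
    const time = new Date(ts).getTime();
    return electionTimes.some((e) => Math.abs(time - e) <= windowMs);
  });
}
```

Reasonable values for the threshold and the window depend on your workload, so treat any match as a lead to investigate and not as proof of cause.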
Use the following resources to optimize your queries:
Handle your workload more efficiently using Compound Indexes.
Review our ESR Guideline to create new indexes.
In Atlas, review your performance advisor regularly for index suggestions based on your latest workload.
Replica Set Misconfiguration
Node Priority
Replica sets continuously call elections until they elect the member with the highest priority. If you do not set member priorities appropriately, elections can occur more often, with unexpected members becoming primary.
To check the priority values of each member, run the
replSetGetConfig command or rs.conf() method.
The following example shows the output of the rs.conf() method
for a replica set with incorrectly configured priorities:
rs.conf().members
[
  {
    _id: 0,
    host: "rs0-0.example.net:27017",
    arbiterOnly: false,
    buildIndexes: true,
    hidden: false,
    priority: 1,
    tags: { dc: "primaryDC" },
    secondaryDelaySecs: Long(0),
    votes: 1
  },
  {
    _id: 1,
    host: "rs0-1.example.net:27017",
    arbiterOnly: false,
    buildIndexes: true,
    hidden: false,
    priority: 10,
    tags: { dc: "remoteDC1" },
    secondaryDelaySecs: Long(0),
    votes: 1
  },
  {
    _id: 2,
    host: "rs0-2.example.net:27017",
    arbiterOnly: false,
    buildIndexes: true,
    hidden: false,
    priority: 9,
    tags: { dc: "remoteDC2" },
    secondaryDelaySecs: Long(0),
    votes: 1
  },
  {
    _id: 3,
    host: "rs0-3.example.net:27017",
    arbiterOnly: false,
    buildIndexes: true,
    hidden: false,
    priority: 8,
    tags: { dc: "analyticsDC" },
    secondaryDelaySecs: Long(0),
    votes: 1
  }
]
In the above example, the current primary (_id: 0, with priority 1) has a lower priority than the three other members. If a higher-priority secondary is healthy and in the SECONDARY state, it can trigger an election to take over as primary.
Ensure you configure the priorities for your replica set members appropriately:
Assign the highest priority to the server you want to consistently serve as the primary
Assign default or lower priorities to other members to reduce their likelihood of becoming primary
Assign a low priority to nodes with high replication lag
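One way to review and adjust priorities is sketched below. This is a minimal example, assuming a configuration document shaped like the rs.conf() output shown earlier; the function names withPriorities and highestPriorityHosts are hypothetical helpers.

```javascript
// Returns a copy of a replica set config with new priorities applied.
// priorityByHost maps a member's host string to its desired priority;
// members that are not listed keep their current priority.
function withPriorities(config, priorityByHost) {
  const members = config.members.map((m) =>
    Object.prototype.hasOwnProperty.call(priorityByHost, m.host)
      ? { ...m, priority: priorityByHost[m.host] }
      : { ...m }
  );
  return { ...config, members };
}

// Returns the hosts that hold the highest priority in the config.
function highestPriorityHosts(config) {
  const top = Math.max(...config.members.map((m) => m.priority));
  return config.members
    .filter((m) => m.priority === top)
    .map((m) => m.host);
}
```

In mongosh, connected to the primary, you could pass the result of withPriorities(rs.conf(), { ... }) to rs.reconfig(). Keep in mind that changing priorities can itself trigger an election, so plan the change as a maintenance operation.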
Replica Set Settings
To check your replica set configuration settings, run the
replSetGetConfig command or rs.conf() method.
The following example shows a replica set with electionTimeoutMillis
configured too low:
rs.conf().settings
{
  chainingAllowed: true,
  heartbeatIntervalMillis: 2000,
  heartbeatTimeoutSecs: 10,
  electionTimeoutMillis: 1500, // lower than a single heartbeat cycle
  catchUpTimeoutMillis: 2000,
  getLastErrorModes: {},
  getLastErrorDefaults: { w: 1, wtimeout: 0 },
  replicaSetId: ObjectId("58858acc1f5609ed986b641b")
}
Ensure your value for settings.electionTimeoutMillis is not too low.
In the above example, the settings.electionTimeoutMillis value is
lower than settings.heartbeatIntervalMillis. This means a node
can declare the primary "down" before a full heartbeat interval completes,
causing unnecessary elections.
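The comparison described above can be expressed as a small check. This is a sketch under stated assumptions: the function name checkElectionSettings is hypothetical, and the fallback values of 10000 and 2000 milliseconds are the assumed defaults used when a field is absent from the settings document.

```javascript
// Assumed defaults, used only when a field is missing from the settings.
const DEFAULT_ELECTION_TIMEOUT_MS = 10000;
const DEFAULT_HEARTBEAT_INTERVAL_MS = 2000;

// Reports whether electionTimeoutMillis is at or below the heartbeat
// interval, which lets a node give up on the primary before a full
// heartbeat cycle completes.
function checkElectionSettings(settings) {
  const electionTimeoutMillis =
    settings.electionTimeoutMillis ?? DEFAULT_ELECTION_TIMEOUT_MS;
  const heartbeatIntervalMillis =
    settings.heartbeatIntervalMillis ?? DEFAULT_HEARTBEAT_INTERVAL_MS;
  return {
    electionTimeoutMillis,
    heartbeatIntervalMillis,
    tooLow: electionTimeoutMillis <= heartbeatIntervalMillis,
  };
}
```

In mongosh, you could call checkElectionSettings(rs.conf().settings). A result with tooLow set to false only rules out this specific misconfiguration; a timeout that is above the heartbeat interval can still be too aggressive for a high-latency network.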
Network Partitioning or Latency
If your deployment experiences a network partition or your nodes receive delayed heartbeat messages, secondaries can incorrectly view the primary as unavailable and initiate elections.
To verify that you are experiencing network partitions:
Check for common occurrences of the following messages in your logs:
| Message | Description |
|---|---|
| "Starting an election, since we've seen no PRIMARY in election timeout period" | A secondary initiated an election because it did not receive heartbeats from the primary within the configured timeout window. |
| "can't see a majority of the set, relinquishing primary" | The primary stepped down because it cannot communicate with a majority of the voting secondaries. |
Run rs.status() or replSetGetStatus from different nodes. If the nodes report differing views of which members are reachable, this indicates a split between subsets of nodes.
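Comparing the views collected from several nodes can be sketched as follows. This is an illustrative example, not a supported diagnostic: the function names are hypothetical, and the code assumes each rs.status() document lists members with a name (host and port) and a health value of 1 (reachable) or 0 (unreachable).

```javascript
// viewsByNode maps the host a status was collected from to its rs.status()
// document. Returns, for each member name, the observers that report it
// as unreachable (health === 0).
function unreachableViews(viewsByNode) {
  const result = {};
  for (const [observer, status] of Object.entries(viewsByNode)) {
    for (const member of status.members) {
      if (member.health === 0) {
        (result[member.name] = result[member.name] || []).push(observer);
      }
    }
  }
  return result;
}

// Returns members that some observers, but not all other observers, report
// as unreachable. Disagreement of this kind suggests a partition; a member
// that every other observer reports as unreachable is more likely down.
function splitViewMembers(viewsByNode) {
  const observers = Object.keys(viewsByNode);
  const unreachable = unreachableViews(viewsByNode);
  return Object.keys(unreachable).filter((name) => {
    const others = observers.filter((o) => o !== name);
    return unreachable[name].length < others.length;
  });
}
```

To use it, run rs.status() while connected to each reachable node and store each result under that node's host name before calling splitViewMembers.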
To help restore connectivity after a partition:
Check your firewall configurations for any rules that can block communication between members.
Check that the DNS hostnames of all members resolve correctly from every node.
Ensure that you add your IP address to your IP Access List.
Verify Resolution
After you address the root cause, confirm that frequent
elections no longer occur by re-running
rs.status(). The output should show exactly one member in the
PRIMARY state, and the primary should remain stable over your
normal observation window without unplanned changes.
You can also check your deployment logs for the log messages listed above to confirm that elections no longer occur multiple times over short intervals.
Diagnostics to Collect for More Support
If you still aren't able to resolve your issue, contact Technical Support with the following diagnostic information:
Relevant log messages over the affected time period
rs.config() output
rs.status() output