Replica Set Operation and Management
Replica sets automate most administrative tasks associated with database replication. Nevertheless, several operations related to deployment and systems management still require administrator intervention. This document provides an overview of those tasks, in addition to a collection of troubleshooting suggestions for administrators of replica sets.
See also
- rs.status() and db.isMaster()
- Replica Set Reconfiguration Process
- rs.conf() and rs.reconfig()
- Replica Set Configuration
The following tutorials provide task-oriented instructions for specific administrative tasks related to replica set operation.
- Deploy a Replica Set
- Convert a Standalone to a Replica Set
- Add Members to a Replica Set
- Deploy a Geographically Distributed Replica Set
- Change the Size of the Oplog
- Force a Member to Become Primary
- Change Hostnames in a Replica Set
- Convert a Secondary to an Arbiter
- Reconfigure a Replica Set with Unavailable Members
- Recover MongoDB Data following Unexpected Shutdown
Member Configurations
All replica sets have a single primary and one or more secondaries. Replica sets allow you to configure secondary members in a variety of ways. This section describes these configurations.
Note
A replica set can have up to 12 members, but only 7 members can have votes. For configuration information regarding non-voting members, see Non-Voting Members.
Warning
The rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods. To successfully reconfigure a replica set, a majority of the members must be accessible.
See also
The Elections section in the Replica Set Fundamental Concepts document, and the Election Internals section in the Replica Set Internals and Behaviors document.
Secondary-Only Members
The secondary-only configuration prevents a secondary member in a replica set from ever becoming a primary in a failover. You can set secondary-only mode for any member of the set except the current primary.
For example, you may want to configure all members of a replica set located outside of your main data centers as secondary-only to prevent these members from ever becoming primary.
To configure a member as secondary-only, set its priority value to 0. Any member with a priority equal to 0 will never seek election and cannot become primary in any situation. For more information on priority levels, see Member Priority.
Note
When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array. The _id rarely corresponds to the array index.
As an example of modifying member priorities, assume a four-member replica set. Use the following sequence of operations in the mongo shell to modify member priorities:
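A sketch of that sequence, consistent with the priority values described below (member indexes follow the order of the members array):

```
cfg = rs.conf()                 // load the current configuration
cfg.members[0].priority = 2     // preferred primary
cfg.members[1].priority = 1     // default priority
cfg.members[2].priority = 0.5   // less likely to become primary
cfg.members[3].priority = 0     // never becomes primary
rs.reconfig(cfg)                // apply the new configuration
```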
This reconfigures the set, with the following priority settings:
- Member 0 to a priority of 2 so that it becomes primary, under most circumstances.
- Member 1 to a priority of 1, which is the default value. Member 1 becomes primary if no member with a higher priority is eligible.
- Member 2 to a priority of 0.5, which makes it less likely to become primary than other members but doesn't prohibit the possibility.
- Member 3 to a priority of 0. Member 3 cannot become the primary member under any circumstances.
Note
If your replica set has an even number of members, add an arbiter to ensure that members can quickly obtain a majority of votes in an election for primary.
Note
MongoDB does not permit the current primary to have a priority of 0. If you want to prevent the current primary from ever becoming primary again, first use rs.stepDown() to step down the current primary, and then reconfigure the replica set with rs.conf() and rs.reconfig().
Delayed Members
Delayed members copy and apply operations from the primary’s oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member’s oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.
Example
If the current time is 09:52 and the secondary is delayed by an hour, no operation will be more recent than 08:52.
Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:
- Ensure that the length of the delay is equal to or greater than your maintenance windows.
- Ensure that the size of the oplog is sufficient to capture more than the number of operations that typically occur in that period of time. For more information on oplog size, see the Oplog topic in the Replica Set Fundamental Concepts document.
Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. These members should also be hidden to prevent your application from seeing or querying them.
To configure a replica set member with a one hour delay, use the following sequence of operations in the mongo shell:
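A sketch, assuming the delayed member is the first member in the members array:

```
cfg = rs.conf()
cfg.members[0].priority = 0       // prevent election as primary
cfg.members[0].slaveDelay = 3600  // delay replication by one hour
rs.reconfig(cfg)
```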
After the replica set reconfigures, the first member of the set in the members array will have a priority of 0 and cannot become primary. The slaveDelay value delays both replication and the member's oplog by 3600 seconds (1 hour). Setting slaveDelay to a non-zero value also sets hidden to true for this member so that it does not receive application queries in normal operations.
Warning
The length of the secondary slaveDelay must fit within the window of the oplog. If the oplog is shorter than the slaveDelay window, the delayed member cannot successfully replicate operations.
See also
slaveDelay, Replica Set Reconfiguration, Oplog, Changing Oplog Size in this document, and the Change the Size of the Oplog tutorial.
Arbiters
Arbiters are special mongod instances that do not hold a copy of the data and thus cannot become primary. Arbiters exist solely to participate in elections.
Note
Because of their minimal system requirements, you may safely deploy an arbiter on a system with another workload, such as an application server or monitoring member.
Warning
Do not run arbiter processes on a system that is an active primary or secondary of its replica set.
Arbiters never receive the contents of any collection but do have the following interactions with the rest of the replica set:
- Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted. MongoDB only transmits the authentication credentials in a cryptographically secure exchange, and encrypts no other exchange.
- Exchanges of replica set configuration data and of votes. These are not encrypted.
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Use MongoDB with SSL Connections for more information. As with all MongoDB components, run arbiters on secure networks.
To add an arbiter, see Adding an Arbiter.
Non-Voting Members
You may choose to change the number of votes that each member has in elections for primary. In general, all members should have only 1 vote to prevent intermittent ties, deadlock, or the wrong members from becoming primary. Use replica set priorities to control which members are more likely to become primary.
To disable a member's ability to vote in elections, use the following command sequence in the mongo shell.
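A sketch matching the description below (indexes 3, 4, and 5 address the fourth, fifth, and sixth members):

```
cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)
```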
This sequence gives 0 votes to the fourth, fifth, and sixth members of the set according to the order of the members array in the output of rs.conf(). This setting allows the set to elect these members as primary but does not allow them to vote in elections. If you have three non-voting members, you can add three additional voting members to your set. Place voting members so that your designated primary or primaries can reach a majority of votes in the event of a network partition.
Chained Replication
New in version 2.0.
Chained replication occurs when a secondary member replicates from another secondary member instead of from the primary. This might be the case, for example, if a secondary selects its replication target based on ping time and if the closest member is another secondary.
Chained replication can reduce load on the primary. But chained replication can also result in increased replication lag, depending on the topology of the network.
Beginning with version 2.2.4, you can use the chainingAllowed setting in Replica Set Configuration to disable chained replication for situations where chained replication is causing lag. For details, see Manage Chained Replication in this document.
Procedures
This section gives overview information on a number of replica set administration procedures. You can find documentation of additional procedures in the replica set tutorials section.
Adding Members
Before adding a new member to an existing replica set, do one of the following to prepare the new member’s data directory:
Make sure the new member’s data directory does not contain data. The new member will copy the data from an existing member.
If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all data as part of the replication process. This process takes time but does not require administrator intervention.
Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set after a short interval. Copying the data over manually shortens the amount of time for the new member to become current.
Ensure that you can copy the data directory to the new member and begin replication within the window allowed by the oplog. If the difference in time between the most recent operation in the copied data and the most recent operation applied to the database exceeds the length of the oplog on the existing members, then the new instance will have to perform an initial sync, which completely resynchronizes the data, as described in Resyncing a Member of a Replica Set.
Use db.printReplicationInfo() to check the current state of replica set members with regards to the oplog.
For the procedure to add a member to a replica set, see Add Members to a Replica Set.
Removing Members
You may remove a member of a replica set at any time; however, for best results always shut down the mongod instance before removing it from a replica set.
Changed in version 2.2: Before 2.2, you had to shut down the mongod instance before removing it. While 2.2 removes this requirement, it remains good practice.
To remove a member, use the rs.remove() method in the mongo shell while connected to the current primary. Issue the db.isMaster() command when connected to any member of the set to determine the current primary. Use a command in either of the following forms to remove the member:
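For example, with or without the port number (hostname illustrative):

```
rs.remove("mongod3.example.net:27017")
rs.remove("mongod3.example.net")
```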
This operation disconnects the shell briefly and forces a re-connection as the replica set renegotiates which member will be primary. The shell displays an error even if this command succeeds.
You can re-add a removed member to a replica set at any time using the procedure for adding replica set members. Additionally, consider using the replica set reconfiguration procedure to change the host value to rename a member in a replica set directly.
Replacing a Member
Use this procedure to replace a member of a replica set when the hostname has changed. This procedure preserves all existing configuration for a member, except its hostname/location.
You may need to replace a replica set member if you want to replace an existing system and only need to change the hostname rather than completely replace all configured options related to the previous member.
Use rs.reconfig() to change the value of the host field to reflect the new hostname or port number. rs.reconfig() will not change the value of _id.
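A sketch, assuming the renamed member is at index 0 in the members array (index, hostname, and port illustrative):

```
cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net:27019"  // new hostname and port
rs.reconfig(cfg)
```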
Adjusting Priority
To change the value of the priority in the replica set configuration, use the following sequence of commands in the mongo shell:
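A sketch of the sequence described below (the priority values are illustrative):

```
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)
```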
The first operation uses rs.conf() to set the local variable cfg to the contents of the current replica set configuration, which is a document. The next three operations change the priority value in the cfg document for the first three members configured in the members array. The final operation calls rs.reconfig() with the argument of cfg to initialize the new configuration.
Note
When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array. The _id rarely corresponds to the array index.
If a member has priority set to 0, it is ineligible to become primary and will not seek election. Hidden members, delayed members, and arbiters all have priority set to 0.
All members have a priority equal to 1 by default. The value of priority can be any floating point (i.e. decimal) number between 0 and 1000. Priorities are only used to determine the preference in election. The priority value is used only in relation to other members. With the exception of members with a priority of 0, the absolute value of the priority value is irrelevant.
Replica sets will preferentially elect and maintain the primary status of the member with the highest priority setting.
Warning
Replica set reconfiguration can force the current primary to step down, leading to an election for primary in the replica set. Elections cause the current primary to close all open client connections.
Perform routine replica set reconfiguration during scheduled maintenance windows.
See also
The Replica Reconfiguration Usage example revolves around changing the priorities of the members of a replica set.
Adding an Arbiter
For a description of arbiters and their purpose in replica sets, see Arbiters.
To prevent tied elections, do not add an arbiter to a set if the set already has an odd number of voting members.
Because arbiters do not hold copies of collection data, they have minimal resource requirements and do not require dedicated hardware.
- Create a data directory for the arbiter. The mongod uses this directory for configuration information. It will not hold database collection data. For example, create a /data/arb data directory, as in the sketch after this list.
- Start the arbiter, making sure to specify the replica set name and the data directory.
- In a mongo shell connected to the primary, add the arbiter to the replica set by issuing the rs.addArb() method with the arbiter's hostname and port. For example, if the arbiter runs on m1.example.net:30000, add it as shown after this list.
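A combined sketch of the three steps (the replica set name rs is an assumption; the directory, port, and hostname come from the examples above):

```
# step 1: create the arbiter's data directory
mkdir /data/arb

# step 2: start the arbiter with the replica set name and data directory
mongod --port 30000 --dbpath /data/arb --replSet rs

# step 3: in a mongo shell connected to the primary
rs.addArb("m1.example.net:30000")
```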
Manually Configure a Secondary's Sync Target
To override the default sync target selection logic, you may manually configure a secondary member's sync target for pulling oplog entries temporarily. The following operations provide access to this functionality:
- the replSetSyncFrom command, or
- the rs.syncFrom() helper in the mongo shell
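For example, connected to the secondary in a mongo shell (hostname and port illustrative):

```
rs.syncFrom("m2.example.net:27017")
```

or, equivalently, with the database command:

```
db.adminCommand( { replSetSyncFrom: "m2.example.net:27017" } )
```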
Only modify the default sync logic as needed, and always exercise caution. rs.syncFrom() will not affect an in-progress initial sync operation. To affect the sync target for the initial sync, run rs.syncFrom() before initial sync. If you run rs.syncFrom() during initial sync, MongoDB produces no error messages, but the sync target will not change until after the initial sync operation.
Note
replSetSyncFrom and rs.syncFrom() provide a temporary override of default behavior. If the mongod instance restarts or the connection to the sync target closes, then the mongod instance will revert to the default sync logic and target.
Manage Chained Replication
New in version 2.2.4.
MongoDB enables chained replication by default. This procedure describes how to disable it and how to re-enable it.
To disable chained replication, set the chainingAllowed field in Replica Set Configuration to false. You can use the following sequence of commands to set chainingAllowed to false:
- Copy the configuration settings into the cfg object.
- Take note of whether the current configuration settings contain the settings sub-document. If they do, skip the next step.
Warning
To avoid data loss, skip the next step if the configuration settings contain the settings sub-document.
- If the current configuration settings do not contain the settings sub-document, create the sub-document.
- Set chainingAllowed to false and reconfigure the set, as in the sketch after this list.
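A combined sketch of these steps:

```
cfg = rs.conf()

// create the settings sub-document ONLY if it does not already exist;
// overwriting an existing settings sub-document can cause data loss
cfg.settings = { }

cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
```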
To re-enable chained replication, set chainingAllowed to true. You can use the following sequence of commands:
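A sketch of that sequence:

```
cfg = rs.conf()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
```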
Note
If chained replication is disabled, you still can use replSetSyncFrom to specify that a secondary replicates from another secondary. But that configuration will last only until the secondary recalculates which member to sync from.
Changing Oplog Size
The following is an overview of the procedure for changing the size of the oplog. For a detailed procedure, see Change the Size of the Oplog.
- Shut down the current primary instance in the replica set and then restart it on a different port and in “standalone” mode.
- Create a backup of the old (current) oplog. This is optional.
- Save the last entry from the old oplog.
- Drop the old oplog.
- Create a new oplog of a different size.
- Insert the previously saved last entry from the old oplog into the new oplog.
- Restart the server as a member of the replica set on its usual port.
- Apply this procedure to any other member of the replica set that could become primary.
Resyncing a Member of a Replica Set
When a secondary's replication process falls behind so far that the primary overwrites oplog entries that the secondary has not yet replicated, that secondary cannot catch up and becomes "stale." When that occurs, you must completely resynchronize the member by removing its data and performing an initial sync.
To do so, use one of the following approaches:
- Restart the mongod with an empty data directory and let MongoDB's normal initial syncing feature restore the data. This is the simpler option, but may take longer to replace the data.
- Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps.
Automatically Resync a Stale Member
This procedure relies on MongoDB's regular process for initial sync. This will restore the data on the stale member to reflect the current state of the set. For an overview of the MongoDB initial sync process, see the Syncing section.
To resync the stale member:
- Stop the stale member's mongod instance. On Linux systems you can use mongod --shutdown with --dbpath set to the member's data directory, as in the sketch after this list.
- Delete all data and sub-directories from the member's data directory. By removing the data dbpath, MongoDB will perform a complete resync. Consider making a backup first.
- Restart the mongod instance on the member, as in the sketch after this list. At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of the database and the network connection between members of the replica set.
Initial sync operations can impact the other members of the set and create additional traffic to the primary, and can only occur if another member of the set is accessible and up to date.
Resync by Copying All Datafiles from Another Member
This approach "seeds" the stale member using a copy of the data files from an existing member of the replica set, or a backup of the data files.
The copy or backup of the data files must be sufficiently recent to allow the new member to catch up with the oplog, otherwise the member would need to perform an initial sync.
Note
In most cases you cannot copy data files from a running mongod instance to another, because the data files will change during the file copy operation. Consider the Backup Strategies for MongoDB Systems documentation for several methods that you can use to capture a consistent snapshot of a running mongod instance.
After you have copied the data files from the "seed" source, start the mongod instance and allow it to apply all operations from the oplog until it reflects the current state of the replica set.
Security Considerations for Replica Sets
In most cases, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment's firewall and network routing to ensure that traffic only from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs.)
Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication and specify a shared key file that serves as a shared password.
New in version 1.8: Added support for authentication in replica set deployments.
Changed in version 1.9.1: Added support for authentication in sharded replica set deployments.
To enable authentication, add the following option to your configuration file:
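A minimal configuration-file sketch (the path is illustrative):

```
keyFile = /srv/mongodb/keyfile
```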
Note
You may choose to set these run-time configuration options using the --keyFile (or mongos --keyFile) options on the command line.
Setting keyFile enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or "world" permissions on UNIX systems. Use the following command to use the OpenSSL package to generate "random" content for use in a key file:
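For example (753 bytes of random data keeps the base64-encoded output under one kilobyte):

```
openssl rand -base64 753
```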
Note
Key file permissions are not checked on Windows systems.
Troubleshooting Replica Sets
This section describes common strategies for troubleshooting replica sets.
Check Replica Set Status
To display the current state of the replica set and current state of each member, run the rs.status() method in a mongo shell connected to the replica set's primary. For descriptions of the information displayed by rs.status(), see Replica Set Status Reference.
Note
The rs.status() method is a wrapper that runs the replSetGetStatus database command.
Check the Replication Lag
Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.
To check the current length of replication lag:
- In a mongo shell connected to the primary, call the db.printSlaveReplicationInfo() method. The output displays the syncedTo value for each member, which shows you when each member last read from the oplog, as in the example after this list.
- Monitor the rate of replication by watching the oplog time in the "replica" graph in the MongoDB Monitoring Service. For more information see the documentation for MMS.
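The output of db.printSlaveReplicationInfo() resembles the following (hostnames and times illustrative):

```
source:   m2.example.net:27017
    syncedTo: Tue Oct 02 2012 11:33:35 GMT-0400 (EDT)
        = 7475 secs ago (2.08hrs)
source:   m3.example.net:27017
    syncedTo: Tue Oct 02 2012 11:33:35 GMT-0400 (EDT)
        = 7475 secs ago (2.08hrs)
```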
Possible causes of replication lag include:
Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue. Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.
Disk Throughput
If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping up. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon's EBS system.) Use system-level tools to assess disk status, including iostat or vmstat.
Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, as described in Write Concern. This prevents write operations from returning if replication cannot keep up with the write load. Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.
Appropriate Write Concern
If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes. To prevent this, require write acknowledgment or journaled write concern after every 100, 1,000, or other interval of writes to provide an opportunity for secondaries to catch up with the primary.
Test Connections Between all Members
All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both "directions." Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.
Consider the following example of a bidirectional test of networking:
Example
Given a replica set with three members running on three separate hosts:
- m1.example.net
- m2.example.net
- m3.example.net
- Test the connection from m1.example.net to the other two hosts by running the operation set shown after this list from m1.example.net.
- Test the connection from m2.example.net to the other two hosts with the corresponding operation set from m2.example.net. You have now tested the connection between m2.example.net and m1.example.net in both directions.
- Test the connection from m3.example.net to the other two hosts with the corresponding operation set from the m3.example.net host.
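A sketch of the three operation sets (the default port 27017 is an assumption):

```
# from m1.example.net
mongo --host m2.example.net --port 27017
mongo --host m3.example.net --port 27017

# from m2.example.net
mongo --host m1.example.net --port 27017
mongo --host m3.example.net --port 27017

# from m3.example.net
mongo --host m1.example.net --port 27017
mongo --host m2.example.net --port 27017
```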
If any connection, in any direction, fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.
Check the Size of the Oplog
A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.
To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the db.printReplicationInfo() method.
The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10MB and is able to fit about 26 hours (94400 seconds) of operations:
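Output resembling the following (values illustrative):

```
configured oplog size:   10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time:  Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time:   Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now:                     Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
```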
The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week's worth of operations.
For more information on how oplog size affects operations, see:
- The Oplog topic in the Replica Set Fundamental Concepts document.
- The Delayed Members topic in this document.
- The Check the Replication Lag topic in this document.
Note
You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.
To change oplog size, see Changing Oplog Size in this document or see the Change the Size of the Oplog tutorial.
Failover and Recovery
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
While failover is automatic, replica set administrators should still understand exactly how this process works. The sections below describe failover in detail.
In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not failover according to expectations, consider the following operational errors:
- No remaining member is able to form a majority. This can happen as a result of network partitions that render some members inaccessible. Design your deployment to ensure that a majority of set members can elect a primary in the same facility as core application systems.
- No member is eligible to become primary. Members must have a priority setting greater than 0, have a state that is less than ten seconds behind the last operation to the replica set, and generally be more up to date than the voting members.
In many senses, rollbacks represent a graceful recovery from an impossible failover and recovery situation.
Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a "rollback." Rollbacks remove those operations from the instance that were never replicated to the set so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump and apply manually using mongorestore.
You can prevent rollbacks by using a replica acknowledged write concern. These write operations require not only the primary but also one or more secondaries, and in some configurations a majority of the set, to confirm the write operation before returning.
See also
The Elections section in the Replica Set Fundamental Concepts document, and the Election Internals section in the Replica Set Internals and Behaviors document.
Oplog Entry Timestamp Error
Consider the following error in mongod output and logs:
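The message resembles the following (wording may vary by version):

```
replSet error fatal couldn't query the local local.oplog.rs collection. Terminating mongod after 30 seconds.
```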
Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is Timestamp. Check the type of the ts value using the following two queries against the oplog collection:
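A sketch of the two queries, run against the local database:

```
db = db.getSiblingDB("local")

// last document in the oplog
db.oplog.rs.find().sort({$natural: -1}).limit(1)

// last document in the oplog where ts is a Timestamp (BSON type 17)
db.oplog.rs.find({ts: {$type: 17}}).sort({$natural: -1}).limit(1)
```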
The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type operator allows you to select BSON type 17, which is the Timestamp data type. If the queries don't return the same document, then the last document in the oplog has the wrong data type in the ts field.
Example
If the first query returns this as the last oplog entry:
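An entry whose ts is a plain document rather than a Timestamp might look like this (field values illustrative):

```
{ "ts" : { t: 1347982456000, i: 1 },
  "op" : "n",
  "ns" : "",
  "o" : { "msg" : "Reconfig set", "version" : 4 } }
```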
And the second query returns this as the last entry where ts has the Timestamp type:
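(Again, values illustrative:)

```
{ "ts" : Timestamp(1347982454000, 1),
  "op" : "n",
  "ns" : "",
  "o" : { "msg" : "Reconfig set", "version" : 3 } }
```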
Then the value for the ts field in the last oplog entry is of the wrong data type.
To set the proper type for this value and resolve this issue, use an update operation that resembles the following:
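A sketch of such an update, reusing the t and i values from the bad entry (adjust them to match your oplog entry):

```
db = db.getSiblingDB("local")
db.oplog.rs.update( { ts: { t: 1347982456000, i: 1 } },
                    { $set: { ts: Timestamp(1347982456000, 1) } } )
```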
Modify the timestamp values as needed based on your oplog entry. This operation may take some time to complete because the update must scan and pull the entire oplog into memory.
Duplicate Key Error on local.slaves
The duplicate key on local.slaves error occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following:
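A representative message (the ObjectId varies):

```
exception 11000 E11000 duplicate key error index: local.slaves.$_id_  dup key: { : ObjectId('<object ID>') } 0ms
```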
This is a benign error and does not affect replication operations on the secondary or slave.
To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:
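A sketch of those operations:

```
use local
db.slaves.drop()
```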
The next time a secondary or slave polls the primary or master, the primary or master recreates the local.slaves collection.
Elections and Network Partitions
Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election.
That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read only.