
Replica Set Operation and Management

Replica sets automate most administrative tasks associated with database replication. Nevertheless, several operations related to deployment and systems management still require administrator intervention. This document provides an overview of those tasks, in addition to a collection of troubleshooting suggestions for administrators of replica sets.

Member Configurations

All replica sets have a single primary and one or more secondaries. Replica sets allow you to configure secondary members in a variety of ways. This section describes these configurations.

Note

A replica set can have up to 12 members, but only 7 members can have votes. For configuration information regarding non-voting members, see Non-Voting Members.

Warning

The rs.reconfig() shell method can force the current primary to step down, which causes an election. When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods. To successfully reconfigure a replica set, a majority of the members must be accessible.

See also

The Elections section in the Replica Set Fundamental Concepts document, and the Election Internals section in the Replica Set Internals and Behaviors document.

Secondary-Only Members

The secondary-only configuration prevents a secondary member in a replica set from ever becoming a primary in a failover. You can set secondary-only mode for any member of the set except the current primary.

For example, you may want to configure all members of a replica set located outside of the main data centers as secondary-only to prevent these members from ever becoming primary.

To configure a member as secondary-only, set its priority value to 0. Any member with a priority equal to 0 will never seek election and cannot become primary in any situation. For more information on priority levels, see Member Priority.

Note

When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.

The _id rarely corresponds to the array index.

As an example of modifying member priorities, assume a four-member replica set. Use the following sequence of operations in the mongo shell to modify member priorities:

cfg = rs.conf()
cfg.members[0].priority = 2
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
cfg.members[3].priority = 0
rs.reconfig(cfg)

This reconfigures the set, with the following priority settings:

  • Member 0 to a priority of 2 so that it becomes primary under most circumstances.
  • Member 1 to a priority of 1, which is the default value. Member 1 becomes primary if no member with a higher priority is eligible.
  • Member 2 to a priority of 0.5, which makes it less likely to become primary than other members but doesn’t prohibit the possibility.
  • Member 3 to a priority of 0. Member 3 cannot become the primary member under any circumstances.

Note

If your replica set has an even number of members, add an arbiter to ensure that members can quickly obtain a majority of votes in an election for primary.

Note

MongoDB does not permit the current primary to have a priority of 0. If you want to prevent the current primary from becoming primary, first use rs.stepDown() to step down the current primary, and then reconfigure the replica set with rs.conf() and rs.reconfig().
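
For example, the following hedged sequence illustrates that approach; the member index is illustrative and assumes the former primary is the first entry in the members array:

rs.stepDown()
// After the election completes, connect to the new primary and run:
cfg = rs.conf()
cfg.members[0].priority = 0
rs.reconfig(cfg)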

Hidden Members

Hidden members are part of a replica set but cannot become primary and are invisible to client applications. However, hidden members do vote in elections.

Hidden members are ideal for instances that will have significantly different usage patterns than the other members and require separation from normal traffic. Typically, hidden members provide reporting, dedicated backups, and dedicated read-only testing and integration support.

Hidden members have priority set to 0 and have hidden set to true.

To configure a hidden member, use the following sequence of operations in the mongo shell:

cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
rs.reconfig(cfg)

After re-configuring the set, the first member of the set in the members array will have a priority of 0 so that it cannot become primary. The other members in the set will not advertise the hidden member in the isMaster or db.isMaster() output.

Note

You must send the rs.reconfig() command to a set member that can become primary. In the above example, if you issue the rs.reconfig() operation to a member with a priority of 0, the operation will fail.

Note

Changed in version 2.0.

For sharded clusters running with replica sets before 2.0, if you reconfigured a member as hidden, you had to restart mongos to prevent queries from reaching the hidden member.

Delayed Members

Delayed members copy and apply operations from the primary’s oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member’s oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.

Example

If the current time is 09:52 and the secondary is delayed by an hour, no operation will be more recent than 08:52.

Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:

  • Ensure that the length of the delay is equal to or greater than your maintenance windows.
  • Ensure that the size of the oplog is sufficient to capture more than the number of operations that typically occur in that period of time. For more information on oplog size, see the Oplog topic in the Replica Set Fundamental Concepts document.

Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. Also, these members should be hidden to prevent your application from seeing or querying them.

To configure a replica set member with a one hour delay, use the following sequence of operations in the mongo shell:

cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)

After the replica set reconfigures, the first member of the set in the members array will have a priority of 0 and cannot become primary. The slaveDelay value delays both replication and the member’s oplog by 3600 seconds (1 hour). Setting slaveDelay to a non-zero value also sets hidden to true for this member so that it does not receive application queries in normal operations.

Warning

The length of the secondary slaveDelay must fit within the window of the oplog. If the oplog is shorter than the slaveDelay window, the delayed member cannot successfully replicate operations.

Arbiters

Arbiters are special mongod instances that do not hold a copy of the data and thus cannot become primary. Arbiters exist solely to participate in elections.

Note

Because of their minimal system requirements, you may safely deploy an arbiter on a system with another workload, such as an application server or monitoring member.

Warning

Do not run arbiter processes on a system that is an active primary or secondary of its replica set.

Arbiters never receive the contents of any collection but do have the following interactions with the rest of the replica set:

  • Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.

    MongoDB only transmits the authentication credentials in a cryptographically secure exchange, and encrypts no other exchange.

  • Exchanges of replica set configuration data and of votes. These are not encrypted.

If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Use MongoDB with SSL Connections for more information. As with all MongoDB components, run arbiters on secure networks.

To add an arbiter, see Adding an Arbiter.

Non-Voting Members

You may choose to change the number of votes that each member has in elections for primary. In general, all members should have only 1 vote to prevent intermittent ties, deadlock, or the wrong members from becoming primary. Use replica set priorities to control which members are more likely to become primary.

To disable a member’s ability to vote in elections, use the following command sequence in the mongo shell.

cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)

This sequence gives 0 votes to the fourth, fifth, and sixth members of the set according to the order of the members array in the output of rs.conf(). This setting allows the set to elect these members as primary but does not allow them to vote in elections. If you have three non-voting members, you can add three additional voting members to your set. Place voting members so that your designated primary or primaries can reach a majority of votes in the event of a network partition.

Note

In general and when possible, all members should have only 1 vote. This prevents intermittent ties, deadlocks, or the wrong members from becoming primary. Use Replica Set Priorities to control which members are more likely to become primary.

Chained Replication

New in version 2.0.

Chained replication occurs when a secondary member replicates from another secondary member instead of from the primary. This might be the case, for example, if a secondary selects its replication target based on ping time and if the closest member is another secondary.

Chained replication can reduce load on the primary. But chained replication can also result in increased replication lag, depending on the topology of the network.

Beginning with version 2.2.4, you can use the chainingAllowed setting in Replica Set Configuration to disable chained replication for situations where chained replication is causing lag. For details, see Chained Replication.

Procedures

This section gives overview information on a number of replica set administration procedures. You can find documentation of additional procedures in the replica set tutorials section.

Adding Members

Before adding a new member to an existing replica set, do one of the following to prepare the new member’s data directory:

  • Make sure the new member’s data directory does not contain data. The new member will copy the data from an existing member.

    If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all data as part of the replication process. This process takes time but does not require administrator intervention.

  • Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set after a short interval. Copying the data over manually shortens the amount of time for the new member to become current.

    Ensure that you can copy the data directory to the new member and begin replication within the window allowed by the oplog. If the time between the most recent operation in the copied data and the most recent operation applied to the set exceeds the length of the oplog on the existing members, then the new instance will have to perform an initial sync, which completely resynchronizes the data, as described in Resyncing a Member of a Replica Set.

    Use db.printReplicationInfo() to check the current state of replica set members with regards to the oplog.

For the procedure to add a member to a replica set, see Add Members to a Replica Set.
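
As a brief illustration of that procedure, you would connect to the current primary with the mongo shell and issue rs.add() with the new member’s hostname; mongo4.example.net is a hypothetical host:

rs.add("mongo4.example.net:27017")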

Removing Members

You may remove a member of a replica set at any time; however, for best results always shut down the mongod instance before removing it from a replica set.

Changed in version 2.2: Before 2.2, you had to shut down the mongod instance before removing it. While 2.2 removes this requirement, it remains good practice.

To remove a member, use the rs.remove() method in the mongo shell while connected to the current primary. Issue the db.isMaster() command when connected to any member of the set to determine the current primary. Use a command in either of the following forms to remove the member:

rs.remove("mongo2.example.net:27017")
rs.remove("mongo3.example.net")

This operation disconnects the shell briefly and forces a re-connection as the replica set renegotiates which member will be primary. The shell displays an error even if this command succeeds.

You can re-add a removed member to a replica set at any time using the procedure for adding replica set members. Additionally, consider using the replica set reconfiguration procedure to change the host value to rename a member in a replica set directly.

Replacing a Member

Use this procedure to replace a member of a replica set when the hostname has changed. This procedure preserves all existing configuration for a member, except its hostname/location.

You may need to replace a replica set member if you want to replace an existing system and only need to change the hostname rather than completely replace all configured options related to the previous member.

Use rs.reconfig() to change the value of the host field to reflect the new hostname or port number. rs.reconfig() will not change the value of _id.

cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net:27019"
rs.reconfig(cfg)

Warning

Any replica set configuration change can trigger the current primary to step down, which forces an election. This causes the current shell session, and clients connected to this replica set, to produce an error even when the operation succeeds.

Adjusting Priority

To change the value of the priority in the replica set configuration, use the following sequence of commands in the mongo shell:

cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)

The first operation uses rs.conf() to set the local variable cfg to the contents of the current replica set configuration, which is a document. The next three operations change the priority value in the cfg document for the first three members configured in the members array. The final operation calls rs.reconfig() with the argument of cfg to initialize the new configuration.

Note

When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.

The _id rarely corresponds to the array index.

If a member has priority set to 0, it is ineligible to become primary and will not seek election. Hidden members, delayed members, and arbiters all have priority set to 0.

All members have a priority equal to 1 by default.

The value of priority can be any floating point (i.e. decimal) number between 0 and 1000. Priorities are only used to determine the preference in election. The priority value is used only in relation to other members. With the exception of members with a priority of 0, the absolute value of the priority value is irrelevant.

Replica sets will preferentially elect and maintain the primary status of the member with the highest priority setting.

Warning

Replica set reconfiguration can force the current primary to step down, leading to an election for primary in the replica set. Elections cause the current primary to close all open client connections.

Perform routine replica set reconfiguration during scheduled maintenance windows.

See also

The Replica Reconfiguration Usage example revolves around changing the priorities of the members of a replica set.

Adding an Arbiter

For a description of arbiters and their purpose in replica sets, see Arbiters.

To prevent tied elections, do not add an arbiter to a set if the set already has an odd number of voting members.

Because arbiters do not hold a copy of collection data, they have minimal resource requirements and do not require dedicated hardware.

  1. Create a data directory for the arbiter. The mongod uses this directory for configuration information. It will not hold database collection data. The following example creates the /data/arb data directory:

    mkdir /data/arb
    
  2. Start the arbiter, making sure to specify the replica set name and the data directory. Consider the following example:

    mongod --port 30000 --dbpath /data/arb --replSet rs
    
  3. In a mongo shell connected to the primary, add the arbiter to the replica set by issuing the rs.addArb() method, which uses the following syntax:

    rs.addArb("<hostname><:port>")
    

    For example, if the arbiter runs on m1.example.net:30000, you would issue this command:

    rs.addArb("m1.example.net:30000")
    

Manually Configure a Secondary’s Sync Target

To override the default sync target selection logic, you may manually configure a secondary member’s sync target to temporarily pull oplog entries from a specific member. The replSetSyncFrom database command and the rs.syncFrom() helper in the mongo shell provide access to this functionality.

Only modify the default sync logic as needed, and always exercise caution. rs.syncFrom() will not affect an in-progress initial sync operation. To affect the sync target for the initial sync, run the rs.syncFrom() operation before initial sync.

If you run rs.syncFrom() during initial sync, MongoDB produces no error messages, but the sync target will not change until after the initial sync operation.
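
For example, to temporarily direct a secondary to sync from a specific member (the hostname here is illustrative), connect to that secondary in the mongo shell and run:

rs.syncFrom("mongo2.example.net:27017")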

Note

replSetSyncFrom and rs.syncFrom() provide a temporary override of default behavior. If:

  • the mongod instance restarts or
  • the connection to the sync target closes;

then, the mongod instance will revert to the default sync logic and target.

Manage Chained Replication

New in version 2.2.4.

MongoDB enables chained replication by default. This procedure describes how to disable it and how to re-enable it.

To disable chained replication, set the chainingAllowed field in Replica Set Configuration to false.

You can use the following sequence of commands to set chainingAllowed to false:

  1. Copy the configuration settings into the cfg object:

    cfg = rs.config()
    
  2. Take note of whether the current configuration settings contain the settings sub-document. If they do, skip this step.

    Warning

    To avoid data loss, skip this step if the configuration settings contain the settings sub-document.

    If the current configuration settings do not contain the settings sub-document, create the sub-document by issuing the following command:

    cfg.settings = { }
    
  3. Issue the following sequence of commands to set chainingAllowed to false:

    cfg.settings.chainingAllowed = false
    rs.reconfig(cfg)
    

To re-enable chained replication, set chainingAllowed to true. You can use the following sequence of commands:

cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)

Note

If chained replication is disabled, you still can use replSetSyncFrom to specify that a secondary replicates from another secondary. But that configuration will last only until the secondary recalculates which member to sync from.

Changing Oplog Size

The following is an overview of the procedure for changing the size of the oplog. For a detailed procedure, see Change the Size of the Oplog.

  1. Shut down the current primary instance in the replica set and then restart it on a different port and in “standalone” mode.
  2. Create a backup of the old (current) oplog. This is optional.
  3. Save the last entry from the old oplog.
  4. Drop the old oplog.
  5. Create a new oplog of a different size.
  6. Insert the previously saved last entry from the old oplog into the new oplog.
  7. Restart the server as a member of the replica set on its usual port.
  8. Apply this procedure to any other member of the replica set that could become primary.
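
The following is a condensed sketch of steps 3 through 6 as run in the mongo shell against the instance while it is in standalone mode; the 2 gigabyte size is only an example, and the authoritative sequence is in the Change the Size of the Oplog tutorial:

use local
// save the last entry of the current oplog in a temporary collection
db.temp.drop()
db.temp.save( db.oplog.rs.find( { }, { ts: 1, h: 1 } ).sort( { $natural : -1 } ).limit(1).next() )
// drop the old oplog and recreate it as a capped collection with the new size
db.oplog.rs.drop()
db.runCommand( { create: "oplog.rs", capped: true, size: (2 * 1024 * 1024 * 1024) } )
// seed the new oplog with the saved entry
db.oplog.rs.save( db.temp.findOne() )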

Resyncing a Member of a Replica Set

When a secondary’s replication process falls so far behind that the primary overwrites oplog entries that the secondary has not yet replicated, that secondary cannot catch up and becomes “stale.” When that occurs, you must completely resynchronize the member by removing its data and performing an initial sync.

To do so, use one of the following approaches:

Automatically Resync a Stale Member

This procedure relies on MongoDB’s regular process for initial sync. This will restore the data on the stale member to reflect the current state of the set. For an overview of the MongoDB initial sync process, see the Syncing section.

To resync the stale member:

  1. Stop the stale member’s mongod instance. On Linux systems you can use mongod --shutdown. Set --dbpath to the member’s data directory, as in the following:

    mongod --dbpath /data/db/ --shutdown
    
  2. Delete all data and sub-directories from the member’s data directory. By removing the data in the dbpath directory, MongoDB will perform a complete resync. Consider making a backup first.

  3. Restart the mongod instance on the member. For example:

    mongod --dbpath /data/db/ --replSet rsProduction
    

    At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of the database and the network connection between members of the replica set.

    Initial sync operations can impact the other members of the set and create additional traffic to the primary, and can only occur if another member of the set is accessible and up to date.

Resync by Copying All Datafiles from Another Member

This approach uses a copy of the data files from an existing member of the replica set, or a backup of the data files, to “seed” the stale member.

The copy or backup of the data files must be sufficiently recent to allow the new member to catch up with the oplog, otherwise the member would need to perform an initial sync.

Note

In most cases you cannot copy data files from a running mongod instance to another, because the data files will change during the file copy operation. Consider the Backup Strategies for MongoDB Systems documentation for several methods that you can use to capture a consistent snapshot of a running mongod instance.

After you have copied the data files from the “seed” source, start the mongod instance and allow it to apply all operations from the oplog until it reflects the current state of the replica set.

Security Considerations for Replica Sets

In most cases, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment’s firewall and network routing to ensure that traffic only from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs.)

Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication by specifying a shared key file that serves as a shared password.

New in version 1.8: Added support for authentication in replica set deployments.

Changed in version 1.9.1: Added support for authentication in sharded replica set deployments.

To enable authentication, add the following option to your configuration file:

keyFile = /srv/mongodb/keyfile

Note

You may choose to set these run-time configuration options using the --keyFile (or mongos --keyFile) options on the command line.

Setting keyFile enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.

The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or “world” permissions on UNIX systems. Use the following command to use the OpenSSL package to generate “random” content for use in a key file:

openssl rand -base64 753
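
On UNIX systems, restrict the key file’s permissions after creating it, for example:

chmod 600 /srv/mongodb/keyfile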

Note

Key file permissions are not checked on Windows systems.

Troubleshooting Replica Sets

This section describes common strategies for troubleshooting replica sets.

Check Replica Set Status

To display the current state of the replica set and current state of each member, run the rs.status() method in a mongo shell connected to the replica set’s primary. For descriptions of the information displayed by rs.status(), see Replica Set Status Reference.

Note

The rs.status() method is a wrapper that runs the replSetGetStatus database command.
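
For example, you can invoke the underlying command directly from the mongo shell:

db.adminCommand( { replSetGetStatus: 1 } )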

Check the Replication Lag

Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments. Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.

To check the current length of replication lag:

  • In a mongo shell connected to the primary, call the db.printSlaveReplicationInfo() method.

    The returned document displays the syncedTo value for each member, which shows you when each member last read from the oplog, as shown in the following example:

    source:   m1.example.net:30001
        syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
            = 7475 secs ago (2.08hrs)
    source:   m2.example.net:30002
        syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
            = 7475 secs ago (2.08hrs)
    


  • Monitor the rate of replication by watching the oplog time in the “replica” graph in the MongoDB Monitoring Service. For more information see the documentation for MMS.

Possible causes of replication lag include:

  • Network Latency

    Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.

    Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.

  • Disk Throughput

    If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon’s EBS system.)

    Use system-level tools to assess disk status, including iostat or vmstat.

  • Concurrency

    In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, as described in Write Concern. This prevents write operations from returning if replication cannot keep up with the write load.

    Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.

  • Appropriate Write Concern

    If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.

    To prevent this, require write acknowledgment or journaled write concern after every 100, 1,000, or another interval to provide an opportunity for secondaries to catch up with the primary, as in the hedged sketch following this list.

    For more information, see Write Concern.
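
The following is a minimal sketch of this pattern for the mongo shell; the records collection, the document contents, and the 1,000-write interval are illustrative. The getLastError command with w: 2 blocks until the primary and at least one secondary acknowledge the preceding writes:

for (var i = 0; i < 100000; i++) {
    db.records.insert( { i: i } );
    if (i % 1000 === 0) {
        // wait for the primary and one secondary before continuing
        db.runCommand( { getLastError: 1, w: 2, wtimeout: 5000 } );
    }
}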

Test Connections Between all Members

All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both “directions.” Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.

Consider the following example of a bidirectional test of networking:

Example

Given a replica set with three members running on three separate hosts:

  • m1.example.net
  • m2.example.net
  • m3.example.net
  1. Test the connection from m1.example.net to the other hosts with the following operation set from m1.example.net:

    mongo --host m2.example.net --port 27017
    
    mongo --host m3.example.net --port 27017
    
  2. Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net, as in:

    mongo --host m1.example.net --port 27017
    
    mongo --host m3.example.net --port 27017
    

    You have now tested the connection between m2.example.net and m1.example.net in both directions.

  3. Test the connection from m3.example.net to the other two hosts with the following operation set from the m3.example.net host, as in:

    mongo --host m1.example.net --port 27017
    
    mongo --host m2.example.net --port 27017
    

If any connection, in any direction, fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.

Check the Size of the Oplog

A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.

To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the db.printReplicationInfo() method.

The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10MB and is able to fit about 26 hours (94400 seconds) of operations:

configured oplog size:   10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time:  Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time:   Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now:                     Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)

The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week’s worth of operations.

For more information on how oplog size affects operations, see the Oplog topic in the Replica Set Fundamental Concepts document.

Note

You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.

To change oplog size, see Changing Oplog Size in this document or see the Change the Size of the Oplog tutorial.

Failover and Recovery

Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.

While failover is automatic, replica set administrators should still understand exactly how this process works. The sections below describe failover in detail.

In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not failover according to expectations, consider the following operational errors:

  • No remaining member is able to form a majority. This can happen as a result of network partitions that render some members inaccessible. Design your deployment to ensure that a majority of set members can elect a primary in the same facility as core application systems.
  • No member is eligible to become primary. Members must have a priority setting greater than 0, have a state that is less than ten seconds behind the last operation to the replica set, and generally be more up to date than the voting members.

In many senses, rollbacks represent a graceful recovery from an impossible failover and recovery situation.

Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a “rollback.” Rollbacks remove those operations from the instance that were never replicated to the set so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump and apply manually using mongorestore.

You can prevent rollbacks by using a replica acknowledged write concern. These write operations require not only the primary but also a majority of the set members to acknowledge the write operation before returning. See Write Concern for more information on enabling write concern.
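
For example, a brief sketch of a majority-acknowledged write using the getLastError command; the products collection and document are illustrative:

db.products.insert( { item: "example" } )
db.runCommand( { getLastError: 1, w: "majority", wtimeout: 5000 } )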

See also

The Elections section in the Replica Set Fundamental Concepts document, and the Election Internals section in the Replica Set Internals and Behaviors document.

Oplog Entry Timestamp Error

Consider the following error in mongod output and logs:

replSet error fatal couldn't query the local local.oplog.rs collection.  Terminating mongod after 30 seconds.
<timestamp> [rsStart] bad replSet oplog entry?

Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is Timestamp.

Check the type of the ts value using the following two queries against the oplog collection:

db = db.getSiblingDB("local")
db.oplog.rs.find().sort({$natural:-1}).limit(1)
db.oplog.rs.find({ts:{$type:17}}).sort({$natural:-1}).limit(1)

The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type operator allows you to select BSON type 17, which is the Timestamp data type.

If the queries don’t return the same document, then the last document in the oplog has the wrong data type in the ts field.

Example

If the first query returns this as the last oplog entry:

{ "ts" : {t: 1347982456000, i: 1},
  "h" : NumberLong("8191276672478122996"),
  "op" : "n",
  "ns" : "",
  "o" : { "msg" : "Reconfig set", "version" : 4 } }

And the second query returns this as the last entry where ts has the Timestamp type:

{ "ts" : Timestamp(1347982454000, 1),
  "h" : NumberLong("6188469075153256465"),
  "op" : "n",
  "ns" : "",
  "o" : { "msg" : "Reconfig set", "version" : 3 } }

Then the value for the ts field in the last oplog entry is of the wrong data type.

To set the proper type for this value and resolve this issue, use an update operation that resembles the following:

db.oplog.rs.update( { ts: { t:1347982456000, i:1 } },
                    { $set: { ts: new Timestamp(1347982456000, 1)}})

Modify the timestamp values as needed based on your oplog entry. This operation may take some time to complete because the update must scan and pull the entire oplog into memory.

Duplicate Key Error on local.slaves

The duplicate key on local.slaves error occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following.

exception 11000 E11000 duplicate key error index: local.slaves.$_id_  dup key: { : ObjectId('<object ID>') } 0ms

This is a benign error and does not affect replication operations on the secondary or slave.

To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:

use local
db.slaves.drop()

The next time a secondary or slave polls the primary or master, the primary or master recreates the local.slaves collection.

Elections and Network Partitions

Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election.

That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read only.