Sharded Cluster Internals and Behaviors

This document introduces lower level sharding concepts for users who are familiar with sharding generally and want to learn more about the internals, providing a more detailed understanding of your cluster’s behavior. For higher level sharding concepts, see Sharded Cluster Overview. For complete documentation of sharded clusters see the Sharding section of this manual.

Shard Keys

The shard key is the field in a collection that MongoDB uses to distribute documents within a sharded cluster. See the overview of shard keys for an introduction to these topics.

Cardinality

Cardinality, in the context of MongoDB, refers to the ability of the system to partition data into chunks. For example, consider a collection of data such as an “address book” that stores address records:

  • Consider the use of a state field as a shard key:

    The state key’s value holds the US state for a given address document. This field has a low cardinality as all documents that have the same value in the state field must reside on the same shard, even if a particular state’s chunk exceeds the maximum chunk size.

    Since there are a limited number of possible values for the state field, MongoDB may distribute data unevenly among a small number of fixed chunks. This may have a number of effects:

    • If MongoDB cannot split a chunk because all of its documents have the same shard key, migrations involving these un-splittable chunks will take longer than other migrations, and it will be more difficult for your data to stay balanced.
    • If you have a fixed maximum number of chunks, you will never be able to use more than that number of shards for this collection.
  • Consider the use of a zipcode field as a shard key:

    While this field has a large number of possible values, and thus has potentially higher cardinality, it’s possible that a large number of users could have the same value for the shard key, which would make this chunk of users un-splittable.

    In these cases, cardinality depends on the data. If your address book stores records for a geographically distributed contact list (e.g. “Dry cleaning businesses in America,”) then a value like zipcode would be sufficient. However, if your address book is more geographically concentrated (e.g “ice cream stores in Boston Massachusetts,”) then you may have a much lower cardinality.

  • Consider the use of a phone-number field as a shard key:

    Phone number has a high cardinality: because users will generally have a unique value for this field, MongoDB will be able to split the data into as many chunks as needed.
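
For illustration only, the following commands show how each of these candidate keys would be declared, assuming an addresses collection in a database named records with sharding already enabled (the names are hypothetical, and you would choose only one shard key per collection):

    db.runCommand( { shardCollection: "records.addresses", key: { state: 1 } } )        // low cardinality
    db.runCommand( { shardCollection: "records.addresses", key: { zipcode: 1 } } )      // cardinality depends on the data
    db.runCommand( { shardCollection: "records.addresses", key: { phoneNumber: 1 } } )  // high cardinality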

While “high cardinality” is necessary for ensuring an even distribution of data, having a high cardinality does not guarantee sufficient query isolation or appropriate write scaling. Please continue reading for more information on these topics.

Write Scaling

Some possible shard keys will allow your application to take advantage of the increased write capacity that the cluster can provide, while others do not. Consider the following example where you shard by the values of the default _id field, which is ObjectId.

ObjectId is a unique identifier for the object, computed upon document creation. However, the most significant bits of data in this value represent a time stamp, which means that they increment in a regular and predictable pattern. Even though this value has high cardinality, when using this, any date, or any other monotonically increasing number as the shard key, all insert operations write data into a single chunk, and therefore to a single shard. As a result, the write capacity of this shard will define the effective write capacity of the cluster.

A shard key that increases monotonically will not hinder performance if you have a very low insert rate, or if most of your write operations are update() operations distributed through your entire data set. Generally, choose shard keys that have both high cardinality and will distribute write operations across the entire cluster.

Typically, a computed shard key that has some amount of “randomness,” such as one that includes a cryptographic hash (e.g. MD5 or SHA1) of other content in the document, will allow the cluster to scale write operations. However, random shard keys do not typically provide query isolation, which is another important characteristic of shard keys.
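
As a sketch of this approach, the following shell snippet computes an MD5 hash of another field at insert time and stores it in the document so that it can serve as a shard key. The collection and field names are illustrative assumptions, hex_md5() is a helper available in the mongo shell, and in practice you would normally compute the hash in your application layer:

    // Illustrative only: hash the username and store the digest alongside the document.
    var doc = { username: "alice", address: "123 Main St" };
    doc.usernameHash = hex_md5( doc.username );   // hex_md5() is a mongo shell helper
    db.addresses.insert( doc );

    // Shard the collection on the computed, "random" field (assumes sharding is enabled):
    db.runCommand( { shardCollection: "records.addresses", key: { usernameHash: 1 } } )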

Querying

The mongos provides an interface for applications to interact with sharded clusters that hides the complexity of data partitioning. A mongos receives queries from applications, and uses metadata from the config server to route queries to the mongod instances with the appropriate data. While the mongos succeeds in making all querying operational in sharded environments, the shard key you select can have a profound effect on query performance.

See also

The mongos and Sharding and config server sections for a more general overview of querying in sharded environments.

Query Isolation

The fastest queries in a sharded environment are those that mongos will route to a single shard, using the shard key and the cluster metadata from the config server. For queries that don’t include the shard key, mongos must query all shards, wait for their responses and then return the result to the application. These “scatter/gather” queries can be long running operations.

If your query includes the first component of a compound shard key [1], the mongos can route the query directly to a single shard, or a small number of shards, which provides better performance. Even if the values of the shard key you query reside in different chunks, the mongos will route the queries directly to the specific shards.

To select a shard key for a collection:

  • determine the most commonly included fields in queries for a given application
  • find which of these operations are most performance dependent.

If this field has low cardinality (i.e. it is not sufficiently selective) you should add a second field to the shard key, making a compound shard key. The data may become more splittable with a compound shard key.
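
As a hedged example, suppose the address book is sharded on a compound key of state and zipcode (illustrative collection and field names). Queries that include the state prefix can be routed to one or a few shards, while queries that omit the shard key are scatter/gather:

    db.runCommand( { shardCollection: "records.addresses", key: { state: 1, zipcode: 1 } } )

    // Targeted: includes the first component of the shard key.
    db.addresses.find( { state: "MA", zipcode: "02134" } )

    // Scatter/gather: does not include the shard key, so mongos must query every shard.
    db.addresses.find( { name: "John Smith" } )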

See

Sharded Cluster Operations and mongos Instances for more information on query operations in the context of sharded clusters. Specifically the Automatic Operation and Query Routing with mongos sub-section outlines the procedure that mongos uses to route read operations to the shards.

[1] In many ways, you can think of the shard key as a cluster-wide unique index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes unless the unique field is in the shard key. Consider the Indexing Overview page for more information on indexes and compound indexes.

Sorting

In sharded systems, the mongos performs a merge-sort of all sorted query results from the shards. See the sharded query routing and Use Indexes to Sort Query Results sections for more information.
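
For example, assuming the illustrative addresses collection above, a query such as the following returns results sorted by zipcode; each shard sorts its own portion of the results and the mongos merges them before returning them to the application:

    db.addresses.find( { state: "MA" } ).sort( { zipcode: 1 } )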

Operations and Reliability

The most important considerations when choosing a shard key are:

  • to ensure that MongoDB will be able to distribute data evenly among shards,
  • to scale writes across the cluster, and
  • to ensure that mongos can isolate most queries to a specific mongod.

Furthermore:

  • Each shard should be a replica set: if a specific mongod instance fails, the remaining replica set members will elect another to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, that data will be unavailable.
  • If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
  • If your shard key distributes data required for every operation throughout the cluster, then the failure of an entire shard will render the entire cluster unavailable.

In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.

Choosing a Shard Key

It is unlikely that any single, naturally occurring key in your collection will satisfy all requirements of a good shard key. There are three options:

  1. Compute a more ideal shard key in your application layer, and store this in all of your documents, potentially in the _id field.
  2. Use a compound shard key that uses two or three fields present in all documents and that together provide the right mix of cardinality, scalable write operations, and query isolation.
  3. Determine that the impact of using a less than ideal shard key is insignificant in your use case, given:
    • limited write volume,
    • expected data size, or
    • query patterns and demands.

From a decision making standpoint, begin by finding the field that will provide the required query isolation, ensure that writes will scale across the cluster, and then add an additional field to provide additional cardinality if your primary key does not have sufficient split-ability.

Shard Key Indexes

All sharded collections must have an index that starts with the shard key. If you shard a collection that does not yet contain documents and without such an index, the shardCollection command will create an index on the shard key. If the collection already contains documents, you must create an appropriate index before using shardCollection.

Changed in version 2.2: The index on the shard key no longer needs to be identical to the shard key. This index can be an index of the shard key itself as before, or a compound index where the shard key is the prefix of the index. This index cannot be a multikey index.

If you have a collection named people, sharded using the field { zipcode: 1 }, and you want to replace this with an index on the field { zipcode: 1, username: 1 }, then:

  1. Create an index on { zipcode: 1, username: 1 }:

    db.people.ensureIndex( { zipcode: 1, username: 1 } );
    
  2. When MongoDB finishes building the index, you can safely drop the existing index on { zipcode: 1 }:

    db.people.dropIndex( { zipcode: 1 } );
    

Warning

The index on the shard key cannot be a multikey index.

As above, an index on { zipcode: 1, username: 1 } can only replace an index on zipcode if there are no array values for the username field.

If you drop the last appropriate index for the shard key, recover by recreating an index on just the shard key.
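
Continuing the people example above, recreating an index on just the shard key would look like:

    db.people.ensureIndex( { zipcode: 1 } )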

Cluster Balancer

The balancer sub-process is responsible for redistributing chunks evenly among the shards and ensuring that each member of the cluster is responsible for the same volume of data. This section contains complete documentation of the balancer process and operations. For a higher level introduction see the Shard Balancing section.

Balancing Internals

A balancing round originates from an arbitrary one of the cluster’s mongos instances. When a balancer process is active, the responsible mongos acquires a “lock” by modifying a document in the lock collection of the config database (see Config Database Contents).
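
To see which mongos currently holds this lock, you can inspect the balancer document in the locks collection of the config database; a minimal sketch from the mongo shell:

    use config
    db.locks.find( { _id: "balancer" } )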

By default, the balancer process is always running. When the number of chunks in a collection is unevenly distributed among the shards, the balancer begins migrating chunks from shards with more chunks to shards with fewer chunks. The balancer will continue migrating chunks, one at a time, until the data is evenly distributed among the shards.

While these automatic chunk migrations are crucial for distributing data, they carry some overhead in terms of bandwidth and workload, both of which can impact database performance. As a result, MongoDB attempts to minimize the effect of balancing by only migrating chunks when the distribution of chunks passes the migration thresholds.

The migration process ensures consistency and maximizes availability of chunks during balancing: when MongoDB begins migrating a chunk, the database begins copying the data to the new server and tracks incoming write operations. After migrating chunks, the “from” mongod sends all new writes to the “receiving” server. Finally, mongos updates the chunk record in the config database to reflect the new location of the chunk.

Note

Changed in version 2.0: Before MongoDB version 2.0, large differences in timekeeping (i.e. clock skew) between mongos instances could lead to failed distributed locks, which carries the possibility of data loss, particularly with skews larger than 5 minutes. Always use the network time protocol (NTP) by running ntpd on your servers to minimize clock skew.

Migration Thresholds

Changed in version 2.2: The following thresholds appear first in 2.2; prior to this release, balancing would only commence if the shard with the most chunks had 8 more chunks than the shard with the least number of chunks.

In order to minimize the impact of balancing on the cluster, the balancer will not begin balancing until the distribution of chunks has reached certain thresholds. These thresholds apply to the difference in number of chunks between the shard with the greatest number of chunks and the shard with the least number of chunks. The balancer has the following thresholds:

Number of Chunks     Migration Threshold
Less than 20         2
20-79                4
80 and greater       8

Once a balancing round starts, the balancer will not stop until the difference between the number of chunks on any two shards is less than two or a chunk migration fails.

Note

You can restrict the balancer so that it only operates between specific start and end times. See Schedule the Balancing Window for more information.

The specification of the balancing window is relative to the local time zone of all individual mongos instances in the sharded cluster.
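
For reference, scheduling the window involves updating the balancer document in the config database’s settings collection. A minimal sketch, with the start and stop times as arbitrary examples; see Schedule the Balancing Window for the full procedure:

    use config
    db.settings.update(
       { _id: "balancer" },
       { $set: { activeWindow: { start: "23:00", stop: "6:00" } } },
       true   // upsert
    )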

Chunk Size

The default chunk size in MongoDB is 64 megabytes.

When chunks grow beyond the specified chunk size, a mongos instance will split the chunk in half. This will eventually lead to migrations when chunks become unevenly distributed among the cluster. The mongos instances will initiate a round of migrations to redistribute data in the cluster.

Chunk size is arbitrary and must account for the following:

  1. Small chunks lead to a more even distribution of data at the expense of more frequent migrations, which creates expense at the query routing (mongos) layer.
  2. Large chunks lead to fewer migrations, which is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. Large chunks produce these efficiencies at the expense of a potentially more uneven distribution of data.

For many deployments it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set, but this value is configurable. Be aware of the following limitations when modifying chunk size:

  • Automatic splitting only occurs when inserting documents or updating existing documents; if you lower the chunk size it may take time for all chunks to split to the new size.
  • Splits cannot be “undone:” if you increase the chunk size, existing chunks must grow through insertion or updates until they reach the new size.
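
To change the chunk size, you modify the chunksize document in the config database’s settings collection; the value is in megabytes. A sketch, using 128 megabytes as an arbitrary example:

    use config
    db.settings.save( { _id: "chunksize", value: 128 } )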

Note

Chunk ranges are inclusive of the lower boundary and exclusive of the upper boundary.

Shard Size

By default, MongoDB will attempt to fill all available disk space with data on every shard as the data set grows. Monitor disk utilization in addition to other performance metrics, to ensure that the cluster always has capacity to accommodate additional data.

You can also configure a “maximum size” for any shard when you add the shard using the maxSize parameter of the addShard command. This will prevent the balancer from migrating chunks to the shard when the value of mapped, the amount of memory-mapped data on the shard, exceeds the maxSize setting.
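
As a sketch, the following adds a shard with a maxSize of 125 megabytes; the hostname is illustrative:

    db.runCommand( { addShard: "mongodb0.example.net:27017", maxSize: 125 } )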

Chunk Migration

MongoDB migrates chunks in a sharded cluster to distribute data evenly among shards. Migrations may be either:

  • Manual. In these migrations you must specify the chunk that you want to migrate and the destination shard, as in the sketch after this list. Only migrate chunks manually after initiating sharding, to distribute data during bulk inserts, or if the cluster becomes uneven. See Migrating Chunks for more details.
  • Automatic. The balancer process handles most migrations when distribution of chunks between shards becomes uneven. See Migration Thresholds for more details.
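
A manual migration uses the moveChunk command against the admin database. The following sketch assumes the illustrative records.addresses collection from earlier examples and a destination shard named shard0001:

    db.adminCommand( { moveChunk: "records.addresses",
                       find: { state: "MA", zipcode: "02134" },
                       to: "shard0001" } )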

All chunk migrations use the following procedure:

  1. The balancer process sends the moveChunk command to the source shard for the chunk. In this operation the balancer passes the name of the destination shard to the source shard.

  2. The source initiates the move with an internal moveChunk command with the destination shard.

  3. The destination shard begins requesting documents in the chunk, and begins receiving copies of these documents.

  4. After receiving the final document in the chunk, the destination shard initiates a synchronization process to ensure that all changes to the documents in the chunk on the source shard during the migration process exist on the destination shard.

    When fully synchronized, the destination shard connects to the config database and updates the chunk location in the cluster metadata. After completing this operation, once there are no open cursors on the chunk, the source shard starts deleting its copy of documents from the migrated chunk.

If enabled, the _secondaryThrottle setting causes the balancer to wait for replication to secondaries. For more information, see Require Replication before Chunk Migration (Secondary Throttle).

Detect Connections to mongos Instances

If your application must detect whether the MongoDB instance it is connected to is a mongos, use the isMaster command. When a client connects to a mongos, isMaster returns a document with a msg field that holds the string isdbgrid. For example:

{
   "ismaster" : true,
   "msg" : "isdbgrid",
   "maxBsonObjectSize" : 16777216,
   "ok" : 1
}

If the application is instead connected to a mongod, the returned document does not include the isdbgrid string.
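
A sketch of such a check from the mongo shell:

    var res = db.runCommand( { isMaster: 1 } );
    if ( res.msg == "isdbgrid" ) {
       print( "connected to a mongos" );
    } else {
       print( "connected to a mongod" );
    }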

Config Database

The config database contains information about your sharding configuration and stores the information in a set of collections used by sharding.

Important

Back up the config database before performing any maintenance on the config server.

To access the config database, issue the following command from the mongo shell:

use config

In general, you should never manipulate the content of the config database directly. The config database contains a number of collections used internally by sharding.

See Config Database Contents for full documentation of these collections and their role in sharded clusters.
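
For example, a read-only query such as the following lists the shards in the cluster without modifying any metadata (run from the config database):

    use config
    db.shards.find()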

Sharding GridFS Stores

When sharding a GridFS store, consider the following:

  • Most deployments will not need to shard the files collection. The files collection is typically small, and only contains metadata. None of the required keys for GridFS lend themselves to an even distribution in a sharded situation. If you must shard the files collection, use the _id field, possibly in combination with an application field.

    Leaving files unsharded means that all the file metadata documents live on one shard. For production GridFS stores you must store the files collection on a replica set.

  • To shard the chunks collection by { files_id : 1 , n : 1 }, issue commands similar to the following:

    db.fs.chunks.ensureIndex( { files_id : 1 , n : 1 } )
    
    db.runCommand( { shardCollection : "test.fs.chunks" , key : { files_id : 1 , n : 1 } } )
    

    You may also want to shard using just the files_id field, as in the following operation:

    db.runCommand( { shardCollection : "test.fs.chunks" , key : {  files_id : 1 } } )
    

    Note

    Changed in version 2.2.

    Before 2.2, you had to create an additional index on files_id to shard using only this field.

    The default files_id value is an ObjectId; as a result, the values of files_id are always ascending, and applications will insert all new GridFS data to a single chunk and shard. If your write load is too high for a single server to handle, consider a different shard key or use a different value for _id in the files collection.