Navigation
This version of the documentation is archived and no longer supported.

Shard Keys

The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key is either an indexed field or indexed compound fields that exists in every document in the collection.

MongoDB partitions data in the collection using ranges of shard key values. Each range defines a non-overlapping range of shard key values and is associated with a chunk.

MongoDB attempts to distribute chunks evenly among the shards in the cluster. The shard key has a direct relationship to the effectiveness of chunk distribution. See Choosing a Shard Key.

Diagram of the shard key value space segmented into smaller ranges or chunks.

Important

  • Once you shard a collection, the selection of the shard key is immutable; i.e. you cannot select a different shard key for that collection.

  • Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. For details on updating the shard key, see Change a Document’s Shard Key Value.

    Before MongoDB 4.2, a document’s shard key field value is immutable.

Shard Key Specification

To shard a collection, you must specify the target collection and the shard key to the sh.shardCollection() method:

sh.shardCollection( namespace, key )
  • The namespace parameter consists of a string <database>.<collection> specifying the full namespace of the target collection.
  • The key parameter consists of a document containing a field and the index traversal direction for that field.

Important

  • Once you shard a collection, the selection of the shard key is immutable; i.e. you cannot select a different shard key for that collection.

  • Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. For details on updating the shard key, see Change a Document’s Shard Key Value.

    Before MongoDB 4.2, a document’s shard key field value is immutable.

For instructions sharding a collection using either the hashed or ranged sharding strategy, see Shard a Collection.

Change a Document’s Shard Key Value

When updating the shard key value

  • You must run on a mongos either in a transaction or as a retryable write. Do not issue the operation directly on the shard.
  • You must include an equality condition on the full shard key in the query filter. For example, if a collection messages uses { country : 1, userid : 1 } as the shard key, to update the shard key for a document, you must include country: <value>, userid: <value> in the query filter. You can include additional fields in the query as appropriate.

Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. To update, use the following operations to update a document’s shard key value:

Command Method
update with multi: false
findAndModify
 

If the shard key modification results in moving the document to another shard, you cannot specify more than one shard key modification in the bulk operation; i.e. batch size of 1.

If the shard key modification does not result in moving the document to another shard, you can specify multiple shard key modification in the bulk operation.

Shard Key Indexes

All sharded collections must have an index that supports the shard key; i.e. the index can be an index on the shard key or a compound index where the shard key is a prefix of the index.

  • If the collection is empty, sh.shardCollection() creates the index on the shard key if such an index does not already exists.
  • If the collection is not empty, you must create the index first before using sh.shardCollection().

If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.

Unique Indexes

You cannot specify a unique constraint on a hashed index.

For a ranged sharded collection, only the following indexes can be unique:

  • the index on the shard key

  • a compound index where the shard key is a prefix

  • the default _id index; however, the _id index only enforces the uniqueness constraint per shard if the _id field is not the shard key or the prefix of the shard key.

    Uniqueness and the _id Index

    If the _id field is not the shard key or the prefix of the shard key, _id index only enforces the uniqueness constraint per shard and not across shards.

    For example, consider a sharded collection (with shard key {x: 1}) that spans two shards A and B. Because the _id key is not part of the shard key, the collection could have a document with _id value 1 in shard A and another document with _id value 1 in shard B.

    If the _id field is not the shard key nor the prefix of the shard key, MongoDB expects applications to enforce the uniqueness of the _id values across the shards.

The unique index constraints mean that:

  • For a to-be-sharded collection, you cannot shard the collection if the collection has other unique indexes.
  • For an already-sharded collection, you cannot create unique indexes on other fields.

Through the use of the unique index on the shard key, MongoDB can enforce uniqueness on the shard key values. MongoDB enforces uniqueness on the entire key combination, and not individual components of the shard key. To enforce uniqueness on the shard key values, pass the unique parameter as true to the sh.shardCollection() method:

  • If the collection is empty, sh.shardCollection() creates the unique index on the shard key if such an index does not already exist.
  • If the collection is not empty, you must create the index first before using sh.shardCollection().

Although you can have a unique compound index where the shard key is a prefix, if using unique parameter, the collection must have a unique index that is on the shard key.

Choosing a Shard Key

The choice of shard key affects the creation and distribution of the chunks across the available shards. This affects the overall efficiency and performance of operations within the sharded cluster.

The shard key affects the performance and efficiency of the sharding strategy used by the sharded cluster.

The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster.

Diagram of good shard key distribution

At minimum, consider the consequences of the cardinality, frequency, and monotonicity of a potential shard key.

Restrictions

For restrictions on shard key, see Shard Key Limitations.

Collection Size

When sharding a collection that is not empty, the shard key can constrain the maximum supported collection size for the initial sharding operation only. See Sharding Existing Collection Data Size.

Important

A sharded collection can grow to any size after successful sharding.

Shard Key Cardinality

The cardinality of a shard key determines the maximum number of chunks the balancer can create. This can reduce or remove the effectiveness of horizontal scaling in the cluster.

A unique shard key value can exist on no more than a single chunk at any given time. If a shard key has a cardinality of 4, then there can be no more than 4 chunks within the sharded cluster, each storing one unique shard key value. This constrains the number of effective shards in the cluster to 4 as well - adding additional shards would not provide any benefit.

The following image illustrates a sharded cluster using the field X as the shard key. If X has low cardinality, the distribution of inserts may look similar to the following:

Diagram of poor shard key distribution due to low cardinality

The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.

Choosing a shard key with high cardinality does not, on its own, guarantee even distribution of data across the sharded cluster. The frequency and monotonicity of the shard key also contribute to data distribution. Take each factor into account when choosing a shard key.

If your data model requires sharding on a key that has low cardinality, consider using a compound index using a field that has higher relative cardinality.

Shard Key Frequency

Consider a set representing the range of shard key values - the frequency of the shard key represents how often a given value occurs in the data. If the majority of documents contain only a subset of those values, then the chunks storing those documents become a bottleneck within the cluster. Furthermore, as those chunks grow, they may become indivisible chunks as they cannot be split any further. This reduces or removes the effectiveness of horizontal scaling within the cluster.

The following image illustrates a sharded cluster using the field X as the shard key. If a subset of values for X occur with high frequency, the distribution of inserts may look similar to the following:

Diagram of poor shard key distribution due to high frequency

Choosing a shard key with low frequency does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality and monotonicity of the shard key also contribute to data distribution. Take each factor into account when choosing a shard key.

If your data model requires sharding on a key that has high frequency values, consider using a compound index using a unique or low frequency value.

Shard Key Monotonicity

A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.

This occurs because every cluster has a chunk that captures a range with an upper bound of maxKey. maxKey always compares as higher than all other values. Similarly, there is a chunk that captures a range with a lower bound of minKey. minKey always compares as lower than all other values.

If the shard key value is always increasing, all new inserts are routed to the chunk with maxKey as the upper bound. If the shard key value is always decreasing, all new inserts are routed to the chunk with minKey as the lower bound. The shard containing that chunk becomes the bottleneck for write operations.

The following image illustrates a sharded cluster using the field X as the shard key. If the values for X are monotonically increasing, the distribution of inserts may look similar to the following:

Diagram of poor shard key distribution due to monotonically increasing or decreasing shard key

If the shard key value was monotonically decreasing, then all inserts would route to Chunk A instead.

Choosing a shard key that does not change monotonically does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to data distribution. Take each factor into account when choosing a shard key.

If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.

←   mongos Hashed Sharding  →