Shards
A shard contains a subset of sharded data for a sharded cluster. Together, the cluster's shards hold the entire data set for the cluster.
Shards must be deployed as a replica set to provide redundancy and high availability.
Important
Sharded clusters use the write concern "majority"
for a lot of internal
operations. Using an arbiter in a sharded cluster is discouraged due to
Performance Issues with PSA replica sets.
Users, clients, or applications should only directly connect to a shard to perform local administrative and maintenance operations.
Performing queries on a single shard only returns a subset of data. Connect to
the mongos
to perform cluster level operations, including read or
write operations.
Important
MongoDB does not guarantee that any two contiguous chunks reside on a single shard.
Primary Shard
Each database in a sharded cluster has a primary shard that holds all the un-sharded collections for that database. Each database has its own primary shard. The primary shard has no relation to the primary in a replica set.
The mongos
selects the primary shard when creating a new database
by picking the shard in the cluster that has the least amount of data.
mongos
uses the totalSize
field returned by the
listDatabases
command as a part of the selection criteria.
To change the primary shard for a database, use the movePrimary
command. The process of migrating the primary shard may take significant time
to complete, and you should not access the collections associated to the
database until it completes. Depending on the amount of data being migrated,
the migration may affect overall cluster operations. Consider the impact to
cluster operations and network load before attempting to change the primary
shard.
When you deploy a new sharded cluster with shards that were previously used as replica sets, all existing databases continue to reside on their original replica sets. Databases created subsequently may reside on any shard in the cluster.
Shard Status
Use the sh.status()
method in mongosh
to
see an overview of the cluster. This reports includes which shard is
primary for the database and the chunk distribution across the
shards. See sh.status()
method for more details.
Sharded Cluster Security
Use Self-Managed Internal/Membership Authentication to enforce intra-cluster
security and prevent unauthorized cluster components from accessing the
cluster. You must start each mongod
in the cluster with the
appropriate security settings in order to enforce internal authentication.
See Deploy Self-Managed Sharded Cluster with Keyfile Authentication for a tutorial on deploying a secured sharded cluster.
Shard Local Users
Each shard supports Role-Based Access Control in Self-Managed Deployments (RBAC) for restricting
unauthorized access to shard data and operations. Start each mongod
in the replica set with the --auth
option to enforce RBAC.
Alternatively, enforcing Self-Managed Internal/Membership Authentication for
intra-cluster security also enables user access controls via RBAC.
Each shard has its own shard-local users. These users cannot be used
on other shards, nor can they be used for connecting to the cluster
via a mongos
.
See Enable Access Control on Self-Managed Deployments for a tutorial on enabling adding users to an RBAC-enabled MongoDB deployment.