Enabling sharding: how does it affect existing databases?

Maciej_Pakulski1 · August 17, 2021, 7:03pm

Hi,

I have a cluster with almost 50 different databases. We’d like to enable sharding (2 shards) for the cluster; however, we only want 2 or 3 collections to be sharded collections. The rest will remain unsharded.
The documentation says:

Each database in a sharded cluster has a primary shard that holds all the un-sharded collections for that database. Each database has its own primary shard. The primary shard has no relation to the primary in a replica set.

As I understand it, for every database, mongo will choose one of the shards to be its primary shard and basically keep that database's data there. However, will mongo also try to rebalance the databases between the 2 shards so that the data is spread more or less evenly?

Prasad_Saya · August 18, 2021, 2:46am

Hello @Maciej_Pakulski1, welcome to the MongoDB Community forum!

Here are some clarifications.

As I understand it, for every database, mongo will choose one of the shards to be its primary shard and basically keep that database's data there.

For existing databases, when you deploy a new sharded cluster with shards that were previously used as replica sets, all existing databases continue to reside on their original replica sets.

This means your existing data is on a replica set. When you convert to a sharded cluster, this replica set becomes one of the shards, and the existing databases stay on it; that shard is their primary shard.
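If you want to confirm where each database lives after the conversion, you can read the config.databases collection through a mongos: each document there records the database name in _id and its primary shard in primary. Below is a minimal sketch, assuming PyMongo is installed; the helper names group_by_primary and fetch_database_placement and the default connection string are hypothetical.

```python
from collections import defaultdict


def group_by_primary(database_docs):
    """Group database names by their primary shard.

    `database_docs` is an iterable of documents shaped like those in
    config.databases, e.g. {"_id": "sales", "primary": "rs0"}.
    """
    placement = defaultdict(list)
    for doc in database_docs:
        placement[doc["primary"]].append(doc["_id"])
    # Sort the names so the result does not depend on query order.
    return {shard: sorted(names) for shard, names in placement.items()}


def fetch_database_placement(uri="mongodb://localhost:27017"):
    """Query config.databases through a mongos (hypothetical URI)."""
    from pymongo import MongoClient  # imported lazily; requires PyMongo

    client = MongoClient(uri)
    try:
        cursor = client.config.databases.find({}, {"_id": 1, "primary": 1})
        # The cursor is fully consumed here, before the client is closed.
        return group_by_primary(cursor)
    finally:
        client.close()


if __name__ == "__main__":
    # Sample data only; shard names usually default to the replica set name.
    sample = [
        {"_id": "sales", "primary": "rs0"},
        {"_id": "hr", "primary": "rs0"},
        {"_id": "analytics", "primary": "rs1"},
    ]
    print(group_by_primary(sample))
```

Right after the conversion you would expect every existing database to be grouped under the original replica set's shard.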

New databases, that is, databases created after the conversion, may reside on any shard in the cluster. The mongos selects the primary shard when creating a new database by picking the shard in the cluster that has the least amount of data.

So, once the sharded cluster is set up, any new database you create can end up on either of your two shards.
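To make the selection rule quoted above concrete, here is a simplified model of it. This is not the actual mongos code: it assumes the per-shard data sizes are already known (the shard_sizes mapping and the pick_primary_shard name are hypothetical) and just picks the smallest. It also shows that the choice is made once, when the database is created, and is not revisited later.

```python
def pick_primary_shard(shard_sizes):
    """Return the shard holding the least data (simplified model).

    `shard_sizes` maps shard name -> total data size (any unit).
    Ties are broken by shard name so the result is deterministic.
    """
    if not shard_sizes:
        raise ValueError("cluster has no shards")
    return min(shard_sizes, key=lambda name: (shard_sizes[name], name))


if __name__ == "__main__":
    # Hypothetical sizes right after conversion: the old replica set holds
    # all existing data, the newly added shard is empty.
    sizes = {"rs0": 500, "rs1": 0}
    print(pick_primary_shard(sizes))
```

Under this model, databases created shortly after adding an empty second shard would tend to land on that new shard.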

Note that you can change the primary shard for a database using the movePrimary command.
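For reference, movePrimary is an admin command that you run through a mongos. The sketch below builds the command document and, optionally, sends it with PyMongo's Database.command; the database name, target shard name and URI are placeholders, and the helper names are hypothetical. Moving a primary copies the database's unsharded collections to the target shard, so it is worth reading the movePrimary documentation for your server version before running it on production data.

```python
def move_primary_command(db_name, target_shard):
    """Build the movePrimary command document.

    The command name must be the first key; plain dicts keep insertion
    order in Python 3.7+, so a dict literal is sufficient.
    """
    return {"movePrimary": db_name, "to": target_shard}


def move_primary(uri, db_name, target_shard):
    """Send movePrimary to the admin database via a mongos (requires PyMongo)."""
    from pymongo import MongoClient  # imported lazily

    client = MongoClient(uri)
    try:
        return client.admin.command(move_primary_command(db_name, target_shard))
    finally:
        client.close()


if __name__ == "__main__":
    # Placeholder names only; nothing is sent to a server here.
    print(move_primary_command("sales", "rs1"))
```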

However, will mongo also try to rebalance the databases between the 2 shards so that the data is spread more or less evenly?

Data is distributed across the shards only when you shard a collection. A sharded cluster can hold both sharded and unsharded collections, and only the sharded collections are distributed among the shards; the balancer does not move whole databases or unsharded collections. How evenly the data is distributed is determined mainly by the shard key (and the number of shards).
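Since you only want to shard 2 or 3 collections, the sequence is roughly: enable sharding on their database, then shard each chosen collection with an explicit shard key. Every collection you don't list stays unsharded on the database's primary shard. The sketch below only builds the admin command documents for that sequence; the database name, collection names and shard keys are placeholders, and the hashed key is just an illustration of a key that tends to spread writes evenly, not a recommendation for your workload. If a collection already contains data, an index supporting the shard key has to exist before you shard it.

```python
def sharding_commands(db_name, shard_keys):
    """Build admin commands to shard selected collections of one database.

    `shard_keys` maps collection name -> shard key spec, for example
    {"orders": {"customerId": "hashed"}}. Collections not listed are
    left untouched and remain unsharded on the database's primary shard.
    """
    commands = [{"enableSharding": db_name}]
    for coll_name, key in shard_keys.items():
        commands.append({"shardCollection": f"{db_name}.{coll_name}", "key": key})
    return commands


def apply_commands(uri, commands):
    """Run each command on the admin database via a mongos (requires PyMongo)."""
    from pymongo import MongoClient  # imported lazily

    client = MongoClient(uri)
    try:
        return [client.admin.command(cmd) for cmd in commands]
    finally:
        client.close()


if __name__ == "__main__":
    # Hypothetical database and collections; nothing is sent to a server here.
    for cmd in sharding_commands(
        "shop",
        {"orders": {"customerId": "hashed"}, "events": {"deviceId": 1, "ts": 1}},
    ):
        print(cmd)
```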

system · August 23, 2021, 2:46am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.