I’m planning to deploy a sharded cluster and I’m looking for best practices for sizing. Let’s say I want my workloads to be available to many clients, and I’m thinking about creating a database for each client, plus a test database for development purposes. I don’t know yet how many DBs I will have, possibly more than 1 K. So my questions are:
- What’s the maximum number of DBs you have ever seen, and is there a number of DBs that should not be exceeded if I want to keep performance reliable?
- How many collections have you seen in the “biggest” sharded cluster? I know that 10 K collections per replica set is officially supported, but my cluster can consist of many replica sets. Have you seen more than 10 K?
Many thanks in advance!
Hi @Petr_Makarov, and welcome to the MongoDB Community Forums!
The documentation on Operational Restrictions in Sharded Clusters would be a good starting point to learn about the sizing of a sharded cluster.
Sharding is performed per collection: a sharded collection is broken down into multiple chunks based on the selected shard key. In your question, by DBs, are you referring to the number of chunks? If so, each chunk covers a contiguous range of shard key values, and chunks are split to roughly equal sizes in bytes and distributed across the shards. You can read more about shards, chunks, and shard keys in the linked documentation.
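To make the chunk idea a bit more concrete, here is a toy Python model, not MongoDB’s actual implementation. It assumes a collection sharded on a numeric field (something like `{customerId: 1}`), a hypothetical chunk table of `[min, max)` ranges mapped to made-up shard names, and routes a value by finding the chunk whose range contains it. That is roughly the idea behind how a mongos uses its cached routing table, simplified a lot.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    min_key: float  # inclusive lower bound
    max_key: float  # exclusive upper bound
    shard: str      # hypothetical shard name


# Hypothetical chunk table; MinKey/MaxKey are modelled as -inf/+inf.
CHUNKS = [
    Chunk(float("-inf"), 1000, "shardA"),
    Chunk(1000, 5000, "shardB"),
    Chunk(5000, float("inf"), "shardC"),
]

# Sorted lower bounds of the chunks, used for a binary search.
_LOWER_BOUNDS = [c.min_key for c in CHUNKS]


def route(shard_key_value: float) -> str:
    """Return the shard owning the chunk whose [min, max) range contains the value."""
    idx = bisect_right(_LOWER_BOUNDS, shard_key_value) - 1
    chunk = CHUNKS[idx]
    assert chunk.min_key <= shard_key_value < chunk.max_key
    return chunk.shard
```

For example, `route(42)` lands on `shardA`, while `route(1000)` and `route(4999)` both land on `shardB`, because every value in a range belongs to the same chunk.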
Also, regarding shard size, the official documentation states:
Sharded collections can grow to any size after successfully enabling sharding.
When a collection is sharded, its documents are distributed across the shards based on the shard key. By “collections”, do you mean the documents within a collection?
If MongoDB cannot split a chunk that exceeds the specified range size, MongoDB labels the chunk as jumbo. You can read more about data partitioning in the official documentation.
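Why a chunk can become unsplittable is easier to see with a small model. The sketch below is a toy, not MongoDB’s splitting algorithm: `split_into_chunks` is a hypothetical helper, the input is an assumed list of per-shard-key-value data sizes in MB, and the 128 MB limit is just an example value. The key point it models is that a split can only happen *between* two distinct shard key values, so one value whose data exceeds the limit ends up in a chunk that cannot be split.

```python
from typing import List, Tuple


def split_into_chunks(sizes: List[Tuple[str, int]], max_mb: int) -> List[dict]:
    """Greedily group consecutive shard key values into chunks of at most max_mb.

    `sizes` holds (shard_key_value, data_size_mb) pairs sorted by shard key.
    Split points may only fall between distinct values, so a single value
    larger than max_mb is left alone in a chunk that is flagged as jumbo.
    """
    chunks: List[dict] = []
    current_keys: List[str] = []
    current_mb = 0
    for key, mb in sizes:
        # Close the current chunk if adding this value would exceed the limit.
        if current_keys and current_mb + mb > max_mb:
            chunks.append({"keys": current_keys, "mb": current_mb,
                           "jumbo": current_mb > max_mb})
            current_keys, current_mb = [], 0
        current_keys.append(key)
        current_mb += mb
    if current_keys:
        chunks.append({"keys": current_keys, "mb": current_mb,
                       "jumbo": current_mb > max_mb})
    return chunks


# Hypothetical data: one very large tenant under a tenant-only shard key.
example = split_into_chunks(
    [("tenantA", 40), ("tenantB", 50), ("tenantC", 300),
     ("tenantD", 20), ("tenantE", 30)],
    max_mb=128,
)
```

Here `tenantC` ends up alone in a 300 MB chunk flagged as jumbo. In this model the only way out is a shard key with more distinct values (for example, adding a second field), which is why shard key cardinality matters.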
I would also recommend taking the Basic Cluster Administration course, which covers the basics of sharding.
Generally, there is no hardcoded limit on the number of collections in a replica set. However, performance will always depend on several factors, such as:
- the hardware specifications of the deployment.
- the workload of the application.
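If you go with one database per client, it helps to keep an eye on how many namespaces the cluster actually holds as it grows. Below is a minimal sketch built on PyMongo’s `list_database_names()` and `list_collection_names()`. The `count_namespaces` helper is hypothetical, and skipping `admin`, `config`, and `local` is an assumption about what you want to count.

```python
from typing import Dict

# Internal databases excluded from the count (assumption: only user data matters).
SYSTEM_DBS = {"admin", "config", "local"}


def count_namespaces(client) -> Dict[str, int]:
    """Count user databases and collections visible through `client`.

    `client` should behave like pymongo.MongoClient (for a sharded cluster,
    connected to a mongos): it must provide list_database_names() and
    client[db_name].list_collection_names().
    """
    per_db: Dict[str, int] = {}
    for db_name in client.list_database_names():
        if db_name in SYSTEM_DBS:
            continue  # skip internal databases
        per_db[db_name] = len(client[db_name].list_collection_names())
    return {
        "databases": len(per_db),
        "collections": sum(per_db.values()),
        "largest_db": max(per_db.values(), default=0),
    }
```

Usage would look like `count_namespaces(MongoClient("mongodb://<mongos-host>:27017"))`. Collection counts matter because, with the WiredTiger storage engine, every collection and every index is stored as its own file, so thousands of per-client databases translate into many files and data handles on each shard.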
Let us know if you have further questions.