Clarification on database limit

Hi All,

I read through MongoDB Limits and Thresholds
(sorry, there is a limit of 2 links for new members)
and /127411
and /5808/2

I’m being tasked to find out more about the size limitation of an unsharded database. If I understand correctly, there is theoretically no database size limit; in practice, the limits come from the underlying system (filesystem, O/S, and hardware resources).
I also believe that the table presented only applies to sharded databases, and describes the limits that apply before we would attempt to change some of the sharding parameters (Size of Shard Key Value and Chunk Size).

However, I’m curious about this quoted line:

"The theoretical limits for maximum database and collection size are much higher than the practical limits imposed by filesystem, O/S, and system resources relative to workload. You can add server resources and tune for vertical scaling, but when your workload exceeds the resources of a single server or replica set you can also scale horizontally (partitioning data across multiple servers) using sharding."

What are some of the considerations you experts take into account before deciding that it’s time to shard?

Thank you.

Welcome to the MongoDB Community @Boon_Hui_Tan!

As noted in my earlier response, the most common one is:

when your workload exceeds the resources of a single server or replica set

To add some nuance to this suggestion, the decision should really be “when you predict your workload will exceed”, as it is better to shard preemptively than to shard when your system is struggling under the current workload. Initial sharding and data redistribution are better actioned when your system has some headroom for additional I/O. Shard keys are set per collection, so when you start to shard you would generally focus on the largest and most active collections unless you have special requirements for data locality.
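
As a rough sketch of what that looks like in practice (the database, collection, and shard key names below are placeholders I've made up for illustration), sharding a single busy collection from mongosh would be along these lines:

```js
// Allow collections in this (hypothetical) database to be sharded
sh.enableSharding("sales")

// Shard the largest / most active collection on a key chosen for even
// distribution of reads and writes. The compound key here is only an example;
// the right choice depends entirely on your query patterns.
sh.shardCollection("sales.orders", { customerId: 1, orderId: 1 })
```

The shard key deserves careful thought up front, since changing it after a collection has been sharded is not a trivial operation.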

Aside from workload, there are also operational considerations like the time it takes to back up or restore a deployment. If you have TBs of data with GBs of RAM and need to resync a replica set member (or restore a deployment from an offsite backup), a single large deployment may not meet your expectations on timing.
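
If it helps to put numbers on that, one quick way to compare data and index sizes against the RAM and disk you have is db.stats() in mongosh (the database name and GB scale below are just examples):

```js
// Report sizes for a (hypothetical) database, scaled to GB
db.getSiblingDB("sales").stats(1024 * 1024 * 1024)

// The dataSize, indexSize, and storageSize fields in the output are the ones
// to weigh against the RAM and disk available on a single server or replica set
```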

A sharded cluster has more components and complexity than a replica set deployment (there will now be config servers and mongos routers), and it will change your backup strategy. If you have a replica set deployment with acceptable performance, I would consider reasonable vertical scaling before adding the extra infrastructure for a sharded cluster deployment. However, similar to deciding when to shard individual collections, it is better to have this infrastructure in place before the need becomes desperate.
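
If you do move to a sharded cluster, sh.status() in mongosh summarises that extra infrastructure and how data is actually being distributed, which is a useful sanity check that the added complexity is earning its keep:

```js
// Overview of shards, balancer state, and per-collection chunk distribution
sh.status()

// Quick checks on the balancer specifically
sh.getBalancerState()
sh.isBalancerRunning()
```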

In addition to horizontal scaling for workload and data distribution, sharding can also be useful for data locality using Zone Sharding. For example, Segmenting Data by Location, Tiered Hardware for Varying SLA or SLO, or Segmenting Data by Application or Customer.
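
As a minimal sketch of zone sharding (the shard, zone, database, and field names here are all hypothetical), you would tag shards with a zone and then pin a shard key range to that zone:

```js
// Associate a shard with a zone representing a region
sh.addShardToZone("shardEU-rs", "EU")

// Route documents whose shard key falls in this range to shards in the "EU" zone
// (assumes the collection is sharded on { region: 1, customerId: 1 })
sh.updateZoneKeyRange(
  "sales.orders",
  { region: "EU", customerId: MinKey },
  { region: "EU", customerId: MaxKey },
  "EU"
)
```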

Regards,
Stennie


Hi Stennie,

Sorry for my late response.

Thank you for your clarification and explanation.
I will take this back to the team and discuss what would be our best course of action.

Thank you

Regards
Boon Hui

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.