Limits on data size?

Hello,

Another member of this community asked the following question:

I see that Atlas isn’t allowing me to select data size of more than 4TB per shard in MongoDB Atlas.

Is that a hard limit?

MongoDB team reply:

MongoDB offers horizontal scale-out using sharding: While a single ‘Replica Set’ (aka a shard in a sharded cluster) cannot exceed 4TB of physical storage, you can use as many shards as you want in your MongoDB Atlas sharded cluster.

For example, if you allocated 2TB per shard, a twenty shard cluster would have a total of 40TB of physical space (all would be redundant for high availability).

My Question is:

Is it possible to have different data across this 40TB of storage, or only 2TB (redundant across all twenty nodes)?

Sharding is like letter indexing in a dictionary: from A to Z, each shard holds only a portion of the actual data; apples in one shard, corn in another, partitioned according to one or more fields (the shard key) within the same collection, like “produce” or “farm”.
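As a minimal mongosh sketch of the idea above (the database and collection names "farm" and "produce" and the shard key field "name" are hypothetical, chosen to match the example):

```javascript
// mongosh sketch — requires a running sharded cluster; names are illustrative.
sh.enableSharding("farm")                        // allow sharding on the "farm" database
sh.shardCollection("farm.produce", { name: 1 })  // range-shard the collection on "name"
```

After this, the balancer distributes ranges of "name" values (the A-to-Z portions) across the shards.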

A query that does not include the shard key will normally be broadcast to all shards, but you can speed up reads and writes with targeted operations that include the correct shard key.
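A toy illustration of targeted vs. broadcast routing (plain JavaScript, not MongoDB code; the two "shards" and their key ranges are invented for the example):

```javascript
// Two pretend shards holding disjoint shard-key ranges, roughly A–M and N–Z.
const shards = [
  { name: "shard0", min: "a", max: "n", docs: [{ name: "apple" }, { name: "corn" }] },
  { name: "shard1", min: "n", max: "~", docs: [{ name: "pear" }] },
];

// Targeted operation: the shard key in the query selects exactly one shard.
function targeted(key) {
  return shards.filter(s => key >= s.min && key < s.max).map(s => s.name);
}

// Broadcast (scatter-gather): without the shard key, every shard is consulted.
function broadcast() {
  return shards.map(s => s.name);
}

console.log(targeted("apple")); // only shard0 is contacted
console.log(broadcast());       // all shards are contacted
```

In a real cluster, the mongos router performs this range (or hash) lookup using the cluster's chunk metadata.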

One thing to keep in mind is that sharding is done per collection. If you shard one collection but keep the others unsharded, the unsharded ones are kept on the primary shard. So the answer is yes, it is possible.

But keep in mind that your “other” data can fill up the primary shard faster if your sharding logic is not well designed.

Check the official Sharding — MongoDB Manual page for considerations on choosing that logic.


Welcome to the MongoDB Community @Osvaldo_Bay_Machado!

Is it possible to have different data across this 40TB of storage, or only 2TB (redundant across all twenty nodes)?

Data storage in Atlas clusters provides data redundancy in the sense that there are multiple copies of the data. The storage limit is separate from the data redundancy factor.

For example, if you have 2TB of storage in a 3 member replica set, there will actually be 6TB of physical storage backing the cluster. Each replica set member will be provisioned in a different cloud provider availability zone with consistent instance specs (CPU, RAM, storage). The storage limit in a dedicated Atlas cluster is based on the storageSize (size on disk) of your data files, so a 2TB replica set will store more than 2TB of data and indexes depending on how compressible your data is.

A sharded cluster is composed of two or more shard replica sets which are presented (from the application's point of view) as a single logical cluster. A 20 x 2TB sharded cluster has a 40TB physical storage limit (not including the backend storage provisioned for data redundancy).
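The storage math above can be made concrete with a short calculation (assuming the default 3-member replica set per shard, as in the earlier example):

```javascript
// Storage math for the 20 x 2TB example cluster.
const shardCount = 20;
const tbPerShard = 2;
const replicationFactor = 3; // assumption: standard 3-member replica set per shard

// Usable (logical) storage across the cluster, as seen by the application:
const logicalTB = shardCount * tbPerShard;       // 40 TB

// Total provisioned disk once each shard's data is replicated 3 ways:
const physicalTB = logicalTB * replicationFactor; // 120 TB

console.log(`usable: ${logicalTB} TB, provisioned: ${physicalTB} TB`);
```

The 40TB figure is what your documents and indexes can occupy; the redundancy factor multiplies the disk that Atlas provisions behind the scenes, exactly as in the 2TB-becomes-6TB replica set example above.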

Regards,
Stennie