Creating a DB per User vs. Sharding per User

Philip_Mallinger · September 7, 2023, 4:43pm

Hi,

I am new to MongoDB and am trying to make sense of sharding and how it might work within my application. For context, I am planning a website that filters data based on user access rights, i.e. each user should only see the data that they have acquired through our company, and accidental mixing of data would not be acceptable. Based on a recommendation, I have been looking into a DB-per-customer setup, as it provides the highest level of separation and security. I know that creating a collection per customer is not a good option and is strongly discouraged. However, it seems like it is possible to set up a similar configuration by forcing sharding to separate data by customer.

What I am struggling with is understanding how this can be achieved. Additionally, part of the reason I am looking into this is that I am trying to compare the cost of a DB per customer vs. a shard per customer, and I am having a hard time understanding the cost difference for sharding. This would include some sort of configuration for high availability and failover. How would I go about structuring sharding per customer plus high availability/failover, and how would this compare to a DB-per-customer setup?

For further reference, each customer would have anywhere from one to a few hundred nearly identically structured datasets of at most a few MB each.

Thanks!

tapiocaPENGUIN · September 7, 2023, 6:36pm

This is not what sharding is used for. Sharding is used to scale data horizontally or to separate data geographically. It splits the data according to a shard key and distributes it across the shards, so each shard holds a portion of the data.
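To illustrate why a shard key spreads data rather than isolating customers, here is a toy sketch in Python. It is not MongoDB's actual placement logic (MongoDB assigns ranges of key values to shards and rebalances them); the modulo rule, the function names, and the customer ids are all assumptions made up for the example.

```python
import hashlib
from collections import defaultdict


def shard_for(shard_key_value: str, num_shards: int) -> int:
    """Toy placement rule: hash the shard key value, then take it modulo the shard count.

    This only models the "spread documents by key" idea, not MongoDB's
    range-based distribution.
    """
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def placement(customer_ids, num_shards):
    """Group customer ids by the shard they land on in this toy model."""
    shards = defaultdict(list)
    for customer_id in customer_ids:
        shards[shard_for(customer_id, num_shards)].append(customer_id)
    return dict(shards)


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(10)]  # hypothetical ids
    for shard, members in sorted(placement(customers, 3).items()):
        print(f"shard {shard}: {members}")
```

With more customers than shards, several customers necessarily share a shard, so sharding by customer does not by itself give per-customer separation.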

Now, for the user access rights there are a few things you can do. You can have a users collection and a collection that holds the data, and put an “owner” or “user” field on each data document. Then query the data by “owner” so that you only get data for a specific user. This will work, but with many clients hitting one DB it could cause some issues, and there is no logical separation: one DB having issues creates issues for all customers.
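A minimal sketch of the owner-field approach in Python. The helper names (`scoped_filter`, `find_for_owner`) and the field name `owner` are assumptions for illustration. The `collection` argument is expected to be anything with a `find(filter)` method, such as a pymongo `Collection`.

```python
from typing import Any, Mapping, Optional


def scoped_filter(owner_id: str, extra: Optional[Mapping[str, Any]] = None) -> dict:
    """Build a query filter that is always restricted to one owner."""
    query = dict(extra or {})
    # Set "owner" last so a caller-supplied "owner" value cannot override it.
    query["owner"] = owner_id
    return query


def find_for_owner(collection, owner_id: str, extra: Optional[Mapping[str, Any]] = None):
    """Run a find() on the shared data collection, scoped to a single owner."""
    return collection.find(scoped_filter(owner_id, extra))
```

Routing every read through one helper like this makes it harder to forget the owner condition, but the separation is still only enforced in application code.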

The other solution you mentioned would also work: having a DB per client. This makes access management easier as well. You will just have to manage creating the DB for each new client, along with the users, permissions, settings, etc.
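A rough sketch of how the DB-per-client lookup could be handled in Python. The `customer_` prefix, the allowed-character rule, and both function names are assumptions for the example. The `client` argument is expected to support `client[name]`, as a pymongo `MongoClient` does.

```python
import re

_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")  # conservative: avoids characters MongoDB rejects, such as "." and "$"
_PREFIX = "customer_"                      # hypothetical naming convention
_MAX_DB_NAME_LEN = 63                      # MongoDB database names must be shorter than 64 characters


def db_name_for(customer_id: str) -> str:
    """Derive a database name from a customer id, rejecting unsafe ids."""
    if not _SAFE_ID.fullmatch(customer_id):
        raise ValueError(f"unsupported characters in customer id: {customer_id!r}")
    name = _PREFIX + customer_id
    if len(name) > _MAX_DB_NAME_LEN:
        raise ValueError(f"database name too long: {name!r}")
    return name


def database_for(client, customer_id: str):
    """Return the customer's database handle from a pymongo-style client."""
    return client[db_name_for(customer_id)]
```

MongoDB creates a database lazily on the first write, so onboarding a new client is mostly about creating the database users and permissions for that name.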

MongoDB is highly available when run as a properly configured replica set: it can survive a node failure and automatically elect a new primary.
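As a small sketch of what that means for the application side, the Python below builds a replica set connection string and computes how many members can fail while a primary can still be elected. The host names, the replica set name `rs0`, and the function names are placeholders; the `replicaSet`, `w`, and `retryWrites` options are standard connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set: str) -> str:
    """Build a connection string listing several replica set members."""
    options = urlencode(
        {"replicaSet": replica_set, "w": "majority", "retryWrites": "true"}
    )
    return f"mongodb://{','.join(hosts)}/?{options}"


def tolerated_failures(voting_members: int) -> int:
    """Members that can be lost while a majority can still elect a primary."""
    majority = voting_members // 2 + 1
    return voting_members - majority


# Hypothetical three-member replica set.
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    "rs0",
)
print(uri)
print(tolerated_failures(3))  # 1
```

Listing several members in the connection string lets the driver find the new primary after a failover. A three-member set tolerates one failure, and a fourth member does not raise that number.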