Designing MongoDB for a multi-tenant database

We are currently designing a multi-tenant application and have chosen MongoDB for data storage. Our application needs to store different entities, such as user, employee, payroll, etc. We want to build a scalable platform where clients can be onboarded and offboarded quickly and the solution is easy to manage. We also want to isolate customer data logically.

We are discussing a few options for how to place the data in MongoDB:

  1. Separate database per client, where each collection stores a different group of documents from the application.
  2. Separate collection per client in one database, where all data for a particular client is stored in a single collection and documents are segregated using partitioning.

A few questions:

  1. Please suggest which approach is best: one database per client or one collection per client? Or is there another feature that can logically segregate the data and still achieve good performance?
  2. In the one-collection-per-client approach, when all data related to a particular client is stored in one collection, does that create any performance issues during CRUD operations?
  3. In the one-collection-per-client approach, can the data be queried using the partition key and the document id, so that retrieval is faster?
  4. In the one-collection-per-client approach, are the collections isolated from each other behind the scenes, as a logical separation?

Many Thanks

Welcome to the community, @Thirumalai_M!

Am I understanding correctly that the same application is to be used by multiple companies but with the same backend servicing all? Or will it be different deployments with a backend for each deployment? Generally you will want to store data for entities in collections so you’d want a users collection, an employees collection etc. Is the isolation of customer data so that you can give access to the data on a database level or are there other reasons? Will your clients be able to access the database directly or only via your application?

So per your questions:

  1. You’ll definitely not want one collection per client. (But you may want to store the client name in a field in each document and have just one database - depending on what your reasons are for isolating customer data)
  2. It would create data modeling issues. Normally a collection is something like ‘employees’ where you store documents like {"name": "Naomi", "team": "Engineering Communications", ...}. And then you’d have more collections for data about other things like maybe contracts or offices. If you only have one collection that would potentially be difficult.
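
To illustrate the "client name in a field" idea from point 1, here is a minimal sketch. The field name `client`, the `EmployeeDoc` shape, and the helper `scopeToClient` are assumptions made up for this example, not part of any MongoDB API. It only shows how an application layer might force every query filter to carry the client value before the filter reaches the database.

```typescript
// Hypothetical document shape: one "employees" collection shared by all clients.
interface EmployeeDoc {
  client: string; // assumed field name identifying the owning client
  name: string;
  team: string;
}

type Filter = Record<string, unknown>;

// Return a copy of `filter` that can only match documents owned by `client`.
// The client value is written last, so a caller-supplied "client" key
// cannot override it.
function scopeToClient(filter: Filter, client: string): Filter {
  return { ...filter, client };
}

const employee: EmployeeDoc = {
  client: "acme",
  name: "Naomi",
  team: "Engineering Communications",
};

// Filter that would be passed to a find() on the shared employees collection.
const scopedQuery = scopeToClient({ team: employee.team }, "acme");
console.log(JSON.stringify(scopedQuery));
```

With this pattern the isolation is only as strong as the application code, which is why the reasons for isolating customer data matter.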

I’m not sure what you mean by questions 3 and 4. Can you provide more information?

Thanks, Naomi, for the detailed response. Your answer gives me what I need for No. 1. I have a few follow-up queries on No. 1 and will clarify No. 3.

Basically, we are a SaaS provider developing a system where tenants can be onboarded and store their data. There are multiple entities in our application. We are using Docker containers for the service layer, but we have some confusion on the database side: whether to use a separate database per client, a separate collection per client, or shared collections with documents segregated using a partition key.

As per your answer to No. 2, I understand that going with one collection per client will create data modeling issues. Hence, a single database with one collection per client is not a feasible approach.

Using one collection per entity (for example, Employee) with its own index/partition will not provide isolation. Our application is a health care system and must be HIPAA compliant.

My understanding is that a separate database per client makes sense for our scenario. Please correct me if I am wrong.

If you are using MongoDB Atlas, you can control access based on field values using Realm Rules. This would allow you to use one collection per entity while restricting access to the client that the data belongs to. So you could have a rule that a user can only see data that matches the company associated with that user.

If you are not using Atlas you can control access on a database or collection level. So you could use that to restrict access for a database per client for your scenario.
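
As a rough sketch of database-level access control for the database-per-client option: the snippet below builds the document for MongoDB's `createUser` command with the built-in `readWrite` role limited to one tenant database. The `tenant_<id>` database naming, the `<id>_app` user naming, and both helper functions are assumptions for illustration only.

```typescript
// Shape of the document accepted by the MongoDB `createUser` database command.
interface CreateUserCommand {
  createUser: string;
  pwd: string;
  roles: { role: string; db: string }[];
}

// Hypothetical naming convention: one database per client, e.g. "tenant_acme".
function tenantDbName(clientId: string): string {
  // Reject anything outside a conservative character set so the id is
  // safe to embed in a database name.
  if (!/^[a-z0-9_]+$/.test(clientId)) {
    throw new Error(`unsupported client id: ${clientId}`);
  }
  return `tenant_${clientId}`;
}

// Build a command that grants read/write access to the client's own
// database and nothing else.
function buildTenantUserCommand(clientId: string, password: string): CreateUserCommand {
  const db = tenantDbName(clientId);
  return {
    createUser: `${clientId}_app`,
    pwd: password,
    roles: [{ role: "readWrite", db }],
  };
}

console.log(JSON.stringify(buildTenantUserCommand("acme", "change-me").roles));
```

The resulting document would typically be run by an administrator against the tenant database, for example through `db.runCommand(...)` in the shell or `db.command(...)` in the Node.js driver. A user created this way has no role on any other client's database.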

@Thirumalai_M, my two cents: since this is going to be a healthcare/HIPAA-compliant system, it is better to go with a separate database per client. That gives the highest level of isolation and security, and for security and compliance reasons many clients may want the database deployed only on their own secured network (cloud or on-premises). It also allows client-specific customizations later, if required. The major drawbacks of this approach are cost and maintenance overhead.

Had this been a non-healthcare system, you could even have considered a single collection for all clients, segregating their rows/documents with an additional “tenant_id” column. Although it provides only logical isolation, it is much easier to maintain and very cost effective. Depending on the number of clients/customers, you can scale it across multiple sharded clusters: perhaps one per customer, or one per group of customers, e.g. tenant_id = 1 to 10 (or tenant_name = A to M) in one cluster and tenant_id = 11 to 20 (or tenant_name = N to Z) in another.
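
The range-based grouping described above could be sketched as a small lookup table in the application layer. The cluster labels `cluster-a` and `cluster-b` and the function name `clusterForTenant` are made up for this example. The ranges mirror the tenant_id = 1 to 10 and 11 to 20 split mentioned in the post.

```typescript
interface ClusterRange {
  minTenantId: number; // inclusive lower bound
  maxTenantId: number; // inclusive upper bound
  cluster: string;     // hypothetical cluster label
}

// Assumed mapping, following the example ranges from the post.
const tenantRanges: ClusterRange[] = [
  { minTenantId: 1, maxTenantId: 10, cluster: "cluster-a" },
  { minTenantId: 11, maxTenantId: 20, cluster: "cluster-b" },
];

// Find the cluster whose range contains the given tenant_id.
// Throws for unmapped tenants instead of silently picking a default.
function clusterForTenant(tenantId: number, table: ClusterRange[] = tenantRanges): string {
  const hit = table.find(
    (r) => tenantId >= r.minTenantId && tenantId <= r.maxTenantId,
  );
  if (hit === undefined) {
    throw new Error(`no cluster configured for tenant_id ${tenantId}`);
  }
  return hit.cluster;
}

console.log(clusterForTenant(3), clusterForTenant(11));
```

Failing loudly for an unmapped tenant is a deliberate choice here: falling back to a default cluster could place one customer's data alongside another group's by accident.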

Hi @ManiK,
I really like the idea you’ve proposed. Could you also share some insights on how connections to the different clusters would be handled in the application layer (i.e. with Node.js as the app)? I’ll now have many connection strings, and based on each client’s request I have to connect to different DBs contained in different clusters.
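
One possible shape for this, offered only as a sketch under assumptions: keep a single long-lived client per cluster and reuse it across requests, rather than opening a connection per request. The class `ClusterClients`, the cluster names, and the example URIs below are hypothetical. The client factory is passed in so the snippet runs without the driver. With the official Node.js driver it would presumably be `(uri) => new MongoClient(uri)`, since each `MongoClient` maintains its own connection pool.

```typescript
// Lazily creates and caches one client object per cluster.
class ClusterClients<C> {
  private readonly clients = new Map<string, C>();
  private readonly uris: Record<string, string>;
  private readonly create: (uri: string) => C;

  // `uris` maps a cluster name to its connection string;
  // `create` builds a client for a connection string.
  constructor(uris: Record<string, string>, create: (uri: string) => C) {
    this.uris = uris;
    this.create = create;
  }

  get(cluster: string): C {
    const cached = this.clients.get(cluster);
    if (cached !== undefined) {
      return cached; // reuse the existing client and its pool
    }
    const uri = this.uris[cluster];
    if (uri === undefined) {
      throw new Error(`unknown cluster: ${cluster}`);
    }
    const client = this.create(uri);
    this.clients.set(cluster, client);
    return client;
  }
}

// Stand-in factory so the example is self-contained.
const clusterClients = new ClusterClients(
  {
    "cluster-a": "mongodb://host-a.example/",
    "cluster-b": "mongodb://host-b.example/",
  },
  (uri) => ({ uri }),
);

console.log(clusterClients.get("cluster-a") === clusterClients.get("cluster-a"));
```

Per request, the application would resolve the tenant to a cluster name, call `get(...)`, and then select the tenant's database on the returned client (with the real driver, `client.db(name)`).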