MongoDB multi-tenant design faces unexpected slow insertion time


We have been working on a multi-tenant model for our business and are using MongoDB to handle the data for our clients. Our clients are organizations, each with their own users using the system.

We have a single application layer (a single application deployment) that connects to multiple databases within a single cluster. The switch between database connections happens in the application layer (built with Node.js and Mongoose). Each database contains the same collections; for example, db-00 and db-01 both have the same set of collections, such as users, business_units, bookings, document_management, inventories, etc.

For each request entering the server, the following code executes to connect the user's request to the appropriate database:

const MONGO_URI = < only the cluster url with user and password, no /db attached here >

let databaseCluster = {};

const connect = (db) => {
  return mongoose.createConnection(
    process.env.MONGO_URI + `/${db}?retryWrites=true&w=majority`,
    {
      useNewUrlParser: true,
      useCreateIndex: true,
      useFindAndModify: false,
      useUnifiedTopology: true,
      autoIndex: true,
    }
  );
};

// this function is called once the server starts, in the server.js file
exports.connectToCluster = () =>
  new Promise(async (resolve, reject) => {
    try {
      let adminConnection = await connect(process.env.ADMIN_DB);
      let tenants = await Tetants(adminConnection).find({});
      databaseCluster = adminConnection;
      console.log("MongoDB admin Connected");
      tenantsMap = tenants;
      resolve("Tenants cached successfully.");
      console.log("Tenants cached successfully");
    } catch (err) {
      // Exit process with failure
      console.error(err);
      process.exit(1);
    }
  });

exports.getConnectionByTenant = (tenantName) => {
  let connection = null;
  connection = databaseCluster.useDb(tenantName, { useCache: true });
  return connection;
};
Whenever a model is compiled, the code looks like this:

const connection = getConnectionByTenant("my_tenant");
const ModelBuilder = (connection) => connection.model("collection_name", modelSchema);
const Model = ModelBuilder(connection);
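One thing worth checking with this pattern is that the model is not recompiled on every request for the same tenant. Below is a minimal sketch of memoizing compiled models per tenant and collection; `getModel`, the `Map` cache, and the stub `connection` object are all hypothetical names for illustration (with Mongoose itself you would typically check `connection.models` before calling `connection.model`):

```javascript
// Minimal sketch: memoize compiled models per tenant + collection so the
// schema is not recompiled on every request. All names here are placeholders.
const modelCache = new Map();

const getModel = (connection, name, buildModel) => {
  const key = `${connection.tenant}:${name}`;
  if (!modelCache.has(key)) {
    modelCache.set(key, buildModel(connection, name));
  }
  return modelCache.get(key);
};

// Usage with a stub "connection" standing in for a Mongoose connection:
const conn = { tenant: "my_tenant" };
const build = (c, name) => ({ tenant: c.tenant, collection: name });

const a = getModel(conn, "users", build);
const b = getModel(conn, "users", build);
console.log(a === b); // true: the compiled model object is reused
```

The same idea applies whether the cache lives in your own map or in Mongoose's per-connection model registry; the point is that compiling happens once per tenant, not once per request.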

Finally, in the data access layer it is used as normal, as follows:

// insert operation
const DataObject = new Model({ /* document fields */ });
await DataObject.save();

// find operation
const data = await Model.findOne({});

The system works fine both locally and on an Atlas cluster. It connects to the different databases as expected. But we have come across a weird problem.

Problem statement

When inserting data, exactly one database becomes really slow. For example, if we have db_1, db_2 and db_3, insert operations become really slow in db_3 even though there is almost no data, taking about 40 seconds to insert an item. When we restart the server, the problem shifts to a different database: db_2 becomes slow and db_3 returns to normal. We have confirmed that this problem occurs on only one database at a time, no matter how many databases we have. Apart from that particular database, all the others operate perfectly fine.

Could anyone help with this issue?



You do not have enough resources for your use-cases.

Your system is overwhelmed by disk I/O.

Thanks Steevej,

I’m just wondering why it happens in only one db while the other databases operate absolutely normally! Could you point me in some direction so that I can dig deeper? I would also love to hear about any alternative suggestions.

You need to investigate memory usage, disk usage and CPU usage.

Only an assumption: because the cache is filled by the first databases you access, something eventually has to be written to disk, and since the cache is full, those writes to disk must happen synchronously with new writes to the DB.

That is why you have to investigate memory usage, disk usage and CPU usage before you assume that the server is not behaving correctly.

You have tagged atlas-cluster; which tier do you run? How many databases, collections and documents, and what is the average document size?

I’m using an M10 cluster, and I have about 16 databases, each with about 55 collections. Currently the maximum number of documents in a collection is about 1000-1200; the average size is shown as N/A. In terms of CPU and memory usage, only a really small portion is used. I am attaching the images here for convenience:

Other screenshots are attached in this zip


I have learnt that the M10 Atlas tier has a collection limit of 1500, so this multi-database model won’t be suitable for scaling :pensive:!?

A few things here @iMS_Systems:

Can you provide a link to this limit regarding collections and M10? I can see the following limits relating to 1500 and M10:

In saying the above, you note a “collection limit” of 1500 for your M10. However, I can only see a “connection” limit for M10 matching this value. Was this a typo, and can you advise how you’re seeing yourself hit this limit?

How is the 40 seconds being measured at the moment? Can you provide any information about how it’s measured, and where in the stack you’re seeing the approximately 40-second insert time?

You might also wish to check out the Monitor your Database Deployments page for some useful pieces of information regarding metrics.
