1000 databases and 3 collection per database

Hello All,

I would like to know if you think that having a mongodb instance with 1000 databases named Chicago, Detroit, Washington and so on, and then having 3 (same name and data) collections per database with about 1000 documents each is a good idea, in ANY situation.

If so, which and why?

Thank you so much,
Best Regards,
JP

1 Like

You might want to read:
https://www.mongodb.com/article/schema-design-anti-pattern-massive-number-collections/

1 Like

Hello Steve,

Thanks . That might be a good fit for this question…

And regarding this issue, it is even worse. Thousands of databases with 3 collections per database (the same ones) with a few records.

For the same app (so data isolation / privacy is not a concern). I know in my soul this makes no sense.

A single collection with an attribute “city_id” (or similar) , I would say would be sufficient.
Do you see something else?

Thank you again.
Best Regards,
JP

1 Like

Hi,

3000 collections isn’t a large number of collections. 1 million is though.
Using only 3 collections and in each you have a field city_id with an index is also a good solution. I’d prefer this one but 3000 collections is still manageable I think.

I wouldn’t go too much further though.

Cheers,
Maxime.

1 Like

Hello @MaBeuLux88,

Thank you for the feedback.
But remember, it is not just a matter of 3000 collections… they are 3 collections (always the same) , across 1000 databases for the same application, each collection with about 1000 documents.

The number of databases seems normal/reasonable to you?

thanks,
br,
JP

1 Like

The number of databases is not really important. It is the number of collections. Database is just a grouping of collections under a name space. Number of collections is important because a collection is at least 2 files. The number of open files is the limiting factor.

I understand. That is also clearly important.
But I am not refering to limits.
I am talking about purpose, logic, sense or pattern.

Does 1000 databases for the same application, with the same 3 collections on each, sound reasonable/normal/usual?

Thank you.

The more databases and collection, the more complexity you will have to deal with in your code, that’s for sure. But maybe it makes more sense that way.

But our answers were more around the technical feasibility.

Here is a /data/db from a brand new MongoDB node:

-rw------- 1 999 docker  20480 May  5 15:48 collection-0-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-10-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-11-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-13-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-15-7189677319263731.wt
-rw------- 1 999 docker   4096 May  5 15:48 collection-17-7189677319263731.wt
-rw------- 1 999 docker   4096 May  5 15:48 collection-19-7189677319263731.wt
-rw------- 1 999 docker   4096 May  5 15:48 collection-21-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-2-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-4-7189677319263731.wt
-rw------- 1 999 docker  36864 May  5 15:48 collection-6-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 collection-8-7189677319263731.wt
drwx------ 2 999 docker   4096 May  5 15:49 diagnostic.data
-rw------- 1 999 docker  20480 May  5 15:48 index-12-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-14-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-16-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-1-7189677319263731.wt
-rw------- 1 999 docker   4096 May  5 15:48 index-18-7189677319263731.wt
-rw------- 1 999 docker   4096 May  5 15:48 index-20-7189677319263731.wt
-rw------- 1 999 docker   4096 May  5 15:48 index-22-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-3-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-5-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-7-7189677319263731.wt
-rw------- 1 999 docker  20480 May  5 15:48 index-9-7189677319263731.wt
drwx------ 2 999 docker   4096 May  5 15:48 journal
-rw------- 1 999 docker  36864 May  5 15:48 _mdb_catalog.wt
-rw------- 1 999 docker      2 May  5 15:48 mongod.lock
-rw------- 1 999 docker  20480 May  5 15:48 sizeStorer.wt
-rw------- 1 999 docker    114 May  5 15:48 storage.bson
-rw------- 1 999 docker     47 May  5 15:48 WiredTiger
-rw------- 1 999 docker   4096 May  5 15:48 WiredTigerHS.wt
-rw------- 1 999 docker     21 May  5 15:48 WiredTiger.lock
-rw------- 1 999 docker   1338 May  5 15:48 WiredTiger.turtle
-rw------- 1 999 docker 102400 May  5 15:48 WiredTiger.wt

One collection is at least one file for the data and one file for the _id index. But you will likely have 5 indexes per collections. So 3000 collections * (1+5 files) = 18000 open files, and probably a lot more if your collections and indexes grow large enough to generate multiple files for each.
From an operating system perspective, this could start to be an issue at some point.

2 Likes

I would not do it in most case for the reasons given by @MaBeuLux88. The exemption might be a multi-tenant application. Using your city example, each tenant is a city and the data of each city is isolated and not accessible from another city. The data being isolated one city to the other the code can still be simple as your queries for a given city is limited to the collections of this city. In that case, having each city a database, helps security wise because you can give roles to the users of a given city that enforce the data isolation of each city.

2 Likes

ok.
Answered my question :wink:
thank you @steevej / @MaBeuLux88 :wink:

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.