Creating a new database vs a new collection vs a new cluster

Hello @Shruthi_s1, welcome to the MongoDB Community forum!

Databases and Collections:

In general, an application has some data associated with it. A typical web application has a database where the application’s data is stored. For example, a financial accounting application has modules like accounts receivable, accounts payable and general ledger - this is a categorization of the application at a very high level. Each of these modules could be an application by itself, and the data is likewise categorized by functionality.

If you are building such an application, its data would likely be stored in three databases - one for each module. Within the accounts receivable module there are various functions, such as customer management and invoice management. The data for each function is different and is stored in its own collection within the accounts receivable database.

Another example is a blogging application. There are users, blog posts and reviews. The data is stored in different collections - users, blogs and reviews (or perhaps just users and blogs, with the reviews kept together with the blogs). It is impractical to store user and blog information in the same collection, because user data has different fields and structure - user name, password, email, etc. A post’s data is a title, content, the user who wrote it, reviews, etc. These do not belong together in the same collection. Data is inserted, updated, and queried per collection: to get user data you go to the users collection.
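To make the difference in structure concrete, here is a small sketch in plain Python (dictionaries standing in for documents, not actual driver code). All field names and values are made up for illustration, and embedding the reviews inside the post is just one of the two layouts mentioned above.

```python
# Documents from two collections of a hypothetical blogging database.
# All field names and values are illustrative assumptions.
users = [
    {"_id": "u1", "user_name": "asha", "email": "asha@example.com"},
    {"_id": "u2", "user_name": "alex", "email": "alex@example.com"},
]

blogs = [
    {
        "_id": "b1",
        "title": "Databases vs collections",
        "content": "...",
        "author_id": "u1",  # reference to a document in users
        "reviews": [        # reviews embedded in the post
            {"reviewer_id": "u2", "text": "Helpful, thanks!"},
        ],
    },
]

# The two kinds of documents share almost no fields.
shared_fields = set(users[0]) & set(blogs[0])
print(shared_fields)  # {'_id'}

# To get user data, go to the users collection.
author = next(u for u in users if u["_id"] == blogs[0]["author_id"])
print(author["user_name"])  # asha
```

Only the identifier field is common to both document types, which is why they sit more naturally in separate collections.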

So, you can think of a collection as a grouping of similar data, and a database as a grouping of related collections, i.e., data serving a larger functionality or a module. You would not want to store customer and invoice information in the same collection - it is impractical to store and use. As an analogy, it is like keeping salt and pepper in different containers - different containers for different ingredients serving different purposes.
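The grouping described above can be sketched as a nested structure: database, then collection, then documents. This is a plain-Python model for illustration only. The database and collection names follow the accounting example, while the document fields and the `find` helper are hypothetical.

```python
# Plain-Python model of the layout: database -> collection -> documents.
# Document fields and values are made up for illustration.
deployment = {
    "accounts_receivable": {
        "customers": [
            {"_id": 1, "name": "Acme Ltd", "email": "billing@acme.example"},
        ],
        "invoices": [
            {"_id": 101, "customer_id": 1, "amount": 250.0, "status": "open"},
            {"_id": 102, "customer_id": 1, "amount": 80.0, "status": "paid"},
        ],
    },
    "accounts_payable": {"vendors": [], "bills": []},
    "general_ledger": {"journal_entries": []},
}


def find(database, collection, **criteria):
    """Return the documents in one collection whose fields match all criteria."""
    docs = deployment[database][collection]
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# A query always targets one collection inside one database.
open_invoices = find("accounts_receivable", "invoices", status="open")
print(open_invoices)
```

Because invoices live in their own collection, the query does not have to skip over customer documents that have a completely different shape.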

MongoDB Clusters:

A MongoDB deployment can be a standalone server, a replica set or a sharded cluster. These configurations serve different purposes.

A standalone is a single server where all the databases (and their collections) are stored. If the server goes down, your application and its users must wait until the server is up and running again.

A replica set replicates the data on multiple database servers. The advantage is that if one of the servers dies, the other servers, with their replicated data, go on serving the application and its users.
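The difference between a standalone and a replica set can be sketched with a toy model in plain Python. This is only an illustration of the idea of replicated copies: a real replica set sends writes to a primary and elects a new primary when the current one fails, which this model ignores. The class names and server names are hypothetical.

```python
class Server:
    """One database server holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedDeployment:
    """Toy model: every write is copied to every server that is up."""

    def __init__(self, names):
        self.servers = [Server(n) for n in names]

    def write(self, key, value):
        for server in self.servers:
            if server.up:
                server.data[key] = value

    def read(self, key):
        # Serve the read from the first server that is still running.
        for server in self.servers:
            if server.up:
                return server.data[key]
        raise RuntimeError("no server available")


# Three servers with replicated data: losing one does not stop reads.
replica_set = ReplicatedDeployment(["node1", "node2", "node3"])
replica_set.write("customer:1", {"name": "Acme Ltd"})
replica_set.servers[0].up = False
value = replica_set.read("customer:1")
print(value)

# A standalone is the same model with a single server.
standalone = ReplicatedDeployment(["only-node"])
standalone.write("customer:1", {"name": "Acme Ltd"})
standalone.servers[0].up = False
try:
    standalone.read("customer:1")
    standalone_available = True
except RuntimeError:
    standalone_available = False
print(standalone_available)
```

With three copies, the read still succeeds after one server is marked as down. With a single server, the application has nothing left to read from.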

A sharded cluster has multiple shards - each shard is a replica set - and the application’s data is distributed among these shards. For example, the customer data is stored on multiple shards. If there are five shards and one hundred customers, you can think of each shard as storing the data of about twenty customers (the actual distribution depends on criteria like the shard key).
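As a rough illustration of the "about twenty customers per shard" idea, here is a plain-Python sketch that spreads one hundred customer ids over five shards by hashing the id. This is an assumption-laden simplification: MongoDB's actual placement works differently (it assigns ranges of shard key values, called chunks, to shards), and the `shard_for` function and the use of MD5 here are purely illustrative.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5


def shard_for(customer_id: int) -> int:
    """Map a customer id (the shard key in this sketch) to a shard number."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


# Place customers 1..100 and count how many land on each shard.
placement = Counter(shard_for(cid) for cid in range(1, 101))
for shard in range(NUM_SHARDS):
    print(f"shard {shard}: {placement[shard]} customers")
```

The counts will usually not be exactly twenty each, but the same customer id always maps to the same shard, so a lookup by that key only needs to visit one shard.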

MongoDB shards data at the collection level, and a sharded cluster can hold both sharded and unsharded collections.

Finally:

How do you decide which cluster, database or collection to use? It is a broad subject. In fact, it is a combination of several subjects, such as data modeling (or database design) and application design, and the answers also depend on the requirements of the application.
