Schema design: Many-to-many relationships and normalization

Satyam · January 24, 2023, 4:39am

A general rule of thumb while doing schema design in MongoDB is that you should design your database in a way that the most common queries can be satisfied by querying a single collection, even when this means that you will have some redundancy in your database. A good way to model this is by embedding. For example, in the use case you provided, if one has to regularly search for all the books by a particular author, then it would be much more useful to add an array field of books_written in the authors collection itself.

There are situations where full normalization may be beneficial in MongoDB, such as when dealing with very large datasets and a high volume of concurrent transactions. In these cases, the added complexity of multiple queries may be worth it to ensure data integrity and consistency.
Another example could be if you are dealing with sensitive data and need to ensure strict security and compliance requirements. The added complexity of multiple queries may be necessary to meet these requirements.
However, in most cases, the benefits of denormalization in MongoDB outweigh the drawbacks of increased complexity and query overhead. The flexibility and scalability of MongoDB’s document-oriented model make it well-suited for many-to-many relationships and denormalization can greatly simplify data modeling and querying.

It is not necessarily a “sin” to treat MongoDB like a relational database with full normalization, but it may not be the most efficient or effective approach for most use cases. It ultimately depends on the specific requirements and constraints of your project.

In general, favor denormalization when:

You have small subdocuments
Your data does not change very frequently
Your documents grow by a small amount over time
*You often need to query this data
You want relatively faster reads

and favor normalization when:

You have large subdocuments
Your data changes very frequently
Your documents grow by a large amount over time
Your data is often excluded from your results
You want relatively faster writes

Regarding document growth, note that MongoDB has a hard limit of 16MB per document, thus any schema design that can have a document growing indefinitely will hit this 16MB limit sooner or later.
You can further read the following documentation to further cement your knowledge of Schema Design in MongoDB.
Data Model Design
Factors to consider when data modeling in MongoDB

Note that these points are just general ideas and not a strict rule. I’m sure there are exceptions and counter examples to any of the points above, but generally it’s more about designing the schema according what will suit the use case best (i.e. how the data will be queried and/or updated), and not how the data is stored (unlike in most tabular databases where 3NF is considered “best” for most things).

Please feel free to reach out for anything else as well.

Regards,
Satyam