A Comprehensive Guide to Data Modeling

Key takeaways

Data modeling is a process where businesses organize and store their data in the most efficient manner so that the data can be easily queried for operational and analytical purposes.
It is important to model data to maintain data integrity and security, and make data easily accessible and viewable.
There are three main types of data models: conceptual, logical, and physical.
Several modeling techniques exist to address different workloads and business requirements.
The data modeling process involves a series of steps, including gathering the requirements, understanding entities and their relationships, identifying the data structures, creating the schema, and applying design patterns.
MongoDB makes modeling data simple due to its flexible schema and support for various data modeling techniques and design patterns. The Atlas UI and MongoDB Compass also give users a straightforward interface for creating and modifying database schemas.

Modern applications require data storage, but how that data is modeled will have a significant impact on the performance of the application, as well as the development speed. Proper data modeling management and database design are integral to setting up a successful application.

In this article, we will discuss what data modeling is, why data modeling is necessary, and important data modeling techniques that can be used in conjunction with MongoDB.

Table of contents

What is data modeling?
Why is data modeling important?
What are the 3 different types of data models?
Data modeling techniques
Data modeling process with an example data model
Next steps
FAQs

What is data modeling?

Data modeling defines how organizations store, maintain, and visualize data. It is the process of deciding how a business should capture relevant data, identifying all data elements, determining the relationships among them, and finding the best way to represent those relationships.

Databases can store unstructured (raw), semi-structured, and structured data, such as customer data, application data, logs, images, videos, and documents.

The key concepts of a data model are:

An entity: This is defined as an independent object that is also a logical component in the system. Entities can be categorized as tangible and intangible. This means that tangible entities (such as books) exist in the real world, while intangible entities (such as book loans) don't have a physical form.
Entity instances: An entity instance is one specific occurrence of an entity. For example, “Alice in Wonderland” is an instance of the tangible entity “book.”
Attributes: Attributes describe the characteristics of an entity. For example, the entity “book” has the attributes International Standard Book Number or ISBN (String) and title (String).
Relationships: These define the connections between the entities. For example, one user can borrow many books at a time. The relationship between the entities "users" and "books" is one to many.
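The four concepts above can be sketched in a few lines of Python. This is only an illustration, not a MongoDB API: the class and field names are assumptions, the `name` attribute on the user is hypothetical, and the ISBN value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List


# Entity "book": the class is the entity, its fields are the attributes.
@dataclass
class Book:
    isbn: str   # attribute: International Standard Book Number
    title: str  # attribute: title


# Entity "user", with a one-to-many relationship to books.
@dataclass
class User:
    name: str  # hypothetical attribute, for illustration only
    borrowed: List[Book] = field(default_factory=list)


# An entity instance: one specific book (the ISBN is a placeholder value).
alice = Book(isbn="000-0-00-000000-0", title="Alice in Wonderland")

# The relationship: one user can hold many borrowed books.
reader = User(name="Example Reader")
reader.borrowed.append(alice)
```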

Most databases, including MongoDB, support multiple ways to model relationships between entities:

One-to-one (1-1): One entity instance relates to exactly one instance of another entity. For example, a User has exactly one UserProfile.

One-to-many (1-N): One entity instance relates to multiple instances of another entity. For example, one User can borrow many books.

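Many-to-many (N-N): Multiple instances of one entity relate to multiple instances of another. For example, a book can be written by multiple authors, and an author can write multiple books.

Data can have 1:1, 1:many, or many:many relationships.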
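As a rough sketch, the three relationship types could be represented with plain Python dictionaries standing in for documents. All identifiers and field names here (`user_id`, `borrowed_isbns`, `author_ids`) are hypothetical and chosen only for illustration.

```python
# One-to-one: a user and its single profile are linked by the user's identifier.
user = {"_id": "u1", "name": "Example Reader"}
user_profile = {"_id": "p1", "user_id": "u1"}

# One-to-many: one user holds references to many borrowed books.
user["borrowed_isbns"] = ["isbn-1", "isbn-2"]

# Many-to-many: each book lists several authors, and one author
# can appear on several books.
books = [
    {"_id": "isbn-1", "author_ids": ["a1", "a2"]},
    {"_id": "isbn-2", "author_ids": ["a1"]},
]


def books_by_author(author_id):
    """Return the ids of all books that list the given author."""
    return [b["_id"] for b in books if author_id in b["author_ids"]]
```

Calling `books_by_author("a1")` walks the many-to-many relationship from the author side, even though the references are stored on the book side.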

To learn more about the basics, please watch our data modeling introduction video.

Why is data modeling important?

Data modeling is a critical step in database design. It serves as a blueprint for building performant, scalable databases and provides a shared reference across teams. With a proper design, even non-technical experts like business analysts can understand the database structure along with developers and data architects, thus promoting better collaboration and efficient analysis.

Data modeling ensures that data consistency and data integrity requirements are met, and business rules and data relationships remain intact.

Different industries may use various types of data models (conceptual, logical, and physical) in different ways. Data architects, system analysts, and business intelligence analysts apply these models to meet their specific business needs.

Data modelers provide organizations with organized and accessible data. Beyond supporting data management, modeling helps data analytics and business intelligence teams uncover business opportunities that were hidden because an entity relationship had not previously been captured. It can also reveal better ways to support business stakeholders by augmenting data-driven business processes or management systems.

What are the 3 different types of data models?

There are three types of data models, based on the level of data abstraction:

Conceptual data model

The conceptual data model explains what data the system should contain and the relationships among data elements. Built with business stakeholders, it represents the application's business logic and often incorporates domain-driven design principles. This model typically serves as the foundation for subsequent models.

Logical data model

The logical data model describes how data will be structured. In this model, the relationship between entities is established at a high level, and a list of entity attributes will also be represented. This data model can be thought of as a “blueprint” for the data that will be incorporated.

Physical data model

The physical model represents how data will be stored in a specific database management system (DBMS). In a relational database, this is where primary and foreign keys are established. In a document database such as MongoDB, this is where you decide whether to embed or link data based on entity relationships. Data types for each field are also established here, which in turn yields the database schema.

Relational databases use the Entity-Relationship (ER) Diagram to visualize logical and physical data models. Document databases like MongoDB use a similar approach. However, they also give details regarding document embedding and/or referencing along with relationships and attribute types.

The diagram above shows three levels: the conceptual model (main entities), the logical model (entity relationships and attributes), and the physical model (actual database implementation including data types and embedding/referencing decisions).
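To make the three levels concrete, here is a minimal sketch that writes the same two entities down at each level as plain Python data. The attribute names and the embed decision are assumptions made for illustration, not a prescribed schema.

```python
# Conceptual: which entities exist and how they relate.
conceptual = {
    "entities": ["Book", "Author"],
    "relationships": [("Book", "written by", "Author")],
}

# Logical: attributes per entity and the cardinality, with no storage details.
logical = {
    "Book": ["isbn", "title", "authors"],
    "Author": ["author_id", "name"],
    "Book-Author": "many-to-many",
}

# Physical: concrete types plus the embed-or-reference decision.
physical = {
    "books": {
        "isbn": "string",
        "title": "string",
        "authors": "array of embedded {author_id, name}",
    },
    "authors": {"author_id": "objectId", "name": "string"},
}
```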

Data modeling techniques

Databases use many techniques to model and represent data. While relational databases aim to reduce data redundancy and store data in normalized form, NoSQL databases allow some duplication to ensure better read performance. Below are some techniques used by different databases:

Hierarchical data modeling

This technique follows a tree structure with a single root node, where navigation flows top-to-bottom and each parent can have multiple children. It's suitable for simple, hierarchical data but struggles with many-to-many relationships.
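A minimal sketch of a hierarchical model, assuming each node stores a reference to its single parent. The category names are hypothetical.

```python
# Each node maps to its one parent; the root has no parent.
categories = {
    "Books": None,
    "Fiction": "Books",
    "Non-fiction": "Books",
    "Fantasy": "Fiction",
}


def path_to_root(node):
    """Follow parent links upward and return the path from the root to node."""
    path = []
    while node is not None:
        path.append(node)
        node = categories[node]
    return list(reversed(path))


def children(parent):
    """Return the direct children of a node, sorted by name."""
    return sorted(n for n, p in categories.items() if p == parent)
```

Because every node has exactly one parent, a category that belongs under two parents cannot be expressed without duplicating it, which is the many-to-many limitation described above.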

Graph data modeling

This technique uses nodes and edges to represent entities and relationships, where any node can connect to many others. It enables fast traversal across complex relationship paths and is ideal for highly connected data like social networks or recommendation engines.
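As an illustration, a graph can be held as an adjacency list and traversed breadth-first. The node names and the `follows` relationship are hypothetical, and a dedicated graph database would handle storage and traversal differently.

```python
from collections import deque

# Adjacency list: each node maps to the nodes it connects to.
follows = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["dee"],
    "dee": [],
}


def reachable_within(start, max_hops):
    """Breadth-first traversal: nodes reachable in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in follows[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    seen.discard(start)
    return seen
```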

Relational data modeling

This technique is used by relational databases, which use Structured Query Language (SQL) to query data. Each entity is represented by a table, and its attributes are represented by columns. The primary and foreign keys of each entity are shown in the visual representation itself.
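The sketch below uses Python's built-in `sqlite3` module to show tables, columns, and primary and foreign keys in practice. The `users` and `loans` tables and their columns are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE loans (
    loan_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    isbn    TEXT NOT NULL
);
""")
conn.execute("INSERT INTO users VALUES (1, 'Example Reader')")
conn.executemany(
    "INSERT INTO loans (user_id, isbn) VALUES (?, ?)",
    [(1, "isbn-1"), (1, "isbn-2")],
)

# A join follows the foreign key from each loan back to its user.
rows = conn.execute(
    "SELECT u.name, l.isbn FROM users u "
    "JOIN loans l ON l.user_id = u.user_id ORDER BY l.isbn"
).fetchall()
```

With foreign keys enabled, inserting a loan for a `user_id` that does not exist in `users` is rejected, which is how the relational model protects the relationship.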

Entity-relationship data modeling

This technique represents the database system at a very high level. It is used to build the physical model of a database, as it is very easy to visualize and gives details of each entity, attributes, and relationships in a single view that can be understood even by non-technical stakeholders. There are several ER modeling tools available like Hackolade and Open ModelSphere.

Object-oriented data modeling

The object-oriented data modeling technique uses an object-oriented programming paradigm, where entities are represented as objects with a set of associated attributes. The model presents the view as a collection of objects.

Dimensional data modeling

Dimensional models promote data redundancy for faster data retrieval and better read performance. These models are used in a data warehouse for analytics purposes.
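One common form of dimensional model is a star schema: a fact table of events surrounded by descriptive dimension tables. The sketch below assumes that layout; the table names, keys, and the `days_borrowed` measure are all hypothetical.

```python
from collections import defaultdict

# Dimension table: descriptive attributes, keyed by a surrogate key.
dim_book = {
    1: {"title": "Book A", "genre": "Fantasy"},
    2: {"title": "Book B", "genre": "History"},
}

# Fact table: one row per loan event, holding a dimension key and a measure.
fact_loans = [
    {"book_key": 1, "days_borrowed": 14},
    {"book_key": 1, "days_borrowed": 7},
    {"book_key": 2, "days_borrowed": 21},
]


def days_by_genre():
    """Aggregate the measure by a dimension attribute."""
    totals = defaultdict(int)
    for row in fact_loans:
        genre = dim_book[row["book_key"]]["genre"]
        totals[genre] += row["days_borrowed"]
    return dict(totals)
```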

MongoDB stores data as JSON-like (BSON) documents. Data that is used together should be stored together as nested fields, arrays, or subdocuments. Documents are held inside a collection, which is analogous to a table in a relational database. MongoDB supports the above techniques through its design patterns, like the extended reference pattern, bucket pattern, polymorphic pattern, tree pattern, computed pattern, and so on. You can read all about the MongoDB design patterns in our blog series Building With Patterns.

Data modeling process with an example data model

The data modeling process is a series of steps taken to create a data model, and may vary from one database management system to another. As an example, we will show the modeling process for MongoDB for a basic library system with five entities:

Book: Collection of books, with details like ISBN and name, with ISBN being the unique identifier
Author: Collection of authors, with details about the author, where author_id will be the unique identifier
User Profile: Profiles of all the users that enrolled for a library subscription, i.e., borrowing books
User: All the active users who borrow books from the library with minimal details that are accessed regularly, like the books they borrowed, due date, and so on
Review: Collection of user reviews for various books

We work through each data model type in turn to arrive at the database schema for these requirements.

Gathering requirements

The first step is gathering application requirements. This uncovers the underlying data structures and, critically, the workload: what operations will be performed, how much data is involved, and how frequently operations occur.

Understand relationships among entities

The next step is to understand the relationships between data entities that make up the logical model. Think about how the objects are related (e.g., one to one, one to many, or many to many) and what data attributes will be used to describe them. In our library example, a book can be written by multiple authors (many:many). A user can be associated with only one user profile (1:1). A user can write multiple reviews, but each review is associated with only one user (1:many).

The next step is to decide whether to reference or embed data in a document by using a set of questions based on the workload.

Questions and points for embedding:

Simplicity: Would keeping the pieces of information together lead to a simpler data model and code?
Yes – point for embedding
Go together: Do the pieces of information have a “has-a”, “contains”, or similar relationships?
Yes – point for embedding
Query atomicity: Does the application query the pieces of information together?
Yes – point for embedding
Update complexity: Are the pieces of information updated together?
Yes – point for embedding
Archival: Should the pieces of information be archived at the same time?
Yes – point for embedding

Questions and points for referencing:

Cardinality: Is there a high cardinality (current or growing) on the “many” side of the relationship?
Yes – point for referencing
Data duplication: Would data duplication be too complicated to manage and undesired?
Yes – point for referencing
Document size: Would the combined sizes of the pieces of information take too much memory or transfer bandwidth for the application?
Yes – point for referencing
Document growth: Would the embedded piece grow without bound?
Yes – point for referencing
Workload: Are the pieces of information written at different times in a write-heavy workload?
Yes – point for referencing
Individuality: For the child side of the relationship, can the pieces exist by themselves without a parent?
Yes – point for referencing
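The two checklists above can be tallied with a small helper. This is a simplification: the guide assigns points but does not say how to weigh them, so treating every question equally is an assumption, and the question keys are names invented for this sketch.

```python
EMBED_QUESTIONS = [
    "simplicity", "go_together", "query_atomicity",
    "update_complexity", "archival",
]
REFERENCE_QUESTIONS = [
    "cardinality", "data_duplication", "document_size",
    "document_growth", "workload", "individuality",
]


def tally(answers):
    """Count the 'yes' answers on each side.

    answers maps a question key to True or False; missing keys count as 'no'.
    """
    embed = sum(1 for q in EMBED_QUESTIONS if answers.get(q, False))
    reference = sum(1 for q in REFERENCE_QUESTIONS if answers.get(q, False))
    return {"embed": embed, "reference": reference}


# Hypothetical answers for the book/author relationship.
book_author = tally({
    "simplicity": True,
    "go_together": True,
    "query_atomicity": True,
    "individuality": True,
})
```

A tally like this is a starting point for discussion rather than a rule; a single strong "yes", such as unbounded document growth, may outweigh several points on the other side.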

Based on the above, in our library system, we can embed a subset of each author's details (like name and genres) in the documents of the book collection. On the other hand, since a user has many details that are not needed on every access, we can keep them in the User Profile collection and reference them when required.
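Here is a sketch of what those two decisions might look like, using plain dictionaries in place of MongoDB documents. All identifiers, field names, and values are hypothetical.

```python
# Full author document, stored in its own collection.
authors = {
    "a1": {
        "_id": "a1",
        "name": "Author One",
        "genres": ["Fantasy"],
        "biography": "placeholder biography text",
    },
}

# Book document: embeds only the author fields that are read with the book.
book = {
    "_id": "isbn-1",
    "title": "Book A",
    "authors": [
        {"author_id": "a1", "name": "Author One", "genres": ["Fantasy"]},
    ],
}

# User document: keeps a reference to the rarely needed profile details.
user_profiles = {"p1": {"_id": "p1", "address": "placeholder address"}}
user = {"_id": "u1", "name": "Example Reader", "profile_id": "p1"}


def load_profile(u):
    """Follow the reference only when the full profile is actually needed."""
    return user_profiles[u["profile_id"]]
```

The trade-off is that the embedded author name is duplicated, so a change to an author's name has to be applied to the author document and to every book that embeds it.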

The completion of this step results in the logical data model.

Identify the data structure

This step considers the actual data stored in the database, and data modeling techniques will also depend on the type of DBMS used. For example, if using a relational database, identifying unique keys and field types, as well as normalizing data, will be necessary. However, with a document database, embedding related information may be warranted. For references, we use ObjectId (or ObjectId[] for arrays). For embedding, we use nested documents or embedded arrays—for example, embedding an array of author subdocuments within each book.
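One way to write the physical model down is as a MongoDB schema validation document using `$jsonSchema` and `bsonType`. The sketch below only builds the document as a Python dictionary; which fields are required, and the shape of the author subdocument, are assumptions for the library example.

```python
book_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["isbn", "title", "authors"],
        "properties": {
            "isbn": {"bsonType": "string"},
            "title": {"bsonType": "string"},
            # Embedded array of author subdocuments.
            "authors": {
                "bsonType": "array",
                "items": {
                    "bsonType": "object",
                    "required": ["author_id", "name"],
                    "properties": {
                        # Reference back to the full author document.
                        "author_id": {"bsonType": "objectId"},
                        "name": {"bsonType": "string"},
                    },
                },
            },
        },
    }
}
```

With a driver such as PyMongo, a dictionary like this could be supplied as the `validator` option when creating the collection. That step is not shown here because it needs a running database.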

This step produces a physical data model, which represents the initial database design, known as the schema.

Apply design patterns

Patterns make data modeling more efficient and effective. With design patterns, it's easier to accommodate changes in application requirements and structure. In MongoDB, there are a number of patterns to choose from, including schema versioning, bucket, computed, and tree, to name just a few. For a more complete list, including details on each pattern type, check out MongoDB's Building With Patterns: A Summary.
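As one example, the computed pattern stores a value that would otherwise be recalculated on every read. The sketch below applies the idea to book reviews; the field names (`review_count`, `rating_sum`, `avg_rating`) are hypothetical, and a real application would perform the update in the database rather than on a dictionary.

```python
def add_review(book, rating):
    """Update the precomputed review statistics stored on the book document."""
    book["review_count"] = book.get("review_count", 0) + 1
    book["rating_sum"] = book.get("rating_sum", 0) + rating
    book["avg_rating"] = book["rating_sum"] / book["review_count"]
    return book


book = {"_id": "isbn-1", "title": "Book A"}
add_review(book, 4)
add_review(book, 5)
```

Reads can then return `avg_rating` directly from the book document instead of scanning every review, at the cost of a little extra work on each write.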

MongoDB provides a flexible schema for data, meaning that you can easily change data structures as your application progresses and requirements change without compromising data integrity. This flexibility enables you to restructure your schema and optimize your queries as many times as necessary.
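A flexible schema means documents of different shapes can coexist while the application evolves. One way to handle this, in the spirit of the schema versioning pattern, is to upgrade older documents as they are read. The sketch assumes a hypothetical change in which version 1 stored a single `author` string and version 2 stores an `authors` array.

```python
def upgrade(doc):
    """Return the document in the current shape, leaving the input untouched."""
    if doc.get("schema_version", 1) == 1:
        # Build a new dictionary so the caller's original is not modified.
        doc = {
            **doc,
            "authors": [{"name": doc["author"]}],
            "schema_version": 2,
        }
        del doc["author"]
    return doc


old_book = {"_id": "isbn-1", "title": "Book A", "author": "Author One"}
new_book = upgrade(old_book)
```

Documents already at version 2 pass through unchanged, so the function can be applied to every document the application reads.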

In addition, as you model your data, think about how the data will be displayed and used in your application. The way you use the data will likely dictate the structure of your database, and not the other way around. This may be why MongoDB is so intuitive for software developers.

Using the MongoDB Atlas UI, you can design and create a visual representation of your schema.

Next steps

Data modeling is an essential step for a scalable, robust, and performant database. It enables data engineers, business users, and database administrators to view and query data easily and quickly. Selecting the right database and modeling technique is crucial, and you should decide that based on your use case.

Just as selecting the right pattern/modeling technique is an important step in data modeling, it's also critical to avoid schema design anti-patterns. To learn more, read MongoDB's Schema Design Anti-Patterns.

Interested in learning even more? Here are four ways to level up your data modeling skill set.

Review the MongoDB Data Modeling Intro course to catch up on the latest trends in data modeling.
Try out MongoDB Atlas, our fully managed developer data platform, and see how this data modeling tool can take your projects to the next level.
Take a course on data modeling on MongoDB University and build your knowledge even further.
Get a free MongoDB Skill Badge credential on a data modeling topic, each in 60 to 90 minutes.

FAQs

Data modeling defines how organizations store, maintain, and visualize data coming from various sources, to be used for different workloads and use cases.

The four types of data modeling are conceptual, logical, physical, and dimensional. They progress from a high-level business view (conceptual) to a detailed structure (logical), then to database-specific implementation (physical), and to analytics optimization (dimensional) as required.

MongoDB simplifies data modeling through its flexible, document-oriented structure, allowing data to mirror real-world objects. MongoDB’s dynamic schemas, embedded documents, design patterns, and references model complex relationships efficiently. MongoDB further enables fast, adaptable, and application-aligned data design through indexing and aggregation.

A simple example of a data model could be a library system with entities like Author, Book, User, User Profile, and Review.

Each Book has attributes such as ISBN, Title, and Genre, and links to an Author. A User has a User Profile with details like Name and JoiningDate. Reviews connect Users and Books, containing Rating and Comments.

Together, these relationships define how data is structured and connected within the library system.

MongoDB is not itself a data modeling tool, but it supports data modeling through its flexible document structure, allowing developers to design schemas that fit their application’s needs. Tools like MongoDB Compass, Hackolade, and Studio 3T can be used with MongoDB for data model design and visualization.

