Data Modeling Explained

Why is having a good data modeling strategy essential from the start of your project? Any modern application needs data to run, but how you model your data will have a drastic impact on the performance of your application, as well as the speed of development. In this article, you will learn what data modeling is, why you need it, and some efficient data modeling techniques to be used with MongoDB.

What is data modeling?

Data modeling is the process of creating a clean model of how you will store data in a database. Data models also describe how the data is related. The goal of data modeling is to identify all the data components of a system, how they are connected, and the best ways to represent those relationships.

Data modeling is done at the application level. Data models consist of the following components:

  • Entity: an independent object that is also a logical component in the system. Entities can be categorized as tangible or intangible. Tangible entities, such as books, exist in the real world, while intangible entities, such as book loans, don’t have a physical form. In document databases, each document is an entity. In tabular databases, each row is an entity.

  • Entity types: the categories used to group entities. For example, the book entity with the title “Alice in Wonderland” belongs to the entity type “book.”

  • Attributes: the characteristics of an entity. For example, every entity of the type “book” has the attributes ISBN (String) and title (String).

  • Relationships: the connections between entities. For example, one user can borrow many books at a time, so the relationship between the entity types “user” and “book” is one to many.
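
As a rough illustration of these components, the sketch below expresses two entity types as Python dataclasses. The `User` fields, the sample values, and the placeholder ISBN are assumptions made for the example; only the book attributes (ISBN and title) come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    """Entity type "book"; each instance is one entity."""
    isbn: str   # attribute: ISBN (String)
    title: str  # attribute: title (String)


@dataclass
class User:
    """Entity type "user" (field names are assumptions for this sketch)."""
    card_number: str
    name: str
    # One-to-many relationship: one user can borrow many books.
    borrowed: List[Book] = field(default_factory=list)


# One entity of each type; the ISBN is a placeholder, not a real number.
alice = Book(isbn="placeholder-isbn-1", title="Alice in Wonderland")
reader = User(card_number="0001", name="Sample Reader")
reader.borrowed.append(alice)
```

Here the classes play the role of entity types, the instances are entities, the fields are attributes, and the `borrowed` list carries the relationship.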

Although MongoDB has a flexible schema, you still need data modeling, also known as schema design. A good data model gives you a strong foundation that can adapt as your requirements change. By creating a solid data model right from the start, you can ensure that your application will perform better and be more future-proof.

MongoDB supports multiple ways to model relationships between entities:

  • One to one (1-1): In this type, one value is associated with only one document—for example, a book ISBN. Each book can have only one ISBN.

  • One to many (1-N): Here, one value can be associated with more than one document or value. For example, a user can borrow more than one book at a time.

  • Many to many (N-N): In this type of model, multiple documents can be associated with each other. For example, a book can have many authors, and one author can write many different books. The relationship between author and book is many to many.

Types of relationship between attributes in a data model

Data can have 1:1, 1:many or many:many relationships
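
The three relationship types can be sketched as plain Python dictionaries standing in for documents. All field names, ids, and values below are hypothetical and chosen only to show the shapes; `titles_by` is a helper invented for this example.

```python
# One to one: each book document carries exactly one ISBN.
book = {"isbn": "placeholder-isbn-1", "title": "Alice in Wonderland"}

# One to many: one user document lists several borrowed books.
user = {
    "name": "Sample Reader",
    "borrowed_isbns": ["placeholder-isbn-1", "placeholder-isbn-2"],
}

# Many to many: a book can have several authors, and an author several books.
authors = [
    {"_id": "a1", "name": "Author One", "book_ids": ["b1", "b2"]},
    {"_id": "a2", "name": "Author Two", "book_ids": ["b1"]},
]
books = [
    {"_id": "b1", "title": "Co-written Book", "author_ids": ["a1", "a2"]},
    {"_id": "b2", "title": "Solo Book", "author_ids": ["a1"]},
]


def titles_by(author_id):
    """Follow the many-to-many link from an author id to book titles."""
    return [b["title"] for b in books if author_id in b["author_ids"]]
```

Calling `titles_by("a1")` walks the references from one author to every book that lists that author.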

What are the advantages of data modeling?

There are several advantages of data modeling:

  • Ensures better database planning, design, and implementation, leading to improved application performance.
  • Promotes faster application development through easier object mapping.
  • Enables better discovery, standardization, and documentation of multiple data sources.
  • Allows organizations to think of long-term solutions and model data considering not only current projects, but also future requirements of the application.

What are data models used for?

A data model provides you with the foundation for the data structure of your application. Creating the data model can help you to identify the business rules that your application will need to follow. The data model will also provide your development team with a consistent map of the data used by the application they are developing.

Thinking ahead of time about how you will access the information from the application will help you plan better. It will also help you understand the business processes that are sometimes hidden or not explicitly explained by your stakeholders.

While data modeling might seem like an additional step in the development planning cycle, it can make the development cycles much faster.

What are the (three) different types of data models?

As you start modeling your data, you will likely go through various steps of data analysis. Each step might produce a different type of data model. Data models can generally be thought of as being one of the following three types.

  • Conceptual Data Model: The conceptual data model explains what the system should contain with regard to data and how it is related. This model is usually built with the help of the stakeholders. It represents the application’s business logic and is often used as the basis for one or more of the following models.
  • Logical Data Model: The logical data model will describe how the data will be structured. In this model, the relationship between the entities is established at a high level. You will also list the attributes for the entities represented in the model.
  • Physical Data Model: The physical data model represents how the data will be stored in a specific database management system (DBMS). With this model, you would establish your primary and secondary keys in a relational database or decide whether to embed or link your data in a document database such as MongoDB. You will also establish the data types for each of your fields. This will provide you with your database schema.

These models are typically drawn as entity-relationship diagrams (ERDs). An example of these three models can be found in the section titled What is an example of a data model? below.

What is the data modeling process?

You can think of data modeling as a series of steps, each one providing you with one of the models described above.

Gather requirements

The first step to a data modeling process is to gather all the requirements for your application. This step will provide you with the underlying data structure that you will need to review. Analyze not only the data objects, but also the size of the data and the operations that will be performed on that data. This step will be done with the help of domain experts. At the end of this first step, you should have the necessary information to draft your conceptual data model.

Understand relationships between entities

The next step is to understand the relationship between the various entities that make up your whole data model. Try to think about how the objects would be related (one to one, one to many, or many to many) and the attributes you would use to describe these objects. This step will provide you with your logical data model.

Identify the data structure

Finally, you can start thinking about the actual data that you will store in the database. At this point, you will try to identify unique keys and field types. The way you model your data will depend highly on the type of DBMS you will be using. If you are using a relational database, you might start thinking about normalizing your data, whereas in a document database you would think about embedding related information.

At the end of this step, you can produce a physical data model representing your initial database.
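
To make the difference between normalizing and embedding concrete, here is a small sketch that starts from a normalized, reference-based shape and produces an embedded, document-style shape. The data, the field names, and the `embed_authors` helper are all assumptions for illustration, not part of any MongoDB API.

```python
from typing import Dict, List

# Normalized (relational-style) shape: books point at authors by id.
authors: Dict[int, dict] = {1: {"name": "Author One"}}
books: List[dict] = [
    {"isbn": "placeholder-isbn-1", "title": "First Title", "author_id": 1},
    {"isbn": "placeholder-isbn-2", "title": "Second Title", "author_id": 1},
]


def embed_authors(books: List[dict], authors: Dict[int, dict]) -> List[dict]:
    """Return document-style books with the author embedded in each one."""
    embedded = []
    for book in books:
        # Copy every field except the reference to the author.
        doc = {k: v for k, v in book.items() if k != "author_id"}
        # Copy the author so each document owns its own embedded data.
        doc["author"] = dict(authors[book["author_id"]])
        embedded.append(doc)
    return embedded


documents = embed_authors(books, authors)
```

The embedded form duplicates the author in every book, which trades some redundancy for being able to read a book and its author in a single lookup.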

Apply design patterns

Patterns make data modeling more efficient and effective. With design patterns, it’s easier to accommodate changes in application requirements and structure. Some common patterns are:

  • The schema versioning pattern.
  • The bucket pattern.
  • The computed pattern.
  • The tree pattern.

Watch the data modeling introduction video to learn more about data modeling patterns.
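
As one example, the schema versioning pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `schema_version` field name, the hypothetical change from a single `address` to an `addresses` list, and the `upgrade_user` helper are all invented for the example.

```python
def upgrade_user(doc: dict) -> dict:
    """Return a copy of a user document in the latest (version 2) shape."""
    # Documents written before versioning was introduced are treated as v1.
    version = doc.get("schema_version", 1)
    if version == 1:
        upgraded = {k: v for k, v in doc.items() if k != "address"}
        # Hypothetical v2 change: a single address becomes a list of addresses.
        upgraded["addresses"] = [doc["address"]] if "address" in doc else []
        upgraded["schema_version"] = 2
        return upgraded
    return dict(doc)  # already current; return an unchanged copy


old = {"name": "Sample Reader", "address": "1 Example Street"}
new = upgrade_user(old)
```

Because each document records its own version, old and new shapes can coexist in the same collection while the application upgrades documents as it reads them.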

What is an example of a data model?

Let’s imagine a scenario where you would build an application for the users of a library. How would you model this database?

First, you will speak with the business analysts to understand the entities that need to be part of your system. You’ll find out that these must be included:

  • Books: The library has millions of books, and each has a unique ISBN. The users will need to search books by title or by author.
  • Users: This library has thousands of users, and each user has a name, along with an address. The library will assign them a unique number that they can find on their library card.

You will also need to understand how the various entities will interact with each other. These interactions will give you the relationships in your model. In the case of the library example, interactions might look like:

  • Users will borrow books: Ultimately, the library will need to know which books have been borrowed by which user. Each user may have up to five books on loan at a time.

These business rules will let you organize the information to build your conceptual model. By now, you understand the data necessary to build the first iteration of your software.
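
A business rule such as the five-book limit eventually has to be enforced somewhere in the application. The sketch below shows one possible way to do that in application code; the `borrow` function, the `LoanLimitReached` exception, and the document fields are hypothetical names chosen for this example.

```python
MAX_LOANS = 5  # business rule from the requirements: five books at a time


class LoanLimitReached(Exception):
    """Raised when a user already holds the maximum number of books."""


def borrow(user: dict, isbn: str) -> None:
    """Record a loan on the user document, enforcing the loan limit."""
    if isbn in user["borrowed"]:
        return  # already on loan to this user; nothing to record
    if len(user["borrowed"]) >= MAX_LOANS:
        raise LoanLimitReached(
            f"card {user['card_num']} already has {MAX_LOANS} books"
        )
    user["borrowed"].append(isbn)


user = {"card_num": "0001", "borrowed": []}
for n in range(5):
    borrow(user, f"placeholder-isbn-{n}")
```

A sixth call to `borrow` with a new ISBN raises `LoanLimitReached` and leaves the user document unchanged.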


The conceptual model

Organize the necessary data and show the main entities


It’s now time to create a logical data model. As part of this modeling step, you might realize that some of your data structures are more complex and require new entities.

For example, the authors would be better represented as their own entities in order to enable searching for books by authors. We’re assuming here that there is a single author per book for the sake of simplicity.

The relationships between the various objects that form the data model also start to emerge from this model.


Logical Model

Showing the relationship between authors, books, and users.


Choose your DBMS and build your physical data model. At this point, you will start thinking in terms of the database you picked. The type of database you choose will determine how you will store the data.

If you use a document database, such as MongoDB, you will model relationships using embedding or document references. As you establish the relationships between your various objects, you will also find your IDs and unique values representing your items.

Let’s return to our library example. In the following diagram, you can see that the author was embedded in the books. This will make it easier to create indexes to enable the full-text search capabilities of MongoDB Atlas Search. The books borrowed are listed as an array in the user document because this information will generally be retrieved all at once on the application’s main page. A different use case with this same library data might have called for a different physical data model. The ISBN and CardNum fields are unique for their documents and could be used as the _id field. You could also use them as a shard key if you need to scale across multiple shards.

The physical data model for a document database

Physical data model representation for a document database
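
The document shapes described above might look roughly like the following, written as Python dictionaries. This does not reproduce the diagram's exact fields: the sample values, the placeholder ISBN, the copied `title` inside the loan entry, and the `main_page` helper are assumptions made for illustration.

```python
book = {
    "_id": "placeholder-isbn-1",          # ISBN used as the unique document id
    "title": "Alice in Wonderland",
    "author": {"name": "Lewis Carroll"},  # author embedded in the book
}

user = {
    "_id": "0001",                        # CardNum used as the unique document id
    "name": "Sample Reader",
    "address": "1 Example Street",
    # Borrowed books kept as an array so one read returns all of them.
    "borrowed": [
        {"isbn": "placeholder-isbn-1", "title": "Alice in Wonderland"},
    ],
}


def main_page(user: dict) -> dict:
    """Build the main-page summary from the single user document."""
    return {
        "name": user["name"],
        "loans": [loan["title"] for loan in user["borrowed"]],
    }
```

Because the loans live inside the user document, `main_page` needs no second lookup to list what the user has borrowed.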


If you decide to go with a traditional relational database, the physical data model will look very different. In this example, the authors and books tables are linked through a one-to-many relationship. The authorId field is the primary key in the authors table and a foreign key in the books table. A join table is added to keep track of the borrowed books along with their due dates.


The physical data model for a relational database

Physical model representing relational database
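
For comparison, the relational model can be sketched with Python's built-in `sqlite3` module and an in-memory database. Only `authorId`, the one-to-many link between authors and books, and the join table with due dates come from the description above; the remaining table names, column names, and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
CREATE TABLE authors (
    authorId INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE books (
    isbn     TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    authorId INTEGER NOT NULL REFERENCES authors(authorId)
);
CREATE TABLE users (
    cardNum TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT
);
-- Join table: one row per borrowed book, with its due date.
CREATE TABLE borrowed_books (
    cardNum TEXT NOT NULL REFERENCES users(cardNum),
    isbn    TEXT NOT NULL REFERENCES books(isbn),
    dueDate TEXT NOT NULL,
    PRIMARY KEY (cardNum, isbn)
);
""")

conn.execute("INSERT INTO authors VALUES (1, 'Lewis Carroll')")
conn.execute("INSERT INTO books VALUES ('placeholder-isbn-1', 'Alice in Wonderland', 1)")
conn.execute("INSERT INTO users VALUES ('0001', 'Sample Reader', '1 Example Street')")
conn.execute("INSERT INTO borrowed_books VALUES ('0001', 'placeholder-isbn-1', '2030-01-31')")

# Listing a user's loans requires joining three tables.
rows = conn.execute("""
    SELECT b.title, a.name, bb.dueDate
    FROM borrowed_books AS bb
    JOIN books   AS b ON b.isbn = bb.isbn
    JOIN authors AS a ON a.authorId = b.authorId
    WHERE bb.cardNum = ?
""", ("0001",)).fetchall()
```

The same question that the document model answers from one user document is answered here by joining the join table, the books table, and the authors table.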


Now that you’ve been through these stages, you have an entire database model that describes how to store your entities and create relationships amongst them. Those steps also gave you some insights into what should be your keys and your indexes.

What data modeling tools are available?

The example above was simple enough to draw with a basic drawing tool.

As your data models get more complex, you might need to rely on more advanced tooling to represent the data you are trying to map.

  • Hackolade is a general purpose tool that can help you create visual representations of your data for relational or document databases. It can even be used to create your MongoDB schemas and get you up and running faster.
  • If you are looking for a free and open-source tool, you can look at Open ModelSphere, a powerful piece of software that lets you create just about any possible model.
  • If a collaborative design is more important, consider Creately, a general design tool that lets you work in real time with other contributors on your models.

All of these tools use the standard Unified Modeling Language (UML) to build your ERDs and will provide you with professional-looking diagrams that can then be shared with your team.

Flexible data modeling with MongoDB Atlas

MongoDB provides you with a flexible schema for your data. This flexible schema means that you can easily change your data structure as your application progresses and the requirements change. This doesn’t mean that you shouldn’t think about data modeling, though. A good data model will help you understand the business requirements for your application and structure your schema to optimize your queries.

As you model your data, think about how the data will be displayed and used in your application. The way you use the data should dictate the structure of your database, and not the other way around. This is why MongoDB is easier to use for software developers. Try it out for yourself with MongoDB Atlas, MongoDB's fully managed database-as-a-service (DBaaS).

Next steps

Now that you know how to model your data, you can create a MongoDB Atlas cluster and build your database. You can find out more about data modeling from the Data Modeling with MongoDB presentation from MongoDB.live 2020. There is also a course available on that specific topic at MongoDB University.

A good data analysis phase will help you plan out how you should structure your data. It will also help you identify what your indexes should be, leading to an overall better application. Once you are familiar with data modeling for MongoDB, you will also need to understand how to create good data schemas and apply patterns to make your database even more efficient. Once you master those skills, you will be able to create applications that are faster and more scalable.

FAQ

What are the steps of data modeling?

There are three major steps of data modeling:

  • Describing the workload: Identify the most frequent operations, for example whether reads or writes dominate, and use them to determine the main entities.
  • Identifying and modeling relationships: Next, define how the entities are related and how you want to reference them. For example, embedding makes it faster to retrieve related data together, whereas linking the entities through references (resolved with joins) keeps the data cleaner and more organized.
  • Applying design patterns: Patterns help optimize performance and make data easier to access.

Read the section on data modeling process to learn more.

What is an example of data modeling?

An easy-to-understand example of data modeling is a library application. In a library, there are two main entities: books and users. First, create a conceptual model based on the workflows, such as borrowing books or adding a penalty if books are not returned on time. Next, create a logical model with the attributes and relationships between books and users: book details (name, author), user details (name, contact), and borrowing details (date borrowed, number of books). Lastly, create a physical model and decide how to store the data: either split the data into more collections (tables) and link them, or put all the data in a few collections (tables) and embed information that needs to be accessed together. Read the complete example in the section What is an example of a data model?

What does data modeling mean?

Data modeling is the creation of data models, which are visual representations of data that show its entire structure in a single view. A data model represents data in the form of various entities and the relationships between them.

What is data modeling and why is it used?

Data modeling is the creation of a visual representation of the relationships between various data attributes, which forms the basic design or structure of a database. Data modeling is used to maximize the efficiency of a database and to make application development easier and faster. It also speeds up data analysis, because the data is organized and easy to access.