Why is having a good data modeling strategy essential from the start of your project? Any modern application needs data to run, but how you model your data can have a drastic impact on the performance of your application, as well as on the speed of development. In this article, you will learn what data modeling is, why you need it, and some efficient data modeling techniques you can use with MongoDB.
Data modeling is the process of creating a clear model of how data will be stored in a database. Data models also describe how the data is related. The goal of data modeling is to identify all the data components of a system, how they are connected, and what the best ways are to represent these relationships.
Data modeling is done at the application level. Data models consist of the following components:
Entity: an independent object that is also a logical component in the system. Entities can be categorized as tangible or intangible. Tangible entities, such as books, exist in the real world, while intangible entities, such as book loans, don’t have a physical form. In document databases, each document is an entity; in tabular databases, each row is an entity.
Entity types: the categories used to group entities. For example, the book entity with the title “Alice in Wonderland” belongs to the entity type “book.”
Attributes: the characteristics of an entity. For example, the entity “book” has the attributes ISBN (String) and title (String).
Relationships: the connections between entities. For example, one user can borrow many books at a time, so the relationship between the entities “users” and “books” is one to many.
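As a rough illustration, the four components above can be mapped onto Python classes: a class plays the role of an entity type, its fields are attributes, an instance is an entity, and a list field expresses a one-to-many relationship. This is a minimal sketch, not how MongoDB stores data; the user name and the ISBN values are hypothetical placeholders.

```python
from dataclasses import dataclass, field


# Entity type "Book": every book entity shares these attributes.
@dataclass
class Book:
    isbn: str   # attribute (String); placeholder values below, not real ISBNs
    title: str  # attribute (String)


# Entity type "User": the borrowed list models the one-to-many
# relationship "one user can borrow many books at a time".
@dataclass
class User:
    name: str
    borrowed: list[Book] = field(default_factory=list)


# Individual entities (instances) of each entity type.
alice = Book(isbn="isbn-0001", title="Alice in Wonderland")
reader = User(name="Sam")  # hypothetical user
reader.borrowed.append(alice)
reader.borrowed.append(Book(isbn="isbn-0002", title="Through the Looking-Glass"))
```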
Although MongoDB has a flexible schema, you still need data modeling, also called schema design. A good data model gives you a strong foundation that can evolve and adapt as your requirements change. By creating a solid data model right from the start, you can help your application perform better and make it more future-proof.
MongoDB supports multiple ways to model relationships between entities:
One to one (1-1): In this type, one value is associated with only one document. For example, each book can have only one ISBN.
One to many (1-N): Here, one value can be associated with more than one document or value. For example, a user can borrow more than one book at a time.
Many to many (N-N): In this type of model, multiple documents can be associated with each other. For example, a book can have many authors, and one author can write many different books. The relationship between author and book is many to many.
Data can have 1:1, 1:many or many:many relationships
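The sketch below shows one plausible way each relationship type could be expressed as MongoDB-style documents. It uses plain Python dictionaries so that it runs without a database; the field names (author_ids, borrowed) and all values are hypothetical. The two helper functions stand in for queries and are not part of any MongoDB driver.

```python
# Many to many: each book references several authors, and one author
# can appear in several books, via the author_ids arrays.
authors = [
    {"_id": "a1", "name": "Author One"},
    {"_id": "a2", "name": "Author Two"},
]

# One to one: each book has exactly one ISBN, stored as a single field
# (here it doubles as the document's _id).
books = [
    {"_id": "isbn-0001", "title": "Book A", "author_ids": ["a1", "a2"]},
    {"_id": "isbn-0002", "title": "Book B", "author_ids": ["a1"]},
]

# One to many: one user document holds an array of the books they borrowed.
user = {"_id": "card-0001", "name": "Sam", "borrowed": ["isbn-0001", "isbn-0002"]}


def books_by_author(author_id):
    """Follow the many-to-many link from an author to their books."""
    return [b["title"] for b in books if author_id in b["author_ids"]]


def authors_of(book):
    """Follow the same link the other way, from a book to its authors."""
    return [a["name"] for a in authors if a["_id"] in book["author_ids"]]
```

In MongoDB itself, a filter such as {"author_ids": "a1"} matches any document whose author_ids array contains "a1", which is what books_by_author emulates here.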
There are several advantages of data modeling:
Better discovery, standardization, and documentation of multiple data sources.
Allows organizations to think of long-term solutions and model data considering not only current projects, but also future requirements of the application.
A data model provides you with the foundation for the data structure of your application. Creating the data model can help you to identify the business rules that your application will need to follow. The data model will also provide your development team with a consistent map of the data used by the application they are developing.
Thinking ahead of time about how you will access the information from the application will help you plan better. It will also help you understand the business processes that are sometimes hidden or not explicitly explained by your stakeholders.
While data modeling might seem like an additional step in the development planning cycle, it can make the development cycles much faster.
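As you start modeling your data, you will likely go through various steps of data analysis. Each step might produce a different type of data model, so data models are generally thought of as one of the following three types:
Conceptual data model: the main entities of your system and the business rules that connect them, drafted from the application's requirements.
Logical data model: the entities with their attributes and the relationships (one to one, one to many, or many to many) between them.
Physical data model: a representation of your initial database, with unique keys, field types, and a structure specific to the DBMS you choose.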
These models are created using entity-relationship diagrams (ERD). An example of these three models can be found in the section titled What is an example of a data model? below.
You can think of data modeling as a series of steps, each one providing you with one of the models described above.
The first step to a data modeling process is to gather all the requirements for your application. This step will provide you with the underlying data structure that you will need to review. Analyze not only the data objects, but also the size of the data and the operations that will be performed on that data. This step will be done with the help of domain experts. At the end of this first step, you should have the necessary information to draft your conceptual data model.
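Questions about the size of the data can often be answered with simple arithmetic early on. The sketch below estimates the raw size of a collection and checks how many entries an embedded array could hold before a single document reaches MongoDB's 16 MB maximum BSON document size. The helper names and every input figure are hypothetical, not measurements.

```python
# MongoDB's maximum BSON document size is 16 MB (16 * 1024 * 1024 bytes).
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024


def estimated_collection_bytes(document_count: int, avg_document_bytes: int) -> int:
    """Rough raw-data estimate; ignores indexes, overhead, and compression."""
    return document_count * avg_document_bytes


def max_embedded_entries(base_document_bytes: int, entry_bytes: int) -> int:
    """How many fixed-size array entries fit in one document under the BSON limit."""
    return (MAX_BSON_DOCUMENT_BYTES - base_document_bytes) // entry_bytes


# Hypothetical figures, e.g. gathered with the help of domain experts.
print(estimated_collection_bytes(100_000, 1_000))  # 100000000 bytes, about 100 MB
print(max_embedded_entries(500, 100))              # 167767
```

If an array could realistically grow past such a bound, that is a signal to store the related items in their own collection rather than embedding them.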
The next step is to understand the relationship between the various entities that make up your whole data model. Try to think about how the objects would be related (one to one, one to many, or many to many) and the attributes you would use to describe these objects. This step will provide you with your logical data model.
Finally, you can start thinking about the actual data that you will store in the database. At this point, you will try to identify unique keys and field types. The way you model your data will depend highly on the type of DBMS you will be using. If you are using a relational database, you might start thinking about normalizing your data; with a document database, you would instead think about embedding related information.
At the end of this step, you can produce a physical data model representing your initial database.
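To make the normalize-versus-embed distinction concrete, the sketch below starts from a normalized, relational-style shape (authors and books kept apart and linked by an ID) and converts it into document-style books with the author embedded. The embed_authors function and the field names are hypothetical, chosen only for illustration.

```python
# Normalized shape: each author is stored once and referenced by author_id.
authors = [{"author_id": 1, "name": "Lewis Carroll"}]
books = [
    {"isbn": "isbn-0001", "title": "Alice in Wonderland", "author_id": 1},
    {"isbn": "isbn-0002", "title": "Through the Looking-Glass", "author_id": 1},
]


def embed_authors(books, authors):
    """Return document-style books with the author copied into each one."""
    by_id = {a["author_id"]: a for a in authors}
    return [
        {
            "isbn": b["isbn"],
            "title": b["title"],
            # Embedded subdocument: read together with the book, no join needed.
            "author": {"name": by_id[b["author_id"]]["name"]},
        }
        for b in books
    ]


embedded = embed_authors(books, authors)
```

The trade-off is visible in the result: the embedded form answers "book with its author" in a single read, but the author's name is now duplicated in every book, so renaming an author means updating each of those documents.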
Patterns make data modeling more efficient and effective. With design patterns, it’s easier to accommodate changes in application requirements and structure. Some common patterns are:
Watch the data modeling introduction video to learn more about data modeling patterns.
Let’s imagine a scenario where you would build an application for the users of a library. How would you model this database?
First, you will speak with the business analysts to understand the entities that need to be part of your system. You’ll find out that these must be included:
Books
Users, who borrow the books
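You will also need to understand how the various entities will interact with each other. These interactions will give you the relationships in your model. In the case of the library example, interactions might look like:
A user can borrow one or more books at a time.
Borrowed books must be returned by a due date, and a penalty is added if they are not returned on time.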
These business rules will let you organize the information to build your conceptual model. By now, you understand the data necessary to build the first iteration of your software.
Organize the necessary data and show the main entities
It’s now time to create a logical data model. As part of this modeling step, you might realize that some of your data structures are more complex and require new entities.
For example, the authors would be better represented as their own entities in order to enable searching for books by authors. We’re assuming here that there is a single author per book for the sake of simplicity.
The relationships between the various objects that form the data model also start to emerge from this model.
Showing the relationship between authors, books, and users.
Choose your DBMS and build your physical data model. At this point, you will start thinking in terms of the database you picked. The type of database you choose will determine how you will store the data.
If you use a document database, such as MongoDB, you will model relationships using embedding or document references. As you establish the relationships between your various objects, you will also identify the IDs and unique values that represent your items.
Let’s return to our library example. In the following diagram, you can see that the author is embedded in the books. This makes it easier to create indexes that enable the full-text search capabilities of MongoDB Atlas Search. The borrowed books are listed as an array in the user document because this information will generally be retrieved all at once on the application’s main page. A different use case with this same library data might call for a different physical data model. The ISBN and CardNum fields are unique within their respective collections and could be used as the _id field. You could also use them as a shard key if you need to scale horizontally across multiple shards.
Physical data model representation for a document database
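The document-based physical model described above can be sketched in plain Python. The InMemoryCollection class is a hypothetical stand-in for a MongoDB collection, not the PyMongo API; it only mimics the rule that _id values must be unique. The field names and the choice to copy each borrowed book's title into the user document are assumptions for illustration.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a collection; enforces unique _id values."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # MongoDB would reject a repeated _id with a duplicate key error.
        if doc["_id"] in self._docs:
            raise ValueError(f"duplicate _id: {doc['_id']}")
        self._docs[doc["_id"]] = doc

    def get(self, _id):
        return self._docs.get(_id)


books = InMemoryCollection()
users = InMemoryCollection()

# ISBN used as _id; the author is embedded so it can be indexed for search.
books.insert({"_id": "isbn-0001", "title": "Alice in Wonderland",
              "author": {"name": "Lewis Carroll"}})

# CardNum used as _id; borrowed books are an array, so the main page
# can be rendered from a single document read.
users.insert({"_id": "card-0001", "name": "Sam",
              "borrowed": [{"isbn": "isbn-0001", "title": "Alice in Wonderland"}]})
```

Against a real deployment, the _id field is always backed by a unique index. If you kept the ISBN in a separate field instead, PyMongo's create_index("isbn", unique=True) would enforce the same rule.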
If you decide to go with a traditional relational database, the physical data model will look very different. In this example, the authors and books tables are linked through a one-to-many relationship. The authorId field is the primary key in the authors table and a foreign key in the books table. A join table is added to keep track of the borrowed books along with their due dates.
Physical model representing relational database
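For comparison, the relational model can be sketched with Python's built-in sqlite3 module. Table and column names follow the description above where possible (authors, books, authorId); the users and loans tables, the composite key on loans, and all sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE authors (
    authorId INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE books (
    isbn     TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    authorId INTEGER NOT NULL REFERENCES authors(authorId)  -- one author, many books
);
CREATE TABLE users (
    cardNum  TEXT PRIMARY KEY,
    name     TEXT NOT NULL
);
-- Join table: one row per borrowed book with its due date
-- (assumes a user holds at most one copy of a given book at a time).
CREATE TABLE loans (
    cardNum  TEXT NOT NULL REFERENCES users(cardNum),
    isbn     TEXT NOT NULL REFERENCES books(isbn),
    dueDate  TEXT NOT NULL,
    PRIMARY KEY (cardNum, isbn)
);
""")

conn.execute("INSERT INTO authors VALUES (1, 'Lewis Carroll')")
conn.execute("INSERT INTO books VALUES ('isbn-0001', 'Alice in Wonderland', 1)")
conn.execute("INSERT INTO users VALUES ('card-0001', 'Sam')")
conn.execute("INSERT INTO loans VALUES ('card-0001', 'isbn-0001', '2024-01-31')")

# What the document model keeps in one user document takes two joins here.
rows = conn.execute("""
    SELECT b.title, a.name, l.dueDate
    FROM loans l
    JOIN books b   ON b.isbn = l.isbn
    JOIN authors a ON a.authorId = b.authorId
    WHERE l.cardNum = ?
""", ("card-0001",)).fetchall()
```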
Now that you’ve been through these stages, you have an entire database model that describes how to store your entities and create relationships among them. Those steps also gave you some insights into what your keys and indexes should be.
The example above was simple enough to be drawn with a basic diagramming tool.
As your data models get more complex, you might need to rely on more advanced tooling to represent the data you are trying to map.
All of these tools support standard notations, such as the Unified Modeling Language (UML), to build your diagrams and will provide you with professional-looking diagrams that can then be shared with your team.
MongoDB provides you with a flexible schema for your data. This flexible schema means that you can easily change your data structure as your application progresses and the requirements change. This doesn’t mean that you shouldn’t think about data modeling, though. A good data model will help you understand the business requirements for your application and structure your schema to optimize your queries.
As you model your data, think about how the data will be displayed and used in your application. The way you use the data should dictate the structure of your database, and not the other way around. Designing around how the application uses its data is one reason software developers find MongoDB easy to work with. Try it out for yourself with MongoDB Atlas, MongoDB's fully managed database-as-a-service (DBaaS).
Now that you know how to model your data, you can create a MongoDB Atlas cluster and build your database. You can find out more about data modeling from the Data Modeling with MongoDB presentation from MongoDB.live 2020. There is also a course available on that specific topic at MongoDB University.
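A good data analysis phase will help you plan out how you should structure your data. It will also help you identify what your indexes should be, leading to an overall better application. Once you are familiar with data modeling for MongoDB, you will also need to understand how to create good data schemas and apply patterns to make your database even more efficient. With those skills, you will be able to create applications that are faster and more scalable.
There are three major steps of data modeling:
Conceptual data model: gather the application's requirements and identify the main entities and business rules.
Logical data model: define the attributes of each entity and the relationships between entities.
Physical data model: choose your DBMS and decide how the data will be stored, including unique keys and field types.
Read the section on the data modeling process to learn more.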
An easy-to-understand example of data modeling is a library application. In a library, there are two main entities: the books and the book users. Create a conceptual model based on the workflows, for example, borrowing books or adding a penalty if books are not returned on time. Next, create a logical model with the attributes and relationships between books and users: book details (name, author), user details (name, contact), and borrowing details (date borrowed, number of books). Lastly, create a physical model and decide how to store the data: whether to split it into more collections (tables) and link them, or to put it in a few collections (tables) and embed the information that needs to be accessed together. Read the complete example in the section What is an example of a data model?
Data modeling is the creation of data models, that is, visual representations of data attributes that show the entire structure of the data in a single view. A data model represents data as entities and the relationships between them.