Why is having a good data modeling strategy essential from the start of your project? Any modern application needs data to run, but how you model your data will have a drastic impact on the performance of your application, as well as the speed of development. In this article, you will learn what data modeling is, why you need it, and some efficient data modeling techniques to be used with MongoDB.
Data modeling is the process that creates a visual representation of the data components in a system. This visual representation not only helps identify all data components, but also helps determine the relationships among data elements while finding the best way to demonstrate those relationships.
Data models consist of the following components:
An entity is an independent object that is also a logical component in the system. Entities can be tangible or intangible: tangible entities (such as books) exist in the real world, while intangible entities (such as book loans) don’t have a physical form.
Entity types describe the categories used to group entities together. For example, the tangible book entity “Alice in Wonderland” belongs to the entity type “book.”
Attributes describe the characteristics of an entity. For example, the entity “book” has the attributes International Standard Book Number or ISBN (String) and title (String).
Relationships define the connections between the entities. For example, one user can borrow many books at a time. The relationship between the entities "users" and "books" is one to many.
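As a rough illustration, the four components above could be sketched in Python as follows. The class names, the field names (such as borrowed_isbns), and the sample values are assumptions made for this example and are not part of any MongoDB API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    """Entity type "book"; each instance is one entity."""
    isbn: str   # attribute: International Standard Book Number (String)
    title: str  # attribute: title (String)


@dataclass
class User:
    """Entity type "user" (field names are illustrative assumptions)."""
    name: str
    # Relationship: one user can borrow many books (one to many),
    # recorded here as a list of ISBNs.
    borrowed_isbns: List[str] = field(default_factory=list)


# A tangible entity that belongs to the entity type "book".
# The ISBN value is a placeholder, not a real ISBN.
alice = Book(isbn="isbn-placeholder-1", title="Alice in Wonderland")

# The user borrows the book, which creates the relationship.
reader = User(name="Sample Reader")
reader.borrowed_isbns.append(alice.isbn)
```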
MongoDB supports multiple ways to model relationships between entities:
One to one (1-1): In this relationship, one value is associated with only one document. For example, a book can have only one ISBN.
One to many (1-N): Here, one value can be associated with more than one document or value. For example, a user can borrow more than one book at a time.
Many to many (N-N): In this type of model, multiple documents can be associated with each other. For example, a book can have many authors, and one author can write many different books. The relationship between author and book is many to many.
Data can have 1:1, 1:many or many:many relationships
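The three relationship types can be sketched with plain Python dictionaries standing in for documents. This is only one possible shape for each case. The field names (borrowed_books, author_ids, book_ids), the helper function, and all sample values are hypothetical.

```python
# One to one: a book has exactly one ISBN, so it is stored as a plain field.
book = {"_id": 1, "title": "Alice in Wonderland", "isbn": "isbn-placeholder-1"}

# One to many: one user document holds an array of borrowed books.
user = {
    "_id": 100,
    "name": "Sample Reader",
    "borrowed_books": [
        {"book_id": 1, "title": "Alice in Wonderland"},
        {"book_id": 2, "title": "Another Title"},
    ],
}

# Many to many: books reference their authors, and authors reference their books.
authors = [
    {"_id": "a1", "name": "Author One", "book_ids": [10, 11]},
    {"_id": "a2", "name": "Author Two", "book_ids": [10]},
]
books = [
    {"_id": 10, "title": "Co-written Book", "author_ids": ["a1", "a2"]},
    {"_id": 11, "title": "Solo Book", "author_ids": ["a1"]},
]


def books_by_author(author_id):
    """Return the titles of all books that reference the given author."""
    return [b["title"] for b in books if author_id in b["author_ids"]]
```

For example, books_by_author("a1") walks the many-to-many references and returns both titles, while books_by_author("a2") returns only the co-written one.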
To learn more, watch this data modeling introduction video.
While data modeling can initially seem like just an additional step in the development planning cycle, it is actually quite important. Creating data models not only makes development cycles much faster, it also helps anticipate future business requirements, which can save time and money.
Specifically, using data models:
Ensures better database planning, design, and implementation, leading to improved application performance.
Promotes faster application development through easier object mapping.
Supports better discovery, standardization, and documentation of multiple data sources.
Allows organizations to think of long-term solutions and model data considering not only current projects, but also future requirements of the application.
While many think of a data modeler as a type of data scientist, tucked away and isolated from business functions, the truth is that the impact of data modeling is felt throughout the business. The types of data modeling used, the application of conceptual, logical, and physical data models, and how data architects, information system analysts, and business intelligence analysts apply them to fulfill business needs may differ by industry, but their ability to support business processes and critical business intelligence functions remains consistent.
One example of how data modelers support businesses is through information systems integration. Many organizations find themselves with multiple databases lacking the necessary documentation to determine if business data requirements can be met and whether there are data redundancies to be eliminated across systems for data management purposes. Other organizations may be implementing new software systems and trying to inventory available relational data in their data warehouse to populate these systems, but lack an entity relationship diagram.
Through the use of conceptual data modeling, relational data elements can be understood and redundancies discovered. And, through the use of logical and physical data modeling, organizations can ensure data is structured optimally to align with existing database designs, work with newly installed software systems, or support data management and data architecture efforts relating to an existing data warehouse.
It goes without saying that data modelers provide business analysts with better organized and more accessible data. However, in addition to data management support, modeling also helps uncover business opportunities that were hidden due to a previously non-existent entity relationship or a better way to support business stakeholders by augmenting data-driven business processes or management systems.
Data models are usually categorized as one of the following three types:
Conceptual data model: The conceptual data model explains what data the system should contain as well as the relationships among data elements. This model is usually built with the help of the business stakeholders, represents the application’s business logic, and is often used as the basis for one or more of the following models.
Logical Data Model: The logical data model describes how data will be structured. In this model, the relationship between entities is established at a high level and a list of entity attributes will also be represented.
Physical Data Model: The physical data model represents how data will be stored in a specific database management system (DBMS). In this model, primary and secondary keys in a relational database are established, or the decision to embed or link data in a document database such as MongoDB is made. This is also where data types for each of your fields will be established which, in turn, will provide the database schema.
Both logical and physical data models are created using a structural diagram called an entity relationship diagram (ERD). An example of these data models can be found in the section titled "What is an example of a data model?" below.
The data modeling process is a series of steps taken to create one of the data models described above. These steps include:
The first step in the data modeling process is to gather all requirements for the application. This step helps uncover the underlying data structure that needs to be reviewed. It's important to analyze not only the data objects, but also the amount of data and the operations that will be performed on that data. Domain experts are often involved in providing requirements and are a valuable resource in making sure all the information needed to draft your conceptual data model is present. The final result of this step is the complete conceptual model.
The next step is to understand the relationships between data entities that make up the data model. Try to think about how the objects are related (e.g., one to one, one to many, or many to many) and what data attributes will be used to describe these objects. The completion of this step results in the logical data model.
This step considers the actual data stored in the database, so the data modeling techniques used will depend on the type of DBMS. For example, if using a relational database, identifying unique keys and field types, as well as normalizing data, will be necessary. However, with a document database, embedding related information may be warranted.
Regardless of the structure chosen, this step will produce a physical data model representing the initial database design.
Patterns make data modeling more efficient and effective. With design patterns, it’s easier to accommodate changes in application requirements and structure. There are a number of patterns to choose from including schema versioning, bucket, computed, and tree to name just a few. For a more complete list, including details on each pattern type, check out MongoDB's Building with Patterns: A Summary.
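To give a feel for one of these patterns, here is a minimal sketch of the idea behind schema versioning: each document records which version of the schema it follows, and the application upgrades older documents when it reads them. The two document shapes, the upgrade_book helper, and the rule that a missing schema_version field means version 1 are all assumptions made for illustration.

```python
def upgrade_book(doc):
    """Return the document in the (hypothetical) version-2 shape.

    Assumed shapes: version 1 stores a single "author" string;
    version 2 stores an "authors" list. Documents without a
    "schema_version" field are treated as version 1.
    """
    if doc.get("schema_version", 1) >= 2:
        return doc  # already in the current shape
    # Copy every field except the old "author" field.
    upgraded = {key: value for key, value in doc.items() if key != "author"}
    upgraded["authors"] = [doc["author"]] if "author" in doc else []
    upgraded["schema_version"] = 2
    return upgraded


old_doc = {"title": "Alice in Wonderland", "author": "Lewis Carroll"}
new_doc = upgrade_book(old_doc)
```

Because the function returns a new dictionary, the original version-1 document is left untouched, and a document that is already at version 2 is passed through unchanged.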
Imagine a scenario where you are building an application for the users of a library. How will you model this database?
First, you will speak with the business analysts to understand the entities that need to be part of your system. You’ll likely find out that the following entities must be included:
Understanding how these data entities (e.g., book ISBNs, user names) will interact with each other is also key. These interactions will comprise the relationships in your model.
Interaction example:
These business rules enable organization of the information needed to build the conceptual model as you now understand the data necessary to build the first software iteration.
Organize the necessary data and show the main entities
It’s now time to create a logical data model. As part of this modeling step, you might realize that some data structures are more complex and require new entities. For example, the author names may be better represented as their own entities in order to enable searching for books by author.
(Note: To simplify the example, assume that there is a single author per book.)
After considering the addition of new or modified entities, relationships between various data model objects will also begin to emerge. Consider the logical data model illustration below.
Showing the relationship between authors, books, and users.
Now you're ready to choose your DBMS and build your physical data model. At this point, thinking in terms of the type of database chosen will determine how the data is stored. For example, if using a document database, such as MongoDB, you'll model relationships using embedding or document references. As you establish the relationships between various objects, you'll also find the IDs and unique values representing your items.
Returning to our library example in the following diagram, you can see that "author" was embedded in the "books" documents. This will make it easier to create indexes, enabling the full-text search capabilities of MongoDB Atlas Search. The books borrowed are listed as an array in the user document because this information will generally be retrieved all at once on the application’s main page.
However, a different use case with the same library data might have called for a different physical data model. The ISBN and CardNum fields are unique for the documents and could be used as the ID field. They could also be used as a shard key if there is a need to scale horizontally across multiple shards.
Physical data model representation for a document database
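The document model described above might look roughly like the following, written as plain Python dictionaries so the sketch runs without a database. Using the ISBN and card number as _id values follows the text. The remaining field names, the due_date field, the main_page_lines helper, and all sample values are assumptions.

```python
# Book document: the ISBN doubles as the unique _id, and the author is embedded.
book = {
    "_id": "isbn-placeholder-1",          # ISBN used as the ID field (placeholder value)
    "title": "Alice in Wonderland",
    "author": {"name": "Lewis Carroll"},  # embedded author sub-document
}

# User document: the card number doubles as the _id, and the borrowed
# books are kept in an array so that they are read together with the user.
user = {
    "_id": "card-0001",                   # CardNum used as the ID field (placeholder value)
    "name": "Sample Reader",
    "borrowed_books": [
        {"isbn": book["_id"], "title": book["title"], "due_date": "2030-01-31"},
    ],
}


def main_page_lines(user_doc):
    """Build the main-page loan list from a single user document."""
    return [
        f'{loan["title"]} (due {loan["due_date"]})'
        for loan in user_doc["borrowed_books"]
    ]
```

Everything the main page needs comes from one user document, which is the reason given above for storing the borrowed books as an array. Dictionaries shaped like these could be passed to a driver call such as PyMongo's insert_one.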
If you decide to go with a traditional relational database, the physical data model will look very different. In this example, the authors and books tables are linked through a one-to-many relationship. The authorId field is the primary key in the authors table, and the authorId field is the foreign key in the books table. A join table is added to keep track of the borrowed books along with the due dates.
Physical model representing relational database
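For comparison, the relational layout described above can be sketched with Python's built-in sqlite3 module and an in-memory database. The authorId key and the join table for borrowed books come from the text. The exact table and column names, the users table, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE authors (
    authorId INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE books (
    isbn     TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    authorId INTEGER NOT NULL REFERENCES authors(authorId)  -- one author, many books
);
CREATE TABLE users (
    cardNum TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Join table: which user has borrowed which book, and when it is due.
CREATE TABLE borrowed_books (
    cardNum TEXT NOT NULL REFERENCES users(cardNum),
    isbn    TEXT NOT NULL REFERENCES books(isbn),
    dueDate TEXT NOT NULL,
    PRIMARY KEY (cardNum, isbn)
);
""")

# Placeholder sample rows.
conn.execute("INSERT INTO authors VALUES (1, 'Lewis Carroll')")
conn.execute("INSERT INTO books VALUES ('isbn-placeholder-1', 'Alice in Wonderland', 1)")
conn.execute("INSERT INTO users VALUES ('card-0001', 'Sample Reader')")
conn.execute("INSERT INTO borrowed_books VALUES ('card-0001', 'isbn-placeholder-1', '2030-01-31')")

# Listing a user's loans now requires joining three tables.
rows = conn.execute(
    """
    SELECT b.title, a.name, bb.dueDate
    FROM borrowed_books AS bb
    JOIN books   AS b ON b.isbn = bb.isbn
    JOIN authors AS a ON a.authorId = b.authorId
    WHERE bb.cardNum = ?
    """,
    ("card-0001",),
).fetchall()
```

The same loan list that the document model reads from a single user document is assembled here from the join table, the books table, and the authors table.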
Remember, just as selecting the right pattern is an important step in data modeling, it's also critical to avoid schema design anti-patterns. To learn more, read MongoDB's A Summary of Schema Design Anti-Patterns and How to Spot Them.
Through the progression of the data modeling steps in this example, an entire database model that describes how to store entities and address relationships among them has been created. These steps have also provided insights into keys and indexes as well.
While our library example was simple enough to be represented through a simple drawing tool, other data models become more complex. For these intricate data models, you may require a more advanced data mapping tool. Here are some options to consider.
Hackolade is a general purpose tool that helps create visual representations of data for document or relational databases. It can even be used to create MongoDB schemas and get you up and running faster.
Open ModelSphere, a free open-source tool, helps you create most data models.
Creately, a general design tool supporting collaboration, helps you work on your data models collaboratively, in real time.
All of these tools use the standard Unified Modeling Language (UML) to build ERDs and will provide professional-looking diagrams that can then be shared with your team.
MongoDB provides a flexible schema for data, meaning that you can easily change data structures as your application progresses and requirements change. This flexibility enables you to restructure your schema and optimize your queries as many times as necessary.
In addition, as you model your data, think about how the data will be displayed and used in your application. The way you use the data will likely dictate the structure of your database, and not the other way around. This may be why MongoDB is so intuitive for software developers.
Interested in learning even more? Here are three ways to level-up your data modeling skill set.
Review the Data Modeling with MongoDB presentation to catch up on the latest trends in data modeling.
Try out MongoDB Atlas, MongoDB's fully managed database-as-a-service (DBaaS), and see how it can take your projects to the next level.
Take a course on data modeling at MongoDB University and build your knowledge even further.