When planning a new project or application, the discussion of database requirements will often come up. What type of database should be used? What is the difference between relational and non-relational databases?
This article aims to address those questions by explaining what they are and how they differ, and to help you make an informed decision.
But what is a database? A database is a collection of information, stored on a computer, that can be accessed in a variety of ways. The two main types of databases are relational and non-relational databases.
A relational database, or relational database management system (RDMS), stores information in tables. Often, these tables have shared information between them, causing a relationship to form between tables. This is where a relational database gets its name from.
A table uses columns to define the information being stored and rows for the actual data. Each table will have a column that is defined as having to have unique values. This will be the primary key. This column can then be used in other tables, if relationships are to be defined between them. When one table’s primary key is used in another table, this column in the second table is known as the foreign key.
The most common way of interacting with relational database systems is using SQL, which stands for Structured Query Language. SQL allows for specific queries to be built that help build, search, and filter data across one or multiple tables.
Imagine you run an online business. You would have a variety of information that you store like customer information, order information, and products. In a relational database, this would be stored in different tables, to group the information specific to those groups.
The “Customer” table could be structured like this:
|1426||Joanne Smith||42 Data Avenue|
Some information would be common between different tables. If the customer has placed an order before, this customerID would also be found in the “Orders” table to associate an order with the customer who placed it. This could then be used to fetch the address later when creating a postage label, for example.
Atomicity, Consistency, Isolation, Durability (ACID) is a standard that guarantees the reliability of database transactions. The general principle is if one change fails, the whole transaction will fail, and the database will remain in the state it was in before the transaction was attempted.
This is important because some transactions will have real consequences if not completed fully — for example, banking. For more information, see our documentation explaining ACID.
Using primary and foreign keys allows you to ensure there is no duplicate information. This helps enforce data accuracy because there will never be repeated information.
The process of normalization involves ensuring the data is organized in such a way that data anomalies are reduced or eliminated. This, in turn, reduces storage costs.
RDMS, or SQL databases, have been around for so long that a wide variety of tools and resources have been developed to help get started with and interact with relational databases. The English-like syntax of SQL also makes it possible for non-developers to generate reports and queries from the data.
RDMS are historically intended to be run on a single machine. This means that if the requirements of the machine are insufficient, due to data size or an increase in the frequency of access, you will have to improve the hardware in the machine, also known as vertical scaling.
This can be incredibly expensive and has a ceiling, as eventually the costs outweigh the benefits. Plus, there will potentially come a stage where you simply cannot get hardware capable of hosting the database. The only solution would be to buy a machine that supports better hardware but none of that is cheap.
In relational databases, the schema is rigid. You define the columns and data types for those columns, including any restraints such as format or length. Common examples of constraints would include phone number length or minimum/maximum length for a name column.
Although this means you can interpret the data more easily and identify the relationships between tables, it also means that making changes to the structure of the data is very complex. You have to decide at the start what the data will look like, which isn’t always possible. If you want to make changes later, you have to change all the data, which involves the database being offline temporarily.
The performance of the database is tightly linked to the complexity of the tables — the number of them, as well as the amount of data in each table. As this increases, the time taken to perform queries increases too.
A non-relational database, sometimes called NoSQL (Not Only SQL), is any kind of database that doesn’t use the tables, fields, and columns structured data concept from relational databases. Non-relational databases have also been designed with the cloud in mind, making them great at horizontal scaling. There are a few different groups of database types that store the data in different ways:
Document databases store data in documents, which are usually JSON-like structures that support a variety of data types. These types include strings; numbers like int, float, and long; dates; objects; arrays; and even nested documents. The data is stored in pairs, similar to key/value pairs.
Due to documents being JSON-like, they are much easier to read and understand as a user. The documents also map nicely to objects in code in object-oriented programming languages, making it much easier to work with.
There is also no schema, meaning you can have flexibility should documents of different shapes be inserted. However, some document database systems allow schema validation to be applied, should you want the other advantages of document databases but with a defined shape to the data.
Documents are also considered individual units, which means they can be distributed across multiple servers. Plus, the databases are self-healing which means high availability.
Document databases are also highly scalable. Unlike relational databases, where traditionally, you can only scale vertically (CPU, hard drive space, etc.), non-relational databases, including document databases, can also be scaled horizontally. This means having the databases duplicated across multiple servers, while still being kept in sync.
This is the most basic type of database, where information is stored in two parts: key and value. The key is then used to retrieve the information from the database.
The simplicity of a key-value database is also an advantage. Because everything is stored as a unique key and a value that is either the data or a location for the data, reading and writing will always be fast.
However, this simplicity also restricts the type of use cases it can be used for. More complex data requirements can’t be supported.
Graph databases are the most specialized of the non-relational database types. They use a structure of elements called nodes that store data, and edges between them contain attributes about the relationship.
Relationships are defined in the edges, which makes searches related to these relationships naturally fast. Plus, they are flexible because new nodes and edges can be added easily. They also don’t have to have a defined schema like a traditional relational database.
However, they are not very good for querying the whole database, where relationships aren’t as well — or at all — defined. They also don’t have a standard language for querying, which means moving between different graph database types comes with a learning requirement.
Wide-column databases, similar to relational databases, store data in tables, columns, and rows. However, the names and formatting of the columns don’t have to match in each row. The columns can even be stored across multiple servers. They are considered two-dimensional key-value stores because they use multi-dimensional mapping to reference data by row and column.
Like two-column key-value databases, wide-column databases have the benefit of being flexible, so queries are fast. They are good at handling “big data” and unstructured data because of this flexibility.
However, compared to relational databases, wide-column databases are much slower when handling transactions. Columns group together similar attributes rather than using rows and store these in separate files, which means transactions have to be carried out across multiple files.
If you are creating a project where the data is predictable, in terms of structure, size, and frequency of access, relational databases are still the best choice.
Normalization can help reduce the size of the data on disk by limiting duplicate data and anomalies, decreasing the risk of requiring vertical scaling in future.
Relational databases are also the best choice if relationships between entities are important. Non-relational databases can store documents within the documents, which helps keep data that will be accessed together in the same place. But if this isn’t right for your needs, a relational database is still the answer. For example, if you have a large dataset with complex structure and relationships, embedding might not create clear enough relationships.
The amount of time that RDMS have been around also means there is wide support available, from tools to integration with data from other systems.
As discussed, there are many types of non-relational databases, each having their own advantages and disadvantages.
However, non-relational databases still maintain some consistent advantages. If the data you are storing needs to be flexible in terms of shape or size, or if it needs to be open to change in future, then a non-relational database is the answer.
Modern NoSQL databases have also been designed for the cloud, making them naturally good for horizontal scaling where lots of smaller servers can be spun up to handle increased load.
See the “Use Cases” section below to learn about more specific examples of when to use a non-relational database.
In this article, you have learned about what both relational and non-relational databases are and how they differ from each other. You’ve also learned the advantages and disadvantages of both types of databases and when you might choose to use which type for your project.
Now, the choice is yours: What will you build next with data?
No. Relational, or SQL databases, store data in tables with common columns between them (known as primary and foreign keys), forming relationships between tables. The data is always structured with a defined schema that cannot easily be changed.
But there are other database types, under the non-relational/NoSQL database title, that hold unstructured, semi-structured, or structured data. These allow for flexibility and high availability.
Due to the flexibility and scalability (both horizontally and vertically) of non-relational databases, they are great for a wide range of use cases.
For example, for storing settings in an app, a key-value database would be a great fit as each setting will have only one value. For an e-commerce site, a document database would be ideal as you can store a document for each client which will have their details and order history stored together as they are private and unique to each client.
For something like fraud detection, where the relationship between data is vital but the types of data will often be unpredictable, a graph database is a great solution.
For more information on when a non-relational database might not be the best fit, there is a white paper discussing this.