When people use the term “NoSQL database”, they typically use it to refer to any non-relational database. Some say the term “NoSQL” stands for “non SQL” while others say it stands for “not only SQL.” Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.
A common misconception is that NoSQL databases or non-relational databases don’t store relationship data well. NoSQL databases can store relationship data—they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier than in SQL databases, because related data doesn’t have to be split between tables.
NoSQL data models allow related data to be nested within a single data structure.
NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity.
As storage costs rapidly decreased, the amount of data applications needed to store and query increased. This data came in all shapes and sizes—structured, semistructured, and polymorphic—and defining the schema in advance became nearly impossible. NoSQL databases allow developers to store huge amounts of unstructured data, giving them a lot of flexibility.
Additionally, the Agile Manifesto was rising in popularity, and software engineers were rethinking the way they developed software. They were recognizing the need to rapidly adapt to changing requirements. They needed the ability to iterate quickly and make changes throughout their software stack—all the way down to the database model. NoSQL databases gave them this flexibility.
Cloud computing also rose in popularity, and developers began using public clouds to host their applications and data. They wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale-out instead of scale-up, and to intelligent geo-place their data. Some NoSQL databases like MongoDB provided these capabilities.
Now that we have an understanding of NoSQL databases, let’s contrast them with what have traditionally been the most popular databases: relational databases accessed by SQL (Structured Query Language). You can use SQL when interacting with relational databases where data is stored in tables that have fixed columns and rows.
SQL databases rose in popularity in the early 1970s. At the time, storage was extremely expensive, so software engineers normalized their databases in order to reduce data duplication.
Software engineers in the 1970s also commonly followed the waterfall software development model. Projects were planned in detail before development began. Software engineers painstakingly created complex entity-relationship (E-R) diagrams to ensure they had carefully thought through all the data they would need to store. Due to this upfront planning model, software engineers struggled to adapt if requirements changed during the development cycle. As a result, projects frequently went over budget, exceeded deadlines and failed to deliver against user needs.
Key-value databases are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its value, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don’t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Redis and DynanoDB are popular key-value databases.
Wide-column stores store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.
Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.
One way of understanding the appeal of NoSQL databases from a design perspective is to look at how the data models of a SQL and a NoSQL database might look in an oversimplified example using address data.
The SQL Case. For an SQL database, setting up a database for addresses begins with the logical construction of the format and the expectation that the records to be stored are going to remain relatively unchanged. After analyzing the expected query patterns, an SQL database might optimize storage in two tables, one for basic information and one pertaining to being a customer, with last name being the key to both tables. Each row in each table is a single customer, and each column has the following fixed attributes:
The NoSQL Case. In the section Types of NoSQL Databases above, there were four types described, and each has its own data model.
Each type of NoSQL database would be designed with a specific customer situation in mind, and there would be technical reasons for how each kind of database would be organized. The simplest type to describe is the document database, in which it would be natural to combine both the basic information and the customer information in one JSON document. In this case, each of the SQL column attributes would be fields and the details of a customer’s record would be the data values associated with each field.
Last_name: "Jones", First_name: "Mary", Middle_initial: "S",
If you’d like to try a NoSQL database, MongoDB Atlas is a great place to start. Atlas is a database service that is fully managed by MongoDB and available on all of the leading cloud providers. Atlas has a forever-free tier that you can use to kick the tires and discover the basics. When you’re ready to move beyond the free tier, you can use code NOSQLEXPLAINED for $200 of Atlas credits.
Not sure what to do now that you have an Atlas account? Head over to MongoDB University where you can get free online training from MongoDB engineers. MongoDB University has surpassed 1.4 million course registrations. The Quick Start Tutorials are another great place to start as they will help you get up and running quickly with your favorite programming language.