The term “NoSQL” is applied to a wide variety of database technologies originally created in the Internet era, when SQL databases couldn’t effectively or affordably support the scale of new applications and the new development methods used to build them.
Today, NoSQL databases are often chosen for several advantages they have over SQL databases, including the ability to scale out across many servers, a more natural and easier way to model data, and better support for Agile development processes.
In this article, we review some of the general considerations that are recommended as part of your application development planning, and then focus on the principal advantages of the four main types of NoSQL databases.
Each of these types of NoSQL databases was designed with certain kinds of data or processing in mind:
A flexible document database, which enables easy modeling of data objects inside of documents
A key-value store, whose use cases include caching
A columnar store, which, although tabular, processes data by column instead of by row
A graph database, which excels at capturing the connections between data elements
All NoSQL databases offer advantages over SQL databases in scaling various types of workloads.
The key to making the right choice is to understand the nature of each database and match it with your needs.
While we explain the strengths of each database type, we will also highlight the specific advantages of MongoDB.
Let’s briefly consider the four main types of NoSQL databases and their characteristics.
Document databases have a very flexible data model, the document, whose structure can vary from record to record. They store data in formats such as JSON, BSON, or XML. These databases serve a wide variety of use cases and support complex queries using multikey indexes (for array fields), geospatial queries, and full-text search. Document databases are particularly effective as distributed databases deployed on a scale-out architecture.
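To make “varies from record to record” and array queries concrete, here is a minimal plain-Python sketch. The `products` collection and the `find`/`matches`/`get_path` helpers are hypothetical; they only imitate the matching rule that MongoDB applies to an equality query on an array field (a scalar matches if any element equals it). Multikey indexes exist so such queries can be served from an index; this sketch shows the matching semantics only, not indexing.

```python
from typing import Any

# Hypothetical "products" collection: documents in the same collection
# do not have to share the same fields.
products: list[dict[str, Any]] = [
    {"_id": 1, "name": "Laptop", "tags": ["electronics", "computers"], "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "T-shirt", "tags": ["clothing"], "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "E-book reader", "tags": ["electronics", "books"]},
]


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "specs.ram_gb"; return None if any step is missing."""
    value: Any = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: dict[str, Any], criteria: dict[str, Any]) -> bool:
    """Equality match on every field in `criteria`.

    As with equality queries on array fields in MongoDB, a scalar value
    matches an array field if any element of the array equals it.
    """
    for path, expected in criteria.items():
        actual = get_path(doc, path)
        if isinstance(actual, list) and not isinstance(expected, list):
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True


def find(collection: list[dict[str, Any]], criteria: dict[str, Any]) -> list[dict[str, Any]]:
    """Return every document in the collection that satisfies the criteria."""
    return [doc for doc in collection if matches(doc, criteria)]


print([d["name"] for d in find(products, {"tags": "electronics"})])  # ['Laptop', 'E-book reader']
print([d["name"] for d in find(products, {"specs.ram_gb": 16})])     # ['Laptop']
```

Documents that lack a queried field simply fail to match, which is how a flexible schema coexists with ordinary queries.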
The document model provides a flexible way to work with data of all types. At the same time, a document database like MongoDB offers scale-out functionality that supports a distributed systems design, enabling you to intelligently put data where you want it. Finally, MongoDB offers a unified experience that gives you the freedom to run anywhere, on any public cloud or in an installed, self-managed mode.
Key-value stores use a simple data model, in which each value is stored and retrieved by a unique key, and gain read/write speed in return. For applications like caching, they are highly scalable.
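The caching use case can be sketched with a tiny in-process key-value store whose entries expire. `TTLCache` and `get_user_profile` are hypothetical names for illustration; a real key-value store adds networking, persistence, eviction policies, and replication. The read path follows the common cache-aside pattern: check the cache by key, and only on a miss fall back to the slower database.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Minimal key-value cache whose entries expire `ttl` seconds after being set (hypothetical)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data: dict[str, tuple[Any, float]] = {}  # key -> (value, expiry time)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries on read
            return None
        return value


def get_user_profile(cache: TTLCache, user_id: str, load_from_db: Callable[[str], Any]) -> Any:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_from_db(user_id)
    cache.set(key, profile)
    return profile
```

Because every lookup is a single operation on one key, this access pattern is easy to spread across many servers, which is where key-value stores get their scalability.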
Columnar stores capture high volumes of data and feature fast reads, but slower writes. While they are considered NoSQL databases, they are similar to relational databases in their row-and-column format.
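The difference between row and column layouts can be shown with the same few records stored both ways. The sales data and `append_row` helper are hypothetical, and Python lists only model the idea: a real columnar engine stores each column contiguously on disk or in memory.

```python
# Hypothetical sales records, first in a row-oriented layout.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
]

# The same data in a column-oriented layout: one list per column.
columns: dict[str, list] = {name: [row[name] for row in rows] for name in rows[0]}

# In a row store, records are stored together, so scanning one field
# conceptually means reading every full record.
total_from_rows = sum(row["amount"] for row in rows)

# In a column store, the aggregate only needs the "amount" column.
total_from_columns = sum(columns["amount"])


def append_row(cols: dict[str, list], row: dict) -> None:
    """Inserting one record touches every column, one reason column stores favor reads over writes."""
    for name, values in cols.items():
        values.append(row[name])


print(total_from_rows, total_from_columns)  # 225.5 225.5
```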
Graph databases have a fundamentally different structure in which data elements and their relationships are stored as a graph (as in graph theory, not a line graph). Complex queries can be used to determine multiple kinds of relationships between data points and among clusters of data points.
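A basic relationship question, “how is this person connected to that one?”, is a graph traversal. The sketch below stores hypothetical “follows” relationships as an adjacency list and uses breadth-first search to find the path with the fewest hops. Graph databases expose this through their own query languages and storage engines; this is only the underlying idea in plain Python.

```python
from collections import deque
from typing import Optional

# Hypothetical "follows" relationships as an adjacency list:
# each node maps to the nodes it points to directly.
graph: dict[str, list[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: return the path with the fewest hops, or None if unreachable."""
    previous: dict[str, Optional[str]] = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            current: Optional[str] = node
            while current is not None:  # walk back to the start
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_path(graph, "alice", "frank"))  # ['alice', 'bob', 'dave', 'frank']
```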
In considering NoSQL databases, there’s likely something driving your departure from relational databases and SQL. Given the incumbent nature of relational databases and their frequent inclusion in academic training, these databases still tend to exert a strong gravitational pull.
So what is causing you to consider moving beyond traditional databases to NoSQL?
Is it the rigid schema of relational databases, which makes them difficult to change as application requirements evolve?
Is it a desire for a more iterative development process?
Is it concern about how quickly you can load data into the database?
Is it the need to scale out very large data sets across multiple data stores?
Is it the need to perform high-speed queries to support applications?
Is it the need to perform analytics faster by supporting a certain type of complex query?
Is it the need to use one repository for both transactions and analytics?
Is it the need to scale an application affordably?
There are many other questions that you may have.
It’s likely that many different factors are driving the consideration of NoSQL databases.
Analyzing these factors will likely paint a picture of exactly what the database must do to meet your needs.
Here are some of the points that may be pivotal in your analysis.
A crucial question to consider when planning your application is whether you need the ACID properties originally associated with relational databases. ACID stands for Atomicity, Consistency, Isolation, and Durability.
This set of properties is required for high-integrity, high-durability transaction processing, such as banking transactions. While frequently associated with relational databases accessed via SQL, ACID properties are also available, to varying degrees, in NoSQL databases, including MongoDB (the most popular document database), Redis (a key-value store), and Neo4j (a graph database).
NoSQL databases broke new ground by letting users control aspects of the database such as consistency. Permitting temporarily inconsistent results made scaling easier. Eventual consistency, in which replicas may briefly disagree but converge on the same value once updates stop, was one such idea that had not been part of SQL databases.
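Eventual consistency can be illustrated with two toy replicas. A write lands on one replica, a read from the other briefly returns stale data, and a later synchronization step makes them agree. The `Replica` class and `anti_entropy` function are hypothetical, and “last writer wins by timestamp” is just one simple conflict-resolution rule; real systems use a variety of replication and conflict-handling schemes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """A toy replica that stores key -> (timestamp, value)."""
    name: str
    data: dict[str, tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, timestamp: int) -> None:
        self.data[key] = (timestamp, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry is not None else None


def anti_entropy(a: Replica, b: Replica) -> None:
    """Sync two replicas so each keeps the newest version of every key (last-writer-wins)."""
    for key in set(a.data) | set(b.data):
        versions = [v for v in (a.data.get(key), b.data.get(key)) if v is not None]
        newest = max(versions)  # tuples compare by timestamp first
        a.data[key] = newest
        b.data[key] = newest


replica_a, replica_b = Replica("a"), Replica("b")
replica_a.write("cart:7", "2 items", timestamp=1)
print(replica_b.read("cart:7"))  # None: the update has not propagated yet
anti_entropy(replica_a, replica_b)
print(replica_b.read("cart:7"))  # 2 items
```

The window between the write and the sync is exactly the “temporarily inconsistent” period that such systems accept in exchange for easier scaling.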
NoSQL databases, like early versions of open-source SQL databases such as MySQL, became popular without fully supporting transactions.
But now transactions are partially or fully supported in most NoSQL databases.
MongoDB has devoted a lot of resources to supporting multi-document transactions, which let a set of changes to multiple documents be applied or rolled back as a group. MongoDB transactions also provide snapshot isolation, so each transaction reads from a consistent point-in-time view of the data.
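The “applied or rolled back as a group” behavior is the atomicity in ACID. The sketch below shows only that idea: changes are made to a private copy and published only if every step succeeds. It is a conceptual illustration, not how MongoDB implements transactions, and it ignores isolation, durability, and concurrency. `run_transaction`, `transfer`, and `TransactionAborted` are hypothetical names; with a real MongoDB driver you would use the driver’s session and transaction API (PyMongo, for example, provides `ClientSession.with_transaction`).

```python
import copy
from typing import Any, Callable

Documents = dict[str, dict[str, Any]]


class TransactionAborted(Exception):
    """Raised by a change function to cancel the whole transaction (hypothetical)."""


def run_transaction(store: Documents, changes: Callable[[Documents], None]) -> None:
    """Apply `changes` to a private copy and publish it only if every change succeeds.

    Conceptual all-or-nothing sketch: not MongoDB's implementation,
    and not safe for concurrent use.
    """
    working = copy.deepcopy(store)
    changes(working)  # an exception here leaves `store` untouched
    store.clear()
    store.update(working)


def transfer(src: str, dst: str, amount: int) -> Callable[[Documents], None]:
    """Build a change that moves `amount` between two account documents."""
    def apply(docs: Documents) -> None:
        docs[src]["balance"] -= amount
        docs[dst]["balance"] += amount
        if docs[src]["balance"] < 0:
            raise TransactionAborted(f"insufficient funds in {src}")
    return apply


accounts: Documents = {"A": {"balance": 100}, "B": {"balance": 20}}
run_transaction(accounts, transfer("A", "B", 30))
try:
    run_transaction(accounts, transfer("A", "B", 500))
except TransactionAborted:
    pass  # both documents keep their previous balances
print(accounts)  # {'A': {'balance': 70}, 'B': {'balance': 50}}
```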
So, when choosing a NoSQL database, it’s essential to know how much your application depends on transactions. For certain types of use cases, like consolidating data from many sources to create a single view, transactions may not be that important. For financial applications like banking or cryptocurrency exchanges, full transactional support is vital.
NoSQL databases all have a different approach to modeling data than SQL.
If your application needs to retrieve the same chunk of data repeatedly at high speed, key-value stores do this very well.
If you have extremely large data sets that are tabular and you are querying by column to support analytics, wide-column databases do a great job because of the way they compress the data in each column.
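One reason per-column compression works well is that values within a single column tend to be similar, and sorted columns form long runs of repeated values. Run-length encoding is one of several techniques column-oriented engines use (which ones a given product uses varies). This sketch encodes a hypothetical column and shows that some aggregates can be computed directly on the compressed form.

```python
from itertools import groupby
from typing import Any


def rle_encode(column: list[Any]) -> list[tuple[Any, int]]:
    """Run-length encode a column: store each run of equal values once, with its length."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(encoded: list[tuple[Any, int]]) -> list[Any]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]


def rle_sum(encoded: list[tuple[float, int]]) -> float:
    """Aggregate directly on the compressed form, without decompressing."""
    return sum(value * count for value, count in encoded)


region = ["EU"] * 4 + ["US"] * 3 + ["APAC"] * 2  # a sorted, low-cardinality column
print(rle_encode(region))  # [('EU', 4), ('US', 3), ('APAC', 2)]
```

Nine stored values shrink to three pairs here; on real analytic tables with millions of rows and few distinct values per column, the savings are far larger.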
If your data can be modeled as a set of interrelated objects, document databases can store each type of object as documents in its own collection and relate individual documents to one another. Documents can also contain nested structures such as arrays and sub-documents, which capture complex data.
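Both techniques, nesting and relating, can appear in the same model. In this hypothetical sketch, each order embeds its line items as an array of sub-documents and refers to its customer by `_id` in a separate collection. The `order_with_customer` helper resolves the reference in plain Python, similar in spirit to what MongoDB’s `$lookup` aggregation stage does on the server.

```python
from typing import Any

# Hypothetical collections. Each order embeds its line items (an array of
# sub-documents) and refers to its customer by _id.
customers: list[dict[str, Any]] = [
    {"_id": "c1", "name": "Ada", "address": {"city": "London", "country": "UK"}},
    {"_id": "c2", "name": "Grace", "address": {"city": "Arlington", "country": "US"}},
]
orders: list[dict[str, Any]] = [
    {"_id": "o1", "customer_id": "c1",
     "items": [{"sku": "pen", "qty": 3, "price": 1.5}, {"sku": "pad", "qty": 1, "price": 4.0}]},
    {"_id": "o2", "customer_id": "c2",
     "items": [{"sku": "ink", "qty": 2, "price": 7.25}]},
]


def order_with_customer(order_id: str) -> dict[str, Any]:
    """Resolve the customer reference and total the embedded line items."""
    order = next(o for o in orders if o["_id"] == order_id)
    customer = next(c for c in customers if c["_id"] == order["customer_id"])
    total = sum(item["qty"] * item["price"] for item in order["items"])
    return {**order, "customer": customer, "total": total}


print(order_with_customer("o1")["total"])  # 8.5
```

Embedding suits data that is always read together with its parent (line items), while referencing suits data shared across many documents (customers).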
If the connections between data elements are numerous and contain as much information as the nodes themselves, a graph database can be a good choice.
MongoDB extends JSON with BSON (Binary JSON), which natively stores additional data types such as dates, 64-bit integers, and decimals, increasing efficiency. MongoDB also supports geospatial and time series data. In addition, MongoDB enables large collections of data to be stored in object storage, supporting data lake style deployments.
NoSQL databases all have a sweet spot for the type of queries they can perform.
Key-value stores are all about instantaneous retrieval of a chunk of data. The queries aren’t complex, but they happen fast.
Wide-column databases excel at aggregating values in particular columns. Per-column compression, and in many products in-memory processing, make such queries extremely fast.
Document databases enable fast retrieval of individual documents, like key-value stores. MongoDB also supports aggregation pipelines, which retrieve a set of documents and reshape them into exactly the form the application needs.
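An aggregation pipeline is a sequence of stages, each transforming the output of the previous one. The `pipeline` list below uses real MongoDB stage syntax (`$match`, `$group` with `$sum`, `$sort`); with PyMongo it would be passed to `Collection.aggregate()` on a live server, which is not done here. The plain-Python functions `match`, `group_sum`, and `sort_by` are hypothetical stand-ins that reproduce what those three stages compute on some sample sales documents.

```python
from collections import defaultdict
from typing import Any

Doc = dict[str, Any]

sales: list[Doc] = [
    {"region": "EU", "status": "shipped", "amount": 120.0},
    {"region": "US", "status": "shipped", "amount": 200.0},
    {"region": "EU", "status": "pending", "amount": 50.0},
    {"region": "EU", "status": "shipped", "amount": 30.0},
    {"region": "APAC", "status": "shipped", "amount": 80.0},
]

# The MongoDB pipeline being imitated (not executed in this sketch).
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]


def match(docs: list[Doc], criteria: Doc) -> list[Doc]:
    """Stage 1: keep documents whose fields equal every value in `criteria`."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def group_sum(docs: list[Doc], key: str, field: str) -> list[Doc]:
    """Stage 2: one output document per distinct `key`, summing `field`."""
    totals: defaultdict[Any, float] = defaultdict(float)
    for d in docs:
        totals[d[key]] += d[field]
    return [{"_id": k, "total": v} for k, v in totals.items()]


def sort_by(docs: list[Doc], field: str, descending: bool = True) -> list[Doc]:
    """Stage 3: order the grouped results."""
    return sorted(docs, key=lambda d: d[field], reverse=descending)


result = sort_by(group_sum(match(sales, {"status": "shipped"}), "region", "amount"), "total")
print(result)
# [{'_id': 'US', 'total': 200.0}, {'_id': 'EU', 'total': 150.0}, {'_id': 'APAC', 'total': 80.0}]
```

Running the stages inside the database means only the three summary documents, not every sale, travel back to the application.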
Graph databases can sift through massive layers of connected data and answer a wide variety of questions about connections between data incredibly fast with relatively simple queries. Graph algorithms can profile data and determine clusters of related objects.
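Finding “clusters of related objects” can be as simple as computing connected components: groups of nodes where every member can reach every other through some chain of relationships. Graph platforms offer more sophisticated community-detection algorithms; this hypothetical sketch shows the simplest case with an iterative depth-first traversal.

```python
def connected_components(nodes: list[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group nodes into clusters where every member is reachable from every other."""
    adjacency: dict[str, set[str]] = {n: set() for n in nodes}
    for a, b in edges:  # treat relationships as undirected
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: set[str] = set()
    clusters: list[set[str]] = []
    for start in nodes:
        if start in seen:
            continue
        cluster: set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:  # iterative depth-first traversal
            node = stack.pop()
            cluster.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(cluster)
    return clusters


# Hypothetical "knows" relationships between people.
people = ["ann", "ben", "cat", "dan", "eve", "fay", "gus", "hal", "ivy"]
knows = [("ann", "ben"), ("ben", "cat"), ("dan", "eve"), ("fay", "gus"), ("gus", "hal")]
print(connected_components(people, knows))
```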
MongoDB’s aggregation pipelines can now combine results from multiple collections using the $unionWith stage. MongoDB also offers advanced tools for analyzing the performance of a database cluster, such as the Performance Advisor in MongoDB Atlas, which recommends indexes when they are needed.
NoSQL databases were all created to deliver high performance. Usually this is achieved through a combination of serving queries from data that is in-memory and using a scale-out strategy that serves a database from a cluster of smaller computers.
Each type of NoSQL database uses a different combination of these two strategies.
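The scale-out half of that strategy depends on deciding which server holds which data. One common approach is hash-based partitioning, sketched below with a hypothetical `shard_for` function. A stable digest is used because Python’s built-in `hash()` for strings is randomized per process and would place the same key differently on different machines.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (hypothetical placement scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user:{i}" for i in range(10_000)]
distribution = Counter(shard_for(k, 4) for k in keys)
print(sorted(distribution.items()))  # each shard receives roughly a quarter of the keys
```

Simple modulo placement has a drawback: changing `num_shards` moves most keys. Production systems therefore typically assign ranges of keys or hash values to shards so data can be rebalanced incrementally; MongoDB, for example, splits sharded data into chunks that a balancer migrates between shards.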
Most NoSQL databases now offer cloud services that automatically expand and contract clusters.
MongoDB is designed to run on clusters of computers: replica sets keep copies of the data on multiple servers, providing inherent support for high availability. MongoDB also supports sharding, which partitions data across multiple replica sets to increase performance. MongoDB Atlas, the cloud version of MongoDB, offers a variety of cluster sizes, some of which can be scaled up and down automatically.
While the public cloud has become an exciting and convenient way to deploy a database as a service (DBaaS), most computing still takes place on-premises.
Being able to deploy on-premises or in the cloud, either self-managed or as a DBaaS on any public cloud provider, provides the ultimate flexibility.
In addition, many users of NoSQL databases will want full commercial support.
Each NoSQL database has a different combination of such offerings.
MongoDB supports all deployment models: self-managed on-premises or in the cloud, and DBaaS on all major public clouds. MongoDB offers commercial support for its enterprise edition and for MongoDB Atlas, its DBaaS.
Most NoSQL databases are open source and have development communities that are active in helping improve the database and find innovative ways to use it.
The size and nature of these communities vary widely as does the population of trained developers and consultants.
Most users of NoSQL databases will seek to ensure that it is easy to find talented developers to help use the database.
MongoDB has one of the largest communities of developers around the world. Recently, MongoDB was named the most wanted database in Stack Overflow’s developer survey for the fourth year in a row.
Try out MongoDB Atlas, a fully managed cloud database that runs on all the major public clouds.