Document Databases

When relational databases were introduced in the 1970s, data schemas were fairly simple, and it made sense to think of objects as sets of relationships. For example, an article object might be related to a category (an object), a tag (another object), a comment (another object), and so on.

Because relationships between different types of data were specified in the database schema, these relational databases could be queried with a standard language, Structured Query Language (SQL). But the environment for both data and programming has changed since SQL databases were developed:

  • The emergence of cloud computing has brought deployment and storage costs down dramatically, but only if data can be spread across multiple servers easily without disruption. In a complex SQL database, this is difficult because many queries require multiple large tables to be joined together to provide a response. Executing distributed joins is a very complex problem in relational databases.

  • The need to store unstructured data, such as social media posts and multimedia, has grown rapidly. SQL databases are extremely efficient at storing structured information, but workarounds or compromises are necessary for storing and querying unstructured data.

  • Agile development methods mean that the database schema needs to change rapidly as demands evolve. SQL databases require their structure to be specified in advance, which means any change to the schema requires ALTER statements to be run on a table, and these can be time-consuming on large tables.
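To make the first and third points concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (article, category, comment, subtitle) are hypothetical and chosen only to mirror the article example above. It shows that reading one article with its related data takes two joins, and that adding a new attribute means altering the table definition. On a tiny in-memory table both steps are instant; the costs described above arise on large tables spread across servers.

```python
import sqlite3

# In-memory database with a hypothetical article/category/comment schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        category_id INTEGER NOT NULL REFERENCES category(id)
    );
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL REFERENCES article(id),
        body TEXT NOT NULL
    );
    INSERT INTO category VALUES (1, 'databases');
    INSERT INTO article VALUES (1, 'Intro to documents', 1);
    INSERT INTO comment VALUES (1, 1, 'Nice overview'), (2, 1, 'Thanks');
""")

# Reading one article together with its category and comments needs two joins.
rows = conn.execute("""
    SELECT a.title, c.name, m.body
    FROM article AS a
    JOIN category AS c ON c.id = a.category_id
    JOIN comment AS m ON m.article_id = a.id
    WHERE a.id = ?
    ORDER BY m.id
""", (1,)).fetchall()

# Adding a new attribute means changing the table definition itself.
conn.execute("ALTER TABLE article ADD COLUMN subtitle TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(article)")]

for row in rows:
    print(row)
print(columns)
```

Each comment comes back as its own row with the article title and category repeated, so the application still has to reassemble the article object from the result set.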

In response to these changes, new ways of storing data (e.g. NoSQL databases) have emerged that allow data to be grouped together more naturally and logically, and that loosen the restrictions on database schema. One of the most popular of these is the document data model, where each record and its associated data are thought of as a “document”. In a document database, such as MongoDB, everything related to a database object is encapsulated together. Storing data in this way has the following advantages:

  • Documents are independent units, which improves performance (related data is read contiguously off disk) and makes it easier to distribute data across multiple servers while preserving its locality.

  • Application logic is easier to write. You don’t have to translate between objects in your application and SQL queries; you can turn the object model directly into a document.

  • Unstructured data can be stored easily, since a document contains whatever keys and values the application logic requires. In addition, costly migrations are avoided since the database does not need to know its information schema in advance.
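The second and third advantages can be sketched in plain Python, with no database involved. The Article and Comment classes and the list named collection are hypothetical stand-ins for an application's object model and a document collection. The point is only that an object graph maps directly onto one nested document, and that documents with different keys can sit side by side without a schema change.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Comment:
    author: str
    body: str


@dataclass
class Article:
    title: str
    category: str
    tags: List[str] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)


article = Article(
    title="Intro to documents",
    category="databases",
    tags=["nosql", "modeling"],
    comments=[Comment(author="ana", body="Nice overview")],
)

# asdict() recursively converts the object graph into nested dicts and lists,
# so the article and everything related to it become a single document.
document = asdict(article)

# Hypothetical in-memory "collection": just a list of documents.
collection = [document]

# A later document carries keys the first one lacks; nothing is migrated.
collection.append({
    "title": "Video post",
    "category": "media",
    "video_url": "https://example.com/v.mp4",
    "duration_seconds": 93,
})

print(document)
print([sorted(doc.keys()) for doc in collection])
```

Because comments and tags are embedded in the article's document, reading the article back needs no join.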

Document databases generally have very powerful query engines and indexing features that make it easy and fast to execute many different optimized queries. The strength of a document database’s query language is an important differentiator between these databases.
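As a rough illustration of what querying and indexing mean for documents, the sketch below builds a secondary index over a field and uses it to answer an equality query. It is a toy model in plain Python, not how any particular database implements its query engine. The function names build_index and find and the sample documents are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical collection keyed by document id.
documents = {
    1: {"title": "Intro to documents", "category": "databases", "tags": ["nosql", "modeling"]},
    2: {"title": "Sharding basics", "category": "databases", "tags": ["nosql", "scaling"]},
    3: {"title": "Video post", "category": "media", "tags": []},
}


def build_index(docs, key):
    """Map each value of `key` to the ids of the documents that contain it.

    List values get one index entry per element.
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if key not in doc:
            continue
        value = doc[key]
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].add(doc_id)
    return index


def value_matches(value, wanted):
    """Equality match; a list matches if any element equals `wanted`."""
    if isinstance(value, list):
        return wanted in value
    return value == wanted


def find(docs, filters, indexes):
    """Return sorted ids of documents matching every pair in `filters`.

    Uses an index when one exists for a key, otherwise scans all documents.
    """
    result = set(docs)
    for key, wanted in filters.items():
        if key in indexes:
            matches = indexes[key].get(wanted, set())
        else:
            matches = {i for i, d in docs.items() if value_matches(d.get(key), wanted)}
        result &= matches
    return sorted(result)


indexes = {"tags": build_index(documents, "tags")}
print(find(documents, {"tags": "nosql", "category": "databases"}, indexes))
print(find(documents, {"tags": "scaling"}, indexes))
```

The indexed lookup on tags touches only the matching ids, while the unindexed filter on category has to inspect every document. That difference is why index support matters as collections grow.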

MongoDB Is the Leading Document Database

MongoDB's document data model makes it easy to build on, since it supports unstructured data natively and doesn't require costly and time-consuming migrations when application requirements change. MongoDB's documents are encoded in BSON, a binary JSON-like format, which makes storage easy, is a natural fit for modern object-oriented programming methodologies, and is also lightweight, fast and traversable.
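To give a feel for what a JSON-like, traversable document looks like, the sketch below round-trips a nested document through Python's standard json module and walks into it with a dotted path. Plain JSON is used here only as a stand-in. BSON itself is a binary encoding with additional data types, and it is not produced by this code. The get_path helper is hypothetical and loosely modelled on dotted field paths.

```python
import json

document = {
    "title": "Intro to documents",
    "category": "databases",
    "tags": ["nosql", "modeling"],
    "comments": [
        {"author": "ana", "body": "Nice overview"},
        {"author": "raj", "body": "Thanks"},
    ],
}

# Serialize to text and back; the nested structure survives unchanged.
encoded = json.dumps(document)
decoded = json.loads(encoded)


def get_path(doc, path):
    """Follow a dotted path such as 'comments.0.author' into a nested document.

    Numeric parts are treated as list indexes, other parts as dict keys.
    """
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


print(get_path(decoded, "comments.1.author"))
print(get_path(decoded, "tags.0"))
```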

In addition, MongoDB supports rich queries and full indexes, distinguishing it from other document databases that make complex queries difficult or require a separate server layer to enable them. Its other features include automatic sharding, replication, and more.