Thinking in Documents: Part 1



It is no longer sufficient for organizations to deliver run-of-the-mill business process applications. Mobile, social, web and sensor-enabled applications are not just potential differentiators; in many cases, they are now essential for remaining relevant. Trying to force-fit data models designed decades ago to support these new types of applications inhibits agility and drives up cost and complexity. Semi-structured and unstructured data does not lend itself to being stored and processed in the rigid row-and-column format imposed by relational databases, and it cannot be fully harnessed for real-time analytics if stored as opaque BLOBs in simple key-value databases.

As a result, more developers are turning to document databases such as MongoDB that allow data to be represented as rich structures, providing a viable alternative to the standard, normalized, relational model. MongoDB has a number of unique features such as atomic updates, indexed array keys, and a powerful query framework that can significantly influence schema design. While many developers have internalized the rules for designing schemas for relational databases, these same rules don't apply to document databases. This blog post series will demonstrate how to take advantage of documents to build modern applications.

From 2-Dimensional Tables to Rich Document Data Models

The most fundamental difference between relational databases and MongoDB is the way in which the data is modeled. Before getting into the details, let’s compare the terminology used in the relational and document model domains:

RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Index       Index
JOIN        Embedded document or reference
Table 1: Translating between relational and document data models

Schema design requires a change in perspective for data architects, developers and DBAs:

  • From the legacy relational data model that flattens data into rigid, 2-dimensional, tabular structures of rows and columns...
  • To a rich and dynamic document data model with embedded sub-documents and arrays.

MongoDB stores JSON documents in a binary representation called BSON (Binary JSON). BSON encoding extends the popular JSON representation to include additional data types such as int, long, and floating point.

With sub-documents and arrays, JSON documents also align with the structure of objects in modern programming languages. This makes it easy for developers to map the data used in the application to its associated document in the database. By contrast, trying to map the object representation of the data to the tabular representation of an RDBMS can slow down development. Adding Object Relational Mappers (ORMs) can create additional complexity by reducing the flexibility to evolve schemas and to optimize queries to meet new application requirements.
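As a rough illustration of this alignment, the sketch below maps an application object onto a nested, JSON-style document using only the Python standard library. The `Person` and `Car` class names and the `to_document` helper are assumptions made for this example; the field names mirror the Person document shown in Figure 2. No database or driver is involved here.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Car:
    model: str
    year: int
    value: int


@dataclass
class Person:
    first_name: str
    surname: str
    city: str
    # A person owns zero or more cars; this becomes an embedded array.
    cars: List[Car] = field(default_factory=list)


def to_document(person: Person) -> dict:
    """Convert the object graph into a nested dict (a JSON-style document)."""
    # asdict() recurses into nested dataclasses and lists of dataclasses.
    return asdict(person)


paul = Person("Paul", "Miller", "London", [Car("Bentley", 1973, 100000)])
document = to_document(paul)
print(json.dumps(document, indent=2))
```

The object and the document have the same shape, so no separate mapping layer is needed to translate between them.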

So how do I JOIN my Data?

The first concern of those coming from a relational background is often the absence of JOINs in non-relational databases. As demonstrated below, the document model makes JOINs redundant in many cases.

Figure 1: Normalization and JOINs

In Figure 1, the RDBMS uses the Pers_ID field to JOIN the “Person” table with the “Car” table to enable the application to report each car’s owner. Using the document model, embedded sub-documents and arrays effectively pre-JOIN data by aggregating related fields within a single data structure. Rows and columns that were traditionally normalized and distributed across separate tables can now be stored together in a single document, eliminating the need to JOIN separate tables when the application has to retrieve complete records.

Modeling the same data in MongoDB enables us to create a schema in which we embed an array of sub-documents for each car directly within the Person document.

{
	first_name: "Paul",
	surname: "Miller",
	city: "London",
	location: [45.123, 47.232],
	cars: [
		{ model: "Bentley",
		  year: 1973,
		  value: 100000, … },
		{ model: "Rolls Royce",
		  year: 1965,
		  value: 330000, … }
	]
}

Figure 2: Richly structured MongoDB documents
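To make the idea of "pre-JOINing" concrete, here is a small Python sketch that starts from two relational-style row sets and folds them into the embedded form. The row layout, the `Car_ID` column and the ID values are assumptions made for illustration, since the contents of Figure 1 are not reproduced here; only `Pers_ID` and the person and car fields come from the text above. The `embed_cars` helper is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for the "Person" and "Car" tables.
# The Pers_ID and Car_ID values are made up for illustration.
person_rows = [
    {"Pers_ID": 0, "first_name": "Paul", "surname": "Miller", "city": "London"},
]
car_rows = [
    {"Car_ID": 101, "model": "Bentley", "year": 1973, "value": 100000, "Pers_ID": 0},
    {"Car_ID": 102, "model": "Rolls Royce", "year": 1965, "value": 330000, "Pers_ID": 0},
]


def embed_cars(persons, cars):
    """Pre-join the two tables: group cars by owner and embed them as an array."""
    cars_by_owner = defaultdict(list)
    for car in cars:
        # Drop the keys that only existed to support the JOIN.
        embedded = {k: v for k, v in car.items() if k not in ("Car_ID", "Pers_ID")}
        cars_by_owner[car["Pers_ID"]].append(embedded)

    documents = []
    for person in persons:
        doc = {k: v for k, v in person.items() if k != "Pers_ID"}
        doc["cars"] = cars_by_owner.get(person["Pers_ID"], [])
        documents.append(doc)
    return documents


person_documents = embed_cars(person_rows, car_rows)
# Reading a person together with their cars is now a single lookup, no JOIN.
paul = person_documents[0]
print(paul["surname"], [car["model"] for car in paul["cars"]])
```

The join work is done once, when the document is written, rather than on every read.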

In this simple example, the relational model consists of only two tables (in reality, most applications will need tens, hundreds or even thousands of tables). Spreading a single entity across many tables does not reflect the way architects think about data, nor the way in which developers write applications. The document model enables data to be represented in a much more natural and intuitive way.

Whether to embed related data or to create a reference between separate documents is a choice we consider in the second part of this blog series.

To further illustrate the differences between the relational and document models, consider the example of a blogging platform in Figure 3. In this example, the application relies on the RDBMS to join five separate tables in order to build the blog entry. With MongoDB, all of the blog data is aggregated within a single document, linked with a single reference to a user document that contains both blog and comment authors.

Figure 3: The top part of the diagram in the example above shows data modeled with a relational schema. The lower part of the diagram shows data modeled with a MongoDB document.
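A hedged sketch of what such a blog document might look like is shown below, using plain Python dictionaries. Because the diagram itself is not reproduced here, every field name, ID and value is an assumption chosen for illustration: tags and comments are embedded in the post, while authors are referenced by ID and resolved against a separate set of user documents. The `render_post` helper is hypothetical.

```python
# Hypothetical user documents, keyed by _id; names and ids are made up.
users = {
    1: {"_id": 1, "name": "alice"},
    2: {"_id": 2, "name": "bob"},
}

# One blog post document: tags and comments are embedded, authors are referenced.
post = {
    "_id": 10,
    "title": "Thinking in Documents",
    "author_id": 1,  # reference to a user document
    "tags": ["mongodb", "schema"],
    "comments": [
        {"author_id": 2, "text": "Nice overview."},
        {"author_id": 1, "text": "Thanks!"},
    ],
}


def render_post(post, users):
    """Build a display-ready post by resolving the user references."""
    return {
        "title": post["title"],
        "author": users[post["author_id"]]["name"],
        "tags": list(post["tags"]),
        "comments": [
            {"author": users[c["author_id"]]["name"], "text": c["text"]}
            for c in post["comments"]
        ],
    }


rendered = render_post(post, users)
print(rendered["author"], len(rendered["comments"]))
```

Everything specific to the post travels in one document; only the user details, which are shared across posts and comments, are looked up separately.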

In addition to making it more natural to represent data at the database level, the document model also provides performance and scalability advantages:

  • An aggregated document can be accessed with a single call to the database, rather than having to JOIN multiple tables to respond to a query. The MongoDB document is physically stored as a single object, requiring only a single read from memory or disk. On the other hand, RDBMS JOINs require multiple reads from multiple physical locations.
  • As documents are self-contained, distributing the database across multiple nodes (a process called sharding) becomes simpler and makes it possible to achieve massive horizontal scalability on commodity hardware. The DBA no longer needs to worry about the performance penalty of executing cross-node JOINs (should they even be possible in the existing RDBMS) to collect data from different tables.
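The toy Python model below illustrates why self-contained documents distribute easily: each document is routed to exactly one node by a key, and reading it back touches only that node. This is a simplified sketch under stated assumptions, not MongoDB's sharding implementation; the `ToyCluster` class, the `shard_for` function, the cluster size and the choice of SHA-256 as the hash are all invented for the example.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ToyCluster:
    """In-memory stand-in for a sharded collection; not MongoDB's implementation."""

    def __init__(self, num_shards: int = NUM_SHARDS):
        self.shards = [{} for _ in range(num_shards)]

    def insert(self, key: str, document: dict) -> int:
        # The whole document, embedded data included, lands on one shard.
        index = shard_for(key, len(self.shards))
        self.shards[index][key] = document
        return index

    def find(self, key: str):
        # Only the shard that owns the key is consulted.
        index = shard_for(key, len(self.shards))
        return self.shards[index].get(key)


cluster = ToyCluster()
owner = cluster.insert("miller", {"surname": "Miller", "cars": [{"model": "Bentley"}]})
found = cluster.find("miller")
print(owner, found["cars"][0]["model"])
```

Because the cars are embedded in the person document, retrieving a person with their cars never requires combining data from two nodes.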

Now that we’ve introduced some of the concepts, part 2 of the Thinking in Documents blog series will put documents into action by discussing schema design. We will cover how to manage related data with embedding and referencing, and we’ll touch on indexing and the MongoDB transaction model.

To learn more, take a look at the Thinking in Documents webinar.

To look at specific considerations in moving from relational databases, read the RDBMS Migration Guide.