Five Minute MongoDB: Why Documents?


The document is the natural representation of data. We only broke data up into rows and columns back in the 1970s as a way to optimize data access. Back then, storage and compute power were expensive, so it made sense to spend developer time reducing the data set to a schema of rows and columns, interlinked by relationships and then normalized between tables to reduce duplication. This process was cost-effective then and so it came to dominate database thinking.

That domination means that many people accept the burden of defining rows and columns as an essential part of using databases. In many ways though, relational databases are still expecting the designer and developer to pre-chew the data for easier processing by the database.

The Document Alternative

Technology has moved on, though, and systems are more than capable of managing documents. Documents, in the form of JSON-formatted data, are the lingua franca of the internet, and databases like MongoDB have emerged with the ability to naturally understand these documents so they can be efficiently stored, queried, and manipulated as documents.

Consider a database of "things". A relational database may slice up the natural representation of a "thing" into a table of things, with essential properties like name, description, and price in it, and other attributes broken out into tables of colors, sizes, manufacturers, and textures. Writing a thing into the database will involve writing to that main table of things, looking up its other properties in the other tables, and adding references to them or, if they are not present, adding them to those property tables. Reading a thing's essential data means finding it in the table of things. Finding all the relevant data may involve multiple lookups using an identifier, or joins between tables, to pull the required information together. And all the tables will need an index on them to speed up access. This is the relational way.
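To make the relational steps above concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table names, column names, and helper functions are assumptions made up for illustration, and only the color attribute is broken out, to keep it short.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE things (
        id INTEGER PRIMARY KEY, name TEXT, description TEXT, price REAL
    );
    CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE thing_colors (
        thing_id INTEGER REFERENCES things(id),
        color_id INTEGER REFERENCES colors(id)
    );
    CREATE INDEX idx_thing_colors_thing ON thing_colors(thing_id);
""")


def color_id(conn, name):
    """Look up a color, adding it to the colors table if it is not present."""
    row = conn.execute("SELECT id FROM colors WHERE name = ?", (name,)).fetchone()
    if row:
        return row[0]
    return conn.execute("INSERT INTO colors (name) VALUES (?)", (name,)).lastrowid


def write_thing(conn, name, description, price, colors):
    """Write the main row, then add a reference for each color."""
    thing_id = conn.execute(
        "INSERT INTO things (name, description, price) VALUES (?, ?, ?)",
        (name, description, price),
    ).lastrowid
    for color in colors:
        conn.execute(
            "INSERT INTO thing_colors (thing_id, color_id) VALUES (?, ?)",
            (thing_id, color_id(conn, color)),
        )
    return thing_id


def read_thing(conn, thing_id):
    """Reassemble a thing from the main table plus a join for its colors."""
    name, description, price = conn.execute(
        "SELECT name, description, price FROM things WHERE id = ?", (thing_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT colors.name FROM thing_colors "
        "JOIN colors ON colors.id = thing_colors.color_id "
        "WHERE thing_colors.thing_id = ? ORDER BY colors.name",
        (thing_id,),
    )
    return {
        "name": name,
        "description": description,
        "price": price,
        "colors": [row[0] for row in rows],
    }


widget_id = write_thing(conn, "widget", "A small widget", 9.99, ["red", "blue"])
thing = read_thing(conn, widget_id)
```

Even with one attribute broken out, a single "thing" touches three tables on the way in and needs a join on the way out.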

On the other hand, a document database like MongoDB looks to hold all that information together in a single document. When resolving queries, it can still get at information within the stored documents, and that same information can be used to create indexes for faster access. It doesn't require that the data be referred to in another table; with a document database that's exactly what you don't want to do. The essential thing is that retrieving one document should tell you everything about that "thing", a complete collection of data about the "thing" in one place.
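As a rough illustration of the document approach, the sketch below holds a whole "thing" in one JSON-style document, using a plain Python dict and no database server. The field names and values are invented for the example, and get_path is a hypothetical helper that imitates the dotted-path style of reaching into nested fields; it is not a MongoDB API.

```python
import json

# One document holds everything about the thing; all field names and
# values are illustrative assumptions.
thing = {
    "name": "widget",
    "description": "A small widget",
    "price": 9.99,
    "colors": ["red", "blue"],
    "sizes": ["S", "M"],
    "manufacturer": {"name": "Acme", "country": "US"},
    "texture": "smooth",
}


def get_path(document, dotted_path):
    """Follow a dotted path such as 'manufacturer.name' into nested fields."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Serializing to JSON and back stands in for storing and retrieving the
# document: one write, one read, no joins.
stored = json.dumps(thing)
retrieved = json.loads(stored)

manufacturer_name = get_path(retrieved, "manufacturer.name")
has_red = "red" in retrieved["colors"]
```

Everything about the thing comes back in a single retrieval, and nested values such as the manufacturer's name remain addressable, so they could also serve as the basis for an index.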

From Documents to Objects

Modern software development usually treats abstract things as objects. An object could model a car and have attributes such as body style, color, top speed, engine size, and previous owners. Like documents, they are fundamentally a collection of data which is generally not well suited to be split up into multiple different data structures. The one difference is that objects may also have code associated with them but that's for the application to manage.

For relational databases, these objects are often created and managed using the subtle smoke and mirrors of ORMs (Object-Relational Mappers). They turn the rows and columns, which we carefully disassembled the thing into for the relational database, back into objects, reconstituting their contents. Without an ORM, the application developer writes queries which return data that they can populate an object with. ORMs save the developer a lot of time, but often at the cost of performance versus handcrafted relational queries.

For document databases, there's no breaking up of the data, so there's no need to reconstitute; just retrieve the document and treat it like any other object in your application. Manipulate the fields, rearrange arrays, create nested objects within the data… and when you are done, write the object back, as a document, to the database. With no ORM in the way, your path to performance is a lot clearer too. And if you are iterating your application design, that document-object synchronicity means you can speed up the process by just changing what's in your document structure. No need to add extra relational scaffolding.
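The retrieve, manipulate, and write-back cycle described above might look like the sketch below. To keep it runnable without a server, a plain dict stands in for a collection. The find_one and replace_one helpers are local, hypothetical functions defined here, loosely named after driver-style operations; they are not calls into any real driver. The car document and its fields are assumptions too.

```python
import copy

# An in-memory stand-in for a collection, keyed by _id (an assumption).
collection = {
    "car-1": {
        "_id": "car-1",
        "body_style": "hatchback",
        "color": "red",
        "top_speed_kmh": 180,
        "previous_owners": ["Ana", "Ben", "Caro"],
    }
}


def find_one(_id):
    """Hypothetical helper: return a copy of the stored document."""
    return copy.deepcopy(collection[_id])


def replace_one(_id, document):
    """Hypothetical helper: write the whole document back."""
    collection[_id] = copy.deepcopy(document)


# Retrieve the document and treat it like any other object.
car = find_one("car-1")

car["color"] = "green"                                   # manipulate a field
car["previous_owners"].reverse()                         # rearrange an array
car["engine"] = {"size_litres": 2.0, "fuel": "petrol"}   # create a nested object

# Nothing is persisted until the document is written back.
unchanged_before_write = collection["car-1"]["color"] == "red"
replace_one("car-1", car)
```

Adding the nested engine object needed no schema change: the new shape of the document is simply what gets written back.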

Documents are the natural home of data and map well to modern dynamic objects and the languages that work with objects. If you want to get ahead on how you build your documents, I recommend dipping into our ongoing series of data design patterns for MongoDB, Building with Patterns.

Once you get why documents, you won't want to go back. Get yourself a free MongoDB cluster on MongoDB Atlas and start exploring today.