The Manchester Guardian was founded in 1821. Nearly two centuries later, The Guardian has become a touchstone of British news and an industry leader in adopting new technology to transform the business. The newspaper’s online site – guardian.co.uk – was the first UK newspaper web site to break 20 million unique users a month, and it has been awarded four Webby Awards in five years for best newspaper web site.
As The Guardian evolves from a traditional publishing model, they’ve determined that user engagement is directly correlated with revenue: the more media content users interact with, the higher the revenue. With this strong incentive to expand their relationship with readers, The Guardian needed a flexible, extensible identity system to track and store user data.
Constrained by their rigid relational database architecture, The Guardian developed their new system on MongoDB, which now helps the UK publisher deliver interactive features to users more quickly, and thereby drive additional revenue.
The Guardian’s relational database architecture hindered their evolution from a traditional publishing model to rich, dynamic content. This transition required an identity system that could be extended to handle data unique to new features being rolled out onto the site. For example, if a user submits an entry to a contest, that would need to be added to their identity record. Their existing identity system used a rigid schema and could not be extended to support the variety of new features required by this business change.
“Relational databases have a sound approach, but that doesn’t necessarily match the way we see our data,” said Philip Wills, software architect at guardian.co.uk. “MongoDB gave us the flexibility to store data in the way that we understand it as opposed to somebody’s theoretical view.”
Enabled by MongoDB, The Guardian’s identity system is the arbiter of user credentials for logging on and interacting with the site. MongoDB provides the flexibility to account for variation in how much detail is stored for each user depending on which services they interact with (for example, simply commenting on the site versus entering a third-party competition), as well as the ability to store only what is relevant. Adding social sign-on from a Facebook or Twitter account requires additional fields to support each network.
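As a rough illustration of that variation, the sketch below uses plain Python dictionaries to stand in for identity documents. Every field name here (`display_name`, `competition_entries`, `social`, and so on) is a hypothetical assumption for the example, not a detail of The Guardian's actual system.

```python
# Hypothetical identity records; plain dicts stand in for MongoDB documents.
# Field names are illustrative assumptions, not The Guardian's real schema.

# A reader who only comments: minimal detail is stored.
commenter = {
    "_id": "user-1",
    "email": "reader@example.com",
    "display_name": "reader1",
}

# A reader who entered a third-party competition: extra detail is attached.
competition_entrant = {
    "_id": "user-2",
    "email": "entrant@example.com",
    "display_name": "entrant2",
    "competition_entries": [
        {"competition": "summer-prize-draw", "postal_address": "1 Example Street"},
    ],
}

# A reader using social sign-on: one sub-document per network.
social_user = {
    "_id": "user-3",
    "email": "social@example.com",
    "display_name": "social3",
    "social": {
        "facebook": {"account_id": "fb-123"},
        "twitter": {"handle": "@social3"},
    },
}

users = [commenter, competition_entrant, social_user]


def stored_fields(user):
    """Return the sorted top-level field names actually stored for a user."""
    return sorted(user.keys())


for user in users:
    print(user["_id"], stored_fields(user))
```

Each record carries only the fields relevant to what that user has done, so the commenter's record has no empty competition or social columns to maintain.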
Additionally, MongoDB’s dynamic data model ensures that The Guardian will be able to extend the identity system over time. “MongoDB allows us to create a system that we can shape ourselves, with a view to the future of new ways for users to interact that we may not even know yet,” said Wills. The identity system is a central component that will be reused by virtually all new features added to The Guardian’s site. It must be easily extensible by agile teams working on a variety of different application components.
JSON DATA MODEL
The MongoDB document structure intuitively represents The Guardian’s data and is easy to manage. MongoDB’s dynamic schema allows them to work with documents of radically different structures at the same time, and to change those structures over time. They’ve also gained unmatched flexibility in how they store documents, read from and write to them, and manage write consistency. This flexibility is crucial to the rapid rollout of new site features without completely rebuilding the identity system each time a new field must be added.
AD HOC QUERYING WITHOUT PRE-BUILT INDEX
MongoDB’s query language makes it easy to access JSON data. Unlike NoSQL stores that provide only limited key-value APIs, MongoDB offers a rich query language with capabilities similar to those of an RDBMS, without the hassle of writing complex joins. This enables The Guardian to add indexes and query the data in new ways that they hadn’t foreseen.
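To make the idea of querying nested fields concrete, here is a minimal in-memory sketch of an equality filter that uses dotted field paths, the same shape as a MongoDB filter document. It is an illustration only: the helper names (`get_path`, `matches`, `find`) and the sample fields are assumptions, and the code is not MongoDB's query engine. It supports exact equality and nothing else.

```python
# In-memory sketch of an equality filter with dotted field paths.
# It imitates the shape of a MongoDB-style filter document for illustration;
# it is not MongoDB's query engine and handles exact equality only.


def get_path(document, dotted_path):
    """Follow a dotted path such as 'social.twitter.handle' through nested dicts.

    Returns (True, value) if the path exists, otherwise (False, None).
    """
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False, None
        current = current[key]
    return True, current


def matches(document, query):
    """True if every dotted path in `query` exists in `document` with an equal value."""
    for path, expected in query.items():
        found, value = get_path(document, path)
        if not found or value != expected:
            return False
    return True


def find(documents, query):
    """Return all documents that satisfy the equality filter."""
    return [doc for doc in documents if matches(doc, query)]


# Hypothetical identity records with differing structures.
users = [
    {"_id": "user-1", "email": "reader@example.com"},
    {"_id": "user-2", "email": "entrant@example.com", "newsletter": {"weekly": True}},
    {"_id": "user-3", "email": "social@example.com",
     "social": {"twitter": {"handle": "@social3"}}},
]

# Two queries nobody planned for when the records were first written.
twitter_users = find(users, {"social.twitter.handle": "@social3"})
weekly_subscribers = find(users, {"newsletter.weekly": True})
```

Against a real deployment, a filter dictionary of this shape would be passed to the driver instead, for example to PyMongo's `Collection.find`, and an index could later be added on whichever path turns out to be queried often.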
Compared to The Guardian’s previous RDBMS solution, which required downtime when upgrading the schema, MongoDB’s dynamic schema makes it easy to push changes to the site without taking it down.
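One common way to get this effect with a document model, sketched here under stated assumptions, is to let old and new document shapes coexist and have the reading code supply a default for fields that older records lack. The field `marketing_opt_in` and both helper functions are hypothetical; the source does not describe how The Guardian's own code handles this.

```python
# Sketch: old and new document shapes coexist; the reader supplies defaults.
# `marketing_opt_in` is a hypothetical field introduced by a later feature.

# Written before the feature existed: the field is simply absent.
old_record = {"_id": "user-1", "email": "reader@example.com"}

# Written after the feature shipped: the field is present.
new_record = {"_id": "user-4", "email": "new@example.com", "marketing_opt_in": True}


def marketing_opt_in(record):
    """Treat a missing field as 'not opted in' instead of requiring a backfill."""
    return record.get("marketing_opt_in", False)


def opt_in(record):
    """Return a copy of the record with the new field set; the input is not modified."""
    updated = dict(record)
    updated["marketing_opt_in"] = True
    return updated


# An old record gains the field only when the user actually touches the feature.
upgraded = opt_in(old_record)
```

Because no existing record has to be rewritten before the new code goes live, the change can ship without a site-wide schema upgrade window.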
MongoDB enables The Guardian to stay ahead of the competition in the rapidly evolving media environment by powering highly customized, social conversations throughout the site. They can now deliver interactive features to users more quickly, which translates to more revenue for The Guardian.
The Guardian will continue to rely on MongoDB to develop features and applications that support new opportunities for interacting with the site. For example, The Guardian is a pioneer in live blogging, where reporters cover a news event with feedback from users in real time. They plan to use MongoDB to improve editorial workflow and build a semantically richer model.