The DBA's future as the Database Adviser
With such a large shift towards NoSQL technologies like MongoDB, there is a lot of discussion about the changing role of the Database Administrator (DBA). Many go so far as to say DBAs are no longer needed, an idea driven by new database capabilities that favor agility through a dynamic schema instead of a fixed one, and by features that make operational management easier in areas like high availability and horizontal scaling. Based on my experience working with hundreds of customers to implement MongoDB in their organizations, there is still room for the DBA, even if their everyday tasks might take less time with MongoDB.
First, let's talk about the parts of their role that stay the same. Someone still needs to set up backup and recovery processes, handle capacity planning, run maintenance tasks (e.g. upgrades), diagnose issues, do configuration management, and set up replication and sharding. In enterprises, separate operational teams often handle security, monitoring, and diagnosing common issues, but those could be part of the DBA's role in some firms as well.
Notice I left out schema management. In MongoDB, the schema is NOT predefined in the database; it is determined at the application level, and the structure of each object is defined in the application team’s favorite programming language. This is incredibly valuable when business units request that new features or data be added to the application. The application developer can simply add them, and no change needs to be made in the database, which enables rapid iteration. In this case, the DBA is no longer the middleman and can focus on keeping the database up and running. But in some cases, application changes are more complex, and development teams need a database expert to consult about schema design and its impact on performance, maintainability, and other factors. In this case, the role of the DBA transforms into the DB Adviser.
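To make the idea of an application-level schema concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Customer` class, the `loyalty_tier` field, and the `from_document` helper are hypothetical names, and a plain list stands in for a MongoDB collection so the example runs without a server. It shows a developer adding a field in application code while documents written earlier keep loading unchanged.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class Customer:
    """Application-side definition of a customer document (hypothetical)."""
    name: str
    email: str
    # Field added later at the business's request; older documents lack it.
    loyalty_tier: Optional[str] = None


def from_document(doc: dict) -> Customer:
    """Build a Customer from a stored document, ignoring keys the class does not know."""
    known = {f.name for f in fields(Customer)}
    return Customer(**{k: v for k, v in doc.items() if k in known})


# A plain list stands in for a MongoDB collection in this sketch.
collection = [
    {"name": "Ada", "email": "ada@example.com"},  # written before the new field existed
]

# New code writes the extra field; nothing about the "database" had to change.
collection.append(asdict(Customer("Grace", "grace@example.com", loyalty_tier="gold")))

customers = [from_document(doc) for doc in collection]
for customer in customers:
    print(customer.name, customer.loyalty_tier)
```

The default value on the new field is the design choice doing the work here: old and new documents can sit side by side, and the application decides how to treat a missing value. That decision is exactly the kind of thing a development team might want a database expert to review.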
As DB Adviser, the DBA works closely with the application development team that implements the schema. The DBA provides the process and due diligence to manage the pace of change rather than enforcing the limitations of the relational schema. Implementing the schema in the application might be disconcerting to some DBAs and others who rightfully worry about application developers making uninformed decisions and bringing down an application.
Letting go of schema management might be a difficult step, but DBAs already rely on application developers to make decisions in their application code on a regular basis. A poorly constructed algorithm can bring the system to a crawl, and querying fields in unexpected ways can cause performance problems as well. Is choosing a schema design so much more responsibility than any other development decision? The DBA should still guide developers, but without the enormous overhead of having to update a relational database schema simply to add a field! Your schema should be as dynamic as your business: agile with the optimal amount of control.
The DBA has always been the person accountable for ensuring the reliability and performance of the database alongside development teams. In today's market, the DBA will continue performing most of the activities they did in the former relational-only world, but will also advise in other areas so that their business groups can innovate and iterate faster than the competition. In this shift, DBAs can take on new roles to help their businesses achieve more, faster. See how you can help your organization get faster, better and leaner with MongoDB.
Why MongoDB Is Popular
There are many reasons that MongoDB is the most popular non-relational database by far, but one reason stands out across the broad spectrum of customers and users: agility. I suppose there once was a time when it was acceptable to take months (or years) planning out an application and its associated data schema, building it and then resisting any efforts to update it (because the data infrastructure was so calcified that change was painful if not impossible). We don't live in that time anymore.
Particularly in this age of Big Data, we must constantly iterate on our applications as we hone the types of data we're collecting and deploying to improve our user experiences. This is hard to do with a relational database. It's like trying to win the World Series with the Kansas City Royals. Or the English Premiership with Stoke City. It's possible, but unlikely.
Iteration is critical to satisfying customers and adapting fast enough to win markets, as noted on the Wide Awake Developers blog:
“Iteration is [a] fundamental dynamic[.] Iteration facilitates adaptation, and adaptation wins competition. History is littered with the carcasses of "superior" contenders that simply didn't adapt as fast as their victorious challengers.”
MongoDB enables such iteration. More than any other NoSQL database, and dramatically more than any relational database, MongoDB's document-oriented data model makes it exceptionally easy to add or change fields, among other things. So if a developer needs to quickly evolve an application, MongoDB's flexible data model facilitates this. Rather than fitting an application to meet schema requirements, the developer writes her application and the schema follows. Form follows function in MongoDB, as it were.
Yes, MongoDB is popular because it's easy to learn and get started with. Yes, it's highly scalable (auto-sharding, anyone?), cost effective and more. But the biggest reason MongoDB is wildly popular, in my experience?
Because MongoDB enables profound developer agility through its flexible data model.
Case Study: The New York Times Runs MongoDB
Perhaps your business has settled on the exact right operating model, one that will remain static for years, if not decades. But for the other 99.999 percent of the world’s enterprises, the market is in a constant state of flux, demanding constant iteration on how you do business. As the Research & Development group of The New York Times Company (NYT) has found, a key way to confront the constant flux of today’s businesses is to build upon a flexible data infrastructure like MongoDB.
The story behind The New York Times Company’s use of MongoDB isn’t new. Data scientist and then-NYT employee Jake Porway spoke in June 2011 about how the media giant uses MongoDB in Project Cascade, a visualization tool that uses MongoDB to store and manage data about social sharing activity related to NYT content. But what is perhaps new is the more recent realization of just how critical it is to build upon flexible data infrastructure like MongoDB in our ever-changing business climate.
Project Cascade visualizes the conversations happening around NYT content on Twitter, giving insight into which content is hot and who is fanning the flames. Joab Jackson, writing for PCWorld, has a great write-up, and you can also see an online demo. For the NYT, as Porway explains, “[Project Cascade] allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?”
Imagine, however, that the Times editors determine they actually need to be collecting different data. With a relational database, this would involve a fair amount of bother, but for the NYT’s R&D team, it’s simply a matter of tweaking MongoDB’s data model.
As Porway notes, “We can't bother futzing with RDBMS schemas when we're constantly changing what we want to look at.”
The NYT started Project Cascade with just two weeks of data, using a single MongoDB instance and no replication. Even in this limited snapshot of the roughly 600 pieces of posted content and 25,000 Twitter links each day, Project Cascade was generating 100 GB of MongoDB storage each month. Fast forward to late 2011, and Project Cascade is in serious production, processing 100,000 tweets (and far more clicks) daily, all in real time. This necessitated moving up to a four-node MongoDB replica set, but it didn’t involve adding the complexity of joins or other characteristics of a relational database.
As Deep Kapadia, Technical Program Manager at The New York Times Company, says, “MongoDB allows us to prototype things very quickly.” This is important for any enterprise application, as it allows companies to iterate around their data. Most won’t know exactly what their data model should look like right from the start. The NYT certainly didn’t. As Kapadia explains, the NYT didn’t have to do any schema design upfront to determine which fields to capture from Twitter or Bit.ly, but could simply dump all the data into MongoDB and figure out how to process it later. That flexibility is powerful. Granted, not all businesses will want to change as often as the NYT’s research group, but in a world of accelerating change, it’s increasingly critical that companies don’t hard-code rigid schemas into their data infrastructure.
It’s also important that enterprises look to the future. However small a project starts, Big Data looms. Porway explains, “Even if we're not dealing with big data when we start a project, the data demands can rise significantly.” An RDBMS scale-up strategy quickly becomes expensive and constrictive. A NoSQL scale-out architecture is much more forgiving.
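The "store everything now, decide how to process it later" approach that Kapadia describes can be sketched in a few lines of Python. This is a hedged illustration, not the NYT's actual pipeline: the payload fields, the example URL, and the `store` and `tweets_per_url` helpers are all assumptions, and a plain list stands in for a MongoDB collection so the code runs without a server.

```python
from collections import Counter

# A plain list stands in for a MongoDB collection in this sketch.
raw_events = []


def store(source: str, payload: dict) -> None:
    """Save a payload exactly as received; no upfront decision about its fields."""
    raw_events.append({"source": source, "payload": payload})


# Hypothetical payloads with different shapes land in the same collection.
store("twitter", {"user": "a", "text": "Great read", "url": "https://example.com/story"})
store("bitly", {"short_url": "https://example.com/story", "clicks": 42})
store("twitter", {"user": "b", "url": "https://example.com/story", "retweet": True})


def tweets_per_url(events: list) -> Counter:
    """Processing decided after the fact: count tweets that mention each URL."""
    counts = Counter()
    for event in events:
        url = event["payload"].get("url")
        if event["source"] == "twitter" and url:
            counts[url] += 1
    return counts


print(tweets_per_url(raw_events))
```

Because nothing is discarded at write time, a question that comes up later (clicks per link, say) only needs a new function over the same stored documents, not a schema migration.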
MongoDB is particularly useful as it runs as well on a single node as it does on hundreds of nodes. Scale almost always starts with one node, as Foursquare and others have found.
While Web companies like Google and Twitter ran into the constraints of RDBMS technology first, mainstream enterprises are hitting them now. The New York Times has been publishing continuously since 1851, yet the nature of its business has changed significantly since the advent of the Internet. The same is true for most businesses. Like the NYT, most mainstream enterprises today will find themselves collecting, filtering, and analyzing real-time data feeds from a variety of sources to better understand how customers and prospects interact with their products and services. MongoDB fits perfectly in this kind of ever-changing world.
Not surprisingly, the publishing and media world is grappling with the need for flexible data models in a very public way. Like the NYT, UK-based news publisher The Guardian also uses MongoDB to help it adapt to digital and the business models it enables. In order to flexibly iterate on different user engagement models, The Guardian had to drop old-school relational database technology and move to MongoDB.
Not that MongoDB is perfect. As Kapadia highlighted roughly a year after Porway’s original presentation, there is definitely a science to deploying MongoDB effectively. It’s very easy to get started with MongoDB, but it requires the same level of care that any critical data infrastructure does.
If Tim O’Reilly is right and “Data is the new Intel Inside,” then it’s important to build applications on a flexible database that not only can scale to collect increasing quantities of data, but also affords the agility to change one’s data model as business needs change. Data offer a real competitive advantage to the companies prepared to leverage them. Just ask The New York Times.