The Guardian

Case Study: The New York Times Runs MongoDB

Perhaps your business has settled on the exact right operating model, one that will remain static for years, if not decades. But for the other 99.999 percent of the world’s enterprises, the market is in a constant state of flux, demanding constant iteration on how they do business. As the Research & Development group of The New York Times Company (NYT) has found, a key way to confront that flux is to build upon a flexible data infrastructure like MongoDB.

The story behind The New York Times Company’s use of MongoDB isn’t new. Data scientist and then-NYT employee Jake Porway spoke in June 2011 about how the media giant uses MongoDB in Project Cascade, a visualization tool that stores and manages data about social sharing activity related to NYT content. What is perhaps new is the more recent realization of just how critical it is to build upon flexible data infrastructure like MongoDB in our ever-changing business climate.

Project Cascade visualizes the conversations happening around NYT content on Twitter, giving insight into which content is hot and who is fanning the flames. Joab Jackson, writing for PCWorld, has a great write-up, and you can also see an online demo.

For the NYT, as Porway explains, “[Project Cascade] allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?”

Imagine, however, that the Times editors determine they actually need to be collecting different data. With a relational database, this would involve a fair amount of bother, but for the NYT’s R&D team, it’s simply a matter of tweaking the data model in MongoDB.
As Porway notes, “We can't bother futzing with RDBMS schemas when we're constantly changing what we want to look at.”

The NYT started Project Cascade with just two weeks of data, using a single MongoDB instance and no replication. Even in this limited snapshot of the roughly 600 pieces of posted content and 25,000 Twitter links each day, Project Cascade was generating 100 GB of data in MongoDB each month. Fast forward to late 2011, and Project Cascade is in serious production, processing 100,000 tweets (and far more clicks) daily, all in real time. This necessitated moving up to a four-node MongoDB replica set, but it didn’t involve adding the complexity of joins or other characteristics of a relational database.

As Deep Kapadia, Technical Program Manager at The New York Times Company, says, “MongoDB allows us to prototype things very quickly.” This is important for any enterprise application, as it allows companies to iterate around their data. Most won’t know exactly what their data model should look like right from the start. The NYT certainly didn’t. As Kapadia explains, the NYT didn’t have to do any schema design upfront to determine which fields to capture from Twitter or Bit.ly, but could simply dump all the data into MongoDB and figure out how to process it later. That flexibility is powerful.

Granted, not all businesses will want to change as often as the NYT’s research group, but in a world of accelerating change, it’s increasingly critical that companies don’t hard-code rigid schemas into their data infrastructure.

It’s also important that enterprises look to the future. However small a project starts, Big Data looms. Porway explains, “Even if we're not dealing with big data when we start a project, the data demands can rise significantly.” An RDBMS scale-up strategy quickly becomes expensive and constrictive. A NoSQL scale-out architecture is much more forgiving.
MongoDB is particularly useful as it runs as well on a single node as it does on hundreds of nodes. Scale almost always starts with one node, as Foursquare and others have found. While Web companies like Google and Twitter ran into the constraints of RDBMS technology first, mainstream enterprises are hitting them now.

The New York Times has been publishing continuously since 1851, yet the nature of its business has changed significantly since the advent of the Internet. The same is true for most businesses. Like the NYT, most mainstream enterprises today will find themselves collecting, filtering, and analyzing real-time data feeds from a variety of sources to better understand how customers and prospects interact with their products and services. MongoDB fits perfectly in this kind of ever-changing world.

Not surprisingly, the publishing and media world is grappling with the need for flexible data models in a very public way. Like the NYT, UK-based news publisher The Guardian also uses MongoDB to help it adapt to digital and the business models it enables. In order to flexibly iterate on different user engagement models, The Guardian had to drop old-school relational database technology and move to MongoDB.

Not that MongoDB is perfect. As Kapadia highlighted roughly a year after Porway’s original presentation, there is definitely a science to deploying MongoDB effectively. It’s very easy to get started with MongoDB, but it requires the same level of care that any critical data infrastructure does.

If Tim O’Reilly is right and “Data is the new Intel Inside,” then it’s important to build applications on a flexible database that not only can scale to collect increasing quantities of data, but also affords the agility to change one’s data model as business needs change. Data offer real competitive advantage to the companies prepared to leverage them. Just ask The New York Times.
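
The "dump the data in now, decide how to process it later" approach that Kapadia describes can be sketched in a few lines. This is only an illustration under stated assumptions: plain Python dicts and a list stand in for documents and a collection, no MongoDB driver calls are used, and every name here (`events`, `insert`, `retweets`, `clicks`, the example URL) is hypothetical rather than taken from Project Cascade.

```python
# Illustrative only: plain Python dicts stand in for documents in a
# document store. No MongoDB driver is used, and every field name
# below is hypothetical.
from collections import Counter

events = []  # stand-in for a collection


def insert(doc):
    """Store a document as-is; no schema is declared up front."""
    events.append(dict(doc))


# Week 1: capture only what seems useful at the time.
insert({"source": "twitter", "url": "http://example.com/a", "user": "alice"})

# Week 2: editors want retweet counts too. New documents simply carry
# the extra field; older documents are left untouched.
insert({"source": "twitter", "url": "http://example.com/a",
        "user": "bob", "retweets": 12})

# A different feed with a different shape lands in the same collection.
insert({"source": "bitly", "url": "http://example.com/a", "clicks": 340})


def shares_per_url(docs):
    """Count tweet documents per URL, ignoring fields it does not need."""
    return Counter(d["url"] for d in docs if d.get("source") == "twitter")


def total_retweets(docs):
    """Sum retweets, treating documents that predate the field as zero."""
    return sum(d.get("retweets", 0) for d in docs)
```

The point of the sketch is that adding `retweets` required no migration step: the code that reads the data decides how to treat documents that lack the field. In a relational design, the same change would typically mean altering a table definition before any new rows could be stored.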
Tagged with: case study, The New York Times, The Guardian, flexibility, agility, publishing, media, MongoDB, RDBMS, relational database, Jake Porway

December 17, 2012

App development too slow? Fire your infrastructure

Forrester analyst Mike Gualtieri has argued (PDF) that “Many CIOs are on the hot seat to innovate by delivering increasingly critical applications more quickly,” yet don’t know how to do this. Part of the problem stems from a shortage of staff who can keep up with requests for new applications. But perhaps a bigger problem is the crufty data infrastructure that makes it difficult to develop and improve applications.

Solving this second problem (infrastructure/tools) may be the key to solving the first (personnel shortage). Throwing more bodies into an inefficient system doesn’t make it more efficient. It’s not surprising, therefore, that after interviewing a range of CIOs, Gualtieri concludes: “Traditional application development platforms such as Java and .NET are not necessarily the fastest approaches to develop applications. CIOs should investigate application development productivity platforms that make application development professionals more productive.”

This parallels something my colleague, Jared Rosoff, said recently to me: “The key to rapid development is looking at the whole stack of tools and processes that deliver software. You need to throw out the things that further a ‘design it, build it, ship it’ mentality and switch to things that encourage an ‘iterate it’ mentality. This means thinking of continuous deployment instead of shipping code, dynamic schemas instead of rigid schemas, and so on.”

Unfortunately for the CIO, her enterprise is not going to sit around waiting for her to solve this. Cloud has changed expectations for business users, who have learned from Salesforce and its peers that robust business functionality doesn’t need to wait on IT to provision servers or install software.

Nor are the advantages of cloud lost on IT. DevOps is but one example of IT buying into this “magic layer,” as Wipro vice president K.D.
Singh terms it, enabling developers to “scrunch[] development cycles and improv[e] quality by fusing development and operations activities (and integrating testing between the two functions).” Increasingly, such developers also turn to a new breed of development languages like Ruby, or application frameworks like Django or new market entrant Meteor. They’re writing less code and getting more done. And they’re asking the CIO for forgiveness, not permission.

…Unless, of course, they’re shackled to 20th-century database technology.

The traditional data infrastructure layer is a primary inhibitor to application flexibility. Relational databases served us well for many years, and still play a critical role for a certain class of application, but they’re a poor fit for modern application development.

Take, for example, The Guardian, one of the UK’s leading newspapers. As the world has moved online, The Guardian was looking for ways to maximize user engagement, given its positive impact on revenue. It needed a new user identity system that could be tweaked and improved, and a traditional RDBMS just didn’t fit, as Philip Wills, software architect at guardian.co.uk, highlights: “Relational databases have a sound approach, but that doesn't necessarily match the way we see our data. MongoDB gave us the flexibility to store data in the way that we understand it as opposed to somebody's theoretical view.”

Importantly, this wasn’t a one-time decision on the data model. Wills continues: “...MongoDB allows us to create a system that we can shape ourselves, with a view to the future of new ways for users to interact that we may not even know yet.”

The reality is that most applications today, and the kinds of information they gather and deploy, depend upon a flexible data model that doesn’t constrain a developer unnecessarily. We’re entering a more rational world where the database structure is more fluid, changing as needs change. This is the world of NoSQL.
NoSQL databases like MongoDB are not necessarily “schema-less,” but rather offer a great deal of flexibility around schema design, which in turn allows developers to change their schemas to reflect changes in their applications and the kind of information they’re trying to capture or exploit. In other words, they’re a great way to future-proof application development within the enterprise.

This isn’t to suggest that most enterprises should rip out their existing data infrastructure. Yes, at 10gen we see organizations do just that (sometimes very large organizations, for mission-critical applications), but this generally won’t be the preferred path for developers or their CIOs. They’ll want to maintain what they have while building for the future. As such, we’re seeing new app development gravitate to the platforms and processes that expand choice rather than lessen it. We’re in the early days of NoSQL, but the momentum is already big and accelerating.

So, want to speed up your company’s ability to deliver applications? That’s easy. Fire your data infrastructure and build flexibility and choice into your next-generation application development. Building on MongoDB is a great way to accomplish this.

Posted by Matt Asay, vice president of Corporate Strategy

Tagged with: CIO, DevOps, MongoDB, app development, Forrester, Mike Gualtieri, django, ruby, Meteor, The Guardian

October 26, 2012