
Making sense of increased database choice

Gartner estimates that by 2015, 25 percent of new databases deployed will use technologies supporting alternative data types and non-traditional data structures. This is great news, as these new database choices, many of them NoSQL, are generally better tuned to modern application requirements. The downside to this end of the “30-year-old freeze,” to quote RedMonk analyst James Governor, is that with all these new options comes the risk of complicating a hitherto somewhat simple choice: which database to use? DB-Engines, after all, lists and ranks 92 different database systems, and that doesn’t even include all of the NoSQL variants. Good luck to the CIO who tries to deploy all of those within her enterprise.

The key, then, is to figure out how to standardize on a core of database technologies. Most companies will want to retain their legacy relational databases for applications tuned to an RDBMS, or that require complex transactions. For most new applications, though, NoSQL databases like MongoDB will be the optimal solution. But which one? There are currently at least 150 different NoSQL databases, split into different camps: document, columnar, key-value, graph, and others. One of my favorite guides for differentiating between these options is Pramod Sadalage and Martin Fowler’s NoSQL Distilled. It does a great job of making NoSQL approachable, and also offers guidance on which type of database to apply to specific types of problems. This is critical: which database is best largely depends on the particular use case.

There is no shortage of guidance as to whether an enterprise should use NoSQL or stick with an RDBMS, or, if NoSQL, which one to use (here’s just one of many sites offering guidance). Unfortunately, this still doesn’t cut down on the number of choices presented to a developer selecting a database for her application. I’m sure much of the advice is good, but it can solve a point problem (which database to use for a particular application) while exacerbating the meta problem (which databases to standardize on throughout the enterprise). This should be top-of-mind for every CIO, as shadow IT is already bringing NoSQL databases into the enterprise. This trend is only going to accelerate, as InfoWorld’s Bob Lewis notes. The reasons NoSQL technologies are being adopted in the enterprise are similar to the reasons shadow IT is embracing the public cloud: speed of development, ease of development, and suitability for modern applications, as a recent Forrester survey found.

Hence, savvy CIOs will select a few broadly applicable databases that can tackle the vast majority of enterprise needs, while simultaneously satisfying developers’ need for databases that help them get their work done. But, again, which ones? Most enterprises already have RDBMS preferences, standardizing on two and possibly three SQL databases. Part of the reason these databases have served so many for so long is that they are general-purpose databases. They might not be the absolute perfect solution to a particular application requirement, but they do the job well enough and help the enterprise focus its resources. When choosing a NoSQL database, and every enterprise is going to need to do this, it’s important to opt for NoSQL databases that solve a wide variety of problems, rather than addressing niche requirements with a narrowly applicable database.
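To make the camps a little more concrete, here is a minimal sketch of the document model, the camp MongoDB falls into, using Python and PyMongo. The collection and field names are purely illustrative assumptions, not drawn from any real deployment; the point is simply that a single self-describing document can hold data an RDBMS would normalize across several tables.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (hypothetical example setup).
client = MongoClient("mongodb://localhost:27017")
db = client["catalog"]

# One self-describing document holds what an RDBMS would normalize
# into separate products, categories, and reviews tables.
db.products.insert_one({
    "name": "Espresso Machine",
    "price": 249.99,
    "categories": ["kitchen", "coffee"],
    "reviews": [
        {"user": "alice", "rating": 5, "comment": "Excellent crema."},
        {"user": "bob", "rating": 4},
    ],
})

# Query on a nested field without any joins.
for product in db.products.find({"reviews.rating": 5}):
    print(product["name"])
```

A key-value store, by contrast, would hand back that record only as an opaque blob keyed by ID, while columnar and graph stores organize the same information around columns and relationships, which is why the different camps suit different problems.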
Document data stores like MongoDB tend to be the most broadly applicable, able to tackle a wide array of workloads. But there are other NoSQL databases that, while not as generally useful, do a few things really well and should be considered. Other things to consider in settling on database standards are political and cultural issues, compatibility with existing applications and with applications on the near- and long-term roadmap, and the momentum behind a particular NoSQL database. With 150-plus NoSQL databases to choose from, picking a fashionable but ephemeral database is a recipe for frustration and failure. As I’ve written, MongoDB’s community size and momentum, among other things, suggest it will be around for a long, long time. But there are other NoSQL communities that also demonstrate staying power.

No enterprise wants to be managing dozens of databases, or even 10. Ideally, enterprises will settle on a few, perhaps five at most. In so doing, they should look to augment their RDBMS standards with NoSQL databases that are general purpose in nature and broadly adopted. Considered in this light, NoSQL database standardization becomes much more manageable.

— Posted by Matt Asay, vice president of Corporate Strategy.

Tagged with: MongoDB, NoSQL, RDBMS, choice, database, relational database, Forrester, standardization, InfoWorld, shadow IT, Matt Asay

December 19, 2012

Case Study: The New York Times Runs MongoDB

Perhaps your business has settled on the exact right operating model, one that will remain static for years, if not decades. But for the 99.999 percent of the rest of the world’s enterprises, your market is in a constant state of flux, demanding constant iteration in how you do business. As the Research & Development group of The New York Times Company (NYT) has found, a key way to confront this constant flux is to build upon a flexible data infrastructure like MongoDB.

The story behind The New York Times Company’s use of MongoDB isn’t new. Data scientist and then-NYT employee Jake Porway spoke in June 2011 about how the media giant uses MongoDB in Project Cascade, a visualization tool that uses MongoDB to store and manage data about social sharing activity related to NYT content. What is perhaps new is the more recent realization of just how critical it is to build on flexible data infrastructure like MongoDB in our ever-changing business climate. Project Cascade visualizes the conversations happening around NYT content on Twitter, giving insight into which content is hot and who is fanning the flames. Joab Jackson, writing for PCWorld, has a great write-up, and you can also see an online demo. For the NYT, as Porway explains, “[Project Cascade] allows us to [answer] questions that are really big, like what is the best time of day to tweet? What kinds of tweets get people involved? Is it more important for our automated feeds to tweet, or for our journalists?”

Imagine, however, that the Times editors determine they actually need to be collecting different data. With a relational database, this would involve a fair amount of bother, but for the NYT’s R&D team, it’s simply a matter of tweaking MongoDB’s data model. As Porway notes, “We can’t bother futzing with RDBMS schemas when we’re constantly changing what we want to look at.”

The NYT started Project Cascade with just two weeks of data, using a single MongoDB instance and no replication. Even in this limited snapshot of the roughly 600 pieces of posted content and 25,000 Twitter links each day, Project Cascade was generating 100 GB of MongoDB storage each month. Fast forward to late 2011, and Project Cascade is in serious production, processing 100,000 tweets (and far more clicks) daily, all in real time. This necessitated moving up to a four-node MongoDB replica set, but it didn’t involve adding the complexity of joins or other characteristics of a relational database.

As Deep Kapadia, Technical Program Manager at The New York Times Company, says, “MongoDB allows us to prototype things very quickly.” This is important for any enterprise application, as it allows companies to iterate on their data. Most won’t know exactly what their data model should look like right from the start. The NYT certainly didn’t. As Kapadia explains, the NYT didn’t have to do any schema design upfront to determine which fields to capture from Twitter or Bit.ly; it could simply dump all the data into MongoDB and figure out how to process it later. That flexibility is powerful.

Granted, not all businesses will want to change as often as the NYT’s research group, but in a world of accelerating change, it’s increasingly critical that companies don’t hard-code rigid schemas into their data infrastructure. It’s also important that enterprises look to the future. However small a project starts, Big Data looms.
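To ground Kapadia’s point about skipping upfront schema design, here is a minimal PyMongo sketch. The database, collection, and field names are hypothetical stand-ins, not taken from the NYT’s actual pipeline; it simply shows how documents of different shapes, arriving from different sources, can be stored as-is and queried later without a schema migration.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["cascade_demo"]["share_events"]  # hypothetical names

# Week 1: capture whatever the Twitter feed provides.
events.insert_one({
    "source": "twitter",
    "tweet_id": "123",
    "user": "reader_a",
    "url": "https://nytimes.example/article-1",
})

# Week 3: the team decides it also cares about click data from the
# link shortener, which has a different shape. No ALTER TABLE, no
# migration; the new documents simply carry the extra fields.
events.insert_one({
    "source": "bitly",
    "url": "https://nytimes.example/article-1",
    "clicks": 42,
    "referrer": "facebook",
})

# Later, ask a question that spans both shapes of document.
for event in events.find({"url": "https://nytimes.example/article-1"}):
    print(event["source"], event.get("clicks", "n/a"))
```

The trade-off, which Kapadia’s later caution about deploying MongoDB carefully hints at, is that the application takes on the schema discipline an RDBMS would otherwise enforce.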
As Porway explains, “Even if we’re not dealing with big data when we start a project, the data demands can rise significantly.” An RDBMS scale-up strategy quickly becomes expensive and constrictive. A NoSQL scale-out architecture is much more forgiving. MongoDB is particularly useful here, as it runs as well on a single node as it does on hundreds of nodes. Scale almost always starts with one node, as Foursquare and others have found.

While Web companies like Google and Twitter ran into the constraints of RDBMS technology first, mainstream enterprises are hitting them now. The New York Times has been publishing continuously since 1851, yet the nature of its business has changed significantly since the advent of the Internet. The same is true for most businesses. Like the NYT, most mainstream enterprises today will find themselves collecting, filtering, and analyzing real-time data feeds from a variety of sources to better understand how customers and prospects interact with their products and services. MongoDB fits perfectly in this kind of ever-changing world.

Not surprisingly, the publishing and media world is grappling with the need for flexible data models in a very public way. Like the NYT, UK-based news publisher The Guardian also uses MongoDB to help it adapt to digital and the business models it enables. To flexibly iterate on different user engagement models, The Guardian had to drop old-school relational database technology and move to MongoDB.

Not that MongoDB is perfect. As Kapadia highlighted roughly a year after Porway’s original presentation, there is definitely a science to deploying MongoDB effectively. It’s very easy to get started with MongoDB, but it requires the same level of care that any critical data infrastructure does.

If Tim O’Reilly is right and “data is the new Intel Inside,” then it’s important to build applications on a flexible database that not only can scale to collect increasing quantities of data, but also affords the agility to change one’s data model as business needs change. Data offer real competitive advantage to the companies prepared to leverage them. Just ask The New York Times.

Tagged with: case study, The New York Times, The Guardian, flexibility, agility, publishing, media, MongoDB, RDBMS, relational database, Jake Porway

December 17, 2012