MongoDB Connector for Hadoop Now Certified with Top 3 Hadoop Vendors
We’re excited to announce that our MongoDB Connector for Hadoop has just been certified on MapR’s latest distribution, 4.0.1. The connector, which allows customers to use MongoDB as an input source and/or output destination for Hadoop deployments, is now certified on distributions from all of the leading vendors in the space, including MapR, Hortonworks, and Cloudera.

As an operational database for use cases such as Single View, Internet of Things, Real-Time Analytics, and more, MongoDB is the perfect technology complement to Hadoop. With the connector, live data from MongoDB can be brought into Hadoop, enriched through analytics (often with data from other sources), and then passed back into MongoDB to better serve user-facing applications.

Orbitz, the travel booking company, uses MongoDB and Hadoop together to deliver real-time pricing and compete for travel shoppers. MongoDB serves as the data collector while Hadoop is used to store and analyze the data.

The City of Chicago built a futuristic predictive analytics platform using MongoDB and Hadoop. Their WindyGrid system allows officials to access a real-time view into crime, public health and other citizen issues. Data analysis allows the city to predict disease outbreaks and decide in real-time where to place first responders.

Other Common Use Cases That Leverage MongoDB + Hadoop

| Use case | MongoDB can be used to... | Hadoop can be used to... |
|---|---|---|
| Ecommerce | Store products, inventory, customer profiles, clickstream data; Run real-time recommendations; Session management; Detect fraud | Store complete transaction history and clickstream history; Build recommendation model and fraud detection models |
| Insurance | Store insurance policies, customer web data, call center data, demographic data; Real-time churn detection | Conduct customer action analysis; Create churn prediction algorithms |

Learn more about how Hadoop and MongoDB can work together [here](http://www.mongodb.com/hadoop-and-mongodb).

What’s next? Get started by checking out the documentation on the MongoDB Connector for Hadoop or learn more at one of our upcoming MongoDB Days: MongoDB London, 11/6; MongoDB Munich, 11/12; MongoDB Paris, 11/18; MongoDB Beijing, 11/22; and MongoDB SF, 12/3.
Mapping the Industry's Tectonic Shift in Data Management
We are clearly in the early stages of a "tectonic shift" in the database market, as eWeek terms it. Not because any particular database vendor decided that the world was ripe for a change, but because the nature of data we're generating and processing has changed. Dramatically.

In a recent research note, Cowen & Co. analyst Peter Goldmacher clearly articulates this shift:

It is well understood that the current database giants have written superb products to solve primarily one problem (automating standard business processes), but we no longer live in a one problem world. The proliferation of mobile devices is forcing an immense structural change as we increasingly overlay a digital existence on top of our analog existence. "If we can measure it, we can manage it" has transcended the world of business process automation and now has meaning in everything we do, as everything we do generates data. Driving, tweeting, gaming, friending, browsing, walking...it all generates data. We can capture, analyze and derive tremendous value from that data, but only if we can use low cost, high-quality data management products.

This is the challenge MongoDB is laying down, and it is the challenge all other data management players must rise to meet if Big Data is going to realize its potential. I've called out before that NoSQL and Hadoop are the new normal in data management. This is why. And it's why, as much as the RDBMS establishment may wish it otherwise, the industry looks bright for NoSQL technologies like MongoDB.
The Changing Of The Technology Guard: NoSQL + Hadoop
Big Data truly is prompting a changing of the technology guard. In an excellent article today, The Wall Street Journal notes that Hadoop is "challenging tech heavyweights like Oracle and Teradata [whose] core database technology is too expensive and ill-suited for typical big data tasks." This follows my own observations that repeated earnings misses across the legacy technology vendor landscape indicate that real, tectonic shifts in the technology landscape are underway. In other words, NoSQL and Hadoop are the new normal.

What the Journal missed, however, was the right emphasis. As fantastic as Hadoop is, it's only one part of the Big Data story. And not necessarily the most significant part. For example, the Journal writes:

Traditional databases organize easy-to-categorize information. Customer records or ATM transactions, for example, arrive in a predefined format that is easy to process and analyze. These so-called relational databases are the kind offered by Oracle and Teradata among others, and the market for them runs to an estimated $30 billion a year, according to IDC estimates. The Internet, though, is messy. Companies now also have to make sense of and store the mass of data being generated from tweets, Web-surfing logs and Internet-connected machines. Hadoop is a cheap technology to make that possible, and it was born of Google technologies detailed in academic papers.

The article is dead-on in most respects, except for the market that Hadoop truly tackles. Of the $30 billion database market, Hadoop addresses just a quarter of it: the OLAP market. The much larger market is the traditional OLTP market, and this is the home of NoSQL databases like MongoDB. Perhaps unsurprisingly, then, MongoDB has the fastest growing Big Data community, and the second hottest job trend after only HTML5.

Big Data, after all, isn't merely about analytics. It's primarily about operational databases that can help enterprises put their data to work in real time.
Why Open Source Is Essential To Big Data
Gartner analyst Merv Adrian recently highlighted some of the recent movements in Hadoop Land, with several companies introducing products "intended to improve Hadoop speed." This seems odd, as that wouldn't be my top pick for how to improve Hadoop or, really, most of the Big Data technologies out there. By many accounts, the biggest need in Hadoop is improved ease of use, not improved performance, something Adrian himself confirms: Hadoop already delivers exceptional performance on commodity hardware, compared to its stodgy proprietary competition. Where it's still lacking is in ease of use.

Not that Hadoop is alone in this. As Mare Lucas asserts,

Today, despite the information deluge, enterprise decision makers are often unable to access the data in a useful way. The tools are designed for those who speak the language of algorithms and statistical analysis. It’s simply too hard for the everyday user to “ask” the data any questions – from the routine to the insightful. The end result? The speed of big data moves at a slower pace … and the power is locked in the hands of the few.

Lucas goes on to argue that the solution to the data scientist shortage is to take the science out of data science; that is, consumerize Big Data technology such that non-PhD-wielding business people can query their data and get back meaningful results.

The Value Of Open Source To Deciphering Big Data

Perhaps. But there's actually an intermediate step before we reach the Promised Land of full consumerization of Big Data. It's called open source. Even with technology like Hadoop that is open source yet still too complex, the benefits of using Hadoop far outweigh the costs (financial and productivity-wise) associated with licensing an expensive data warehousing or analytics platform. As Alex Popescu writes, Hadoop "allows experimenting and trying out new ideas, while continuing to accumulate and storing your data. It removes the pressure from the developers. That’s agility."

But these benefits aren't unique to Hadoop. They're inherent in any open-source project. Now imagine we could get open-source software that fits our Big Data needs, is exceptionally easy to use, and is almost certainly already being used within our enterprises? That is the promise of MongoDB, consistently cited as one of the industry's top-two Big Data technologies. MongoDB makes it easy to get started with a Big Data project.

Using MongoDB To Innovate

Consider the City of Chicago. The Economist wrote recently about the City of Chicago's predictive analytics platform, WindyGrid. What The Economist didn't mention is that WindyGrid started as a pet project on chief data officer Brett Goldstein's laptop. Goldstein started with a single MongoDB node, and iterated from there, turning it into one of the most exciting data-driven applications in the industry today.

Given that we often don't know exactly which data to query, or how to query, or how to put data to work in our applications, this is precisely how a Big Data project should work. Start small, then iterate toward something big. This kind of tinkering is difficult, if not impossible, with a relational database, as The Economist's Kenneth Cukier points out in his book, Big Data: A Revolution That Will Transform How We Live, Work, and Think:

Conventional, so-called relational, databases are designed for a world in which data is sparse, and thus can be and will be curated carefully. It is a world in which the questions one wants to answer using the data have to be clear at the outset, so that the database is designed to answer them - and only them - efficiently.

But with a flexible document database like MongoDB, it suddenly becomes much easier to iterate toward Big Data insights. We don't need to go out and hire data scientists. Rather, we simply need to apply existing, open-source technology like MongoDB to our Big Data problems, which jibes perfectly with Gartner analyst Svetlana Sicular's mantra that it's easier to train existing employees on Big Data technologies than it is to train data scientists on one's business. Except, in the case of MongoDB, odds are that enterprises are already filled with people who understand MongoDB, as 451 Research's LinkedIn analysis suggests.

In sum, Big Data needn't be daunting or difficult. It's a download away.
Top Big Data skills? MongoDB and Hadoop
According to new research from the UK’s Sector Skills Council for Business and Information Technology, the organization responsible for managing IT standards and qualifications, Big Data is a big deal in the UK, and MongoDB is one of the top Big Data skills in demand. This meshes with SiliconAngle Wikibon research I highlighted earlier, detailing Hadoop and MongoDB as the top-two Big Data technologies. It also jibes with JasperSoft data that shows MongoDB as one of its top Big Data connectors.

MongoDB is a fantastic operational data store. As soon as one remembers that Big Data is a question of both storage and processing, it makes sense that the top operational data store would be MongoDB, given its flexibility and scalability. Foursquare is a great example of a customer using MongoDB in this way.

On the data processing side, a growing number of enterprises use MongoDB both to store and process log data, among other data analytics workloads. Some use MongoDB with its built-in MapReduce functionality, while others choose to use the Hadoop connector or MongoDB’s Aggregation Framework to avoid MapReduce.

Whatever the method or use case, the great thing about Big Data technologies like MongoDB and Hadoop is that they’re open source, so the barriers to download, learn, and adopt them are negligible. Given the huge demand for Big Data skills, both in the UK and globally, according to data from Dice and Indeed.com, it’s time to download MongoDB and get started on your next Big Data project.
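To make the log-processing option above a little more concrete, here is a minimal sketch of counting log entries per service with an Aggregation Framework pipeline instead of MapReduce. The field names (`service`, `level`), the sample documents, and the helpers `build_pipeline` and `emulate_pipeline` are hypothetical and not taken from any real deployment. The pipeline uses the standard `$match`, `$group`, and `$sort` stages, but the sketch emulates them in plain Python so it runs without a MongoDB server.

```python
from collections import Counter

# Hypothetical log documents, shaped like what an application might store.
SAMPLE_LOGS = [
    {"service": "checkout", "level": "ERROR"},
    {"service": "checkout", "level": "INFO"},
    {"service": "search", "level": "ERROR"},
    {"service": "search", "level": "ERROR"},
    {"service": "search", "level": "INFO"},
]


def build_pipeline(level):
    """Build a pipeline that counts log entries of one level per service."""
    return [
        {"$match": {"level": level}},
        {"$group": {"_id": "$service", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def emulate_pipeline(docs, pipeline):
    """Pure-Python stand-in for the three stages built by build_pipeline.

    Assumption: this only handles an equality $match, a $group that counts
    documents per key, and a descending $sort on that count. It is not a
    general aggregation engine.
    """
    match = pipeline[0]["$match"]
    group_field = pipeline[1]["$group"]["_id"].lstrip("$")
    matched = [d for d in docs if all(d.get(k) == v for k, v in match.items())]
    counts = Counter(d[group_field] for d in matched)
    rows = [{"_id": key, "count": n} for key, n in counts.items()]
    return sorted(rows, key=lambda row: row["count"], reverse=True)


if __name__ == "__main__":
    print(emulate_pipeline(SAMPLE_LOGS, build_pipeline("ERROR")))
```

Against a live deployment, the same list could be passed to PyMongo's `Collection.aggregate`, for example `db.logs.aggregate(build_pipeline("ERROR"))`, where the `logs` collection name is again an assumption.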
Chicago looks to cut crime with MongoDB
The true value of a technology isn’t how rich it makes a vendor, but rather how productive and happy it makes a user. Of course it’s cool for 10gen to be selected from a pool of 5,900 U.S. startups and ranked as the top U.S. software startup, if for no other reason than to earn me bragging rights with my mom. But while that may be interesting for 10gen employees and investors (and my mom), it’s nowhere near as cool as having the City of Chicago, for example, build an exceptionally innovative crime prevention tool using MongoDB, as The Wall Street Journal recently reported.

The City of Chicago simply could not create its analytics platform with a relational database. Not easily, anyway. The City needed the ability to marry structured and unstructured data, allowing City employees to combine disparate sources and types of data in order to glean insights. What does this mean? As but one example:

[C]ity officials might look at a high crime area, while also mapping out the number of liquor permits for a neighborhood, along with the amount of nearby abandoned buildings. Using transcripts from resident complaints or 911 calls [or data from any number of 30 different City agencies or departments], officials could also see trending concerns for the area, like broken lights or stolen garbage cans, and the times the incidents are occurring. If the high crime area also has a high number of liquor permits, for example, officials could then see if other neighborhoods also faced both issues, allowing them to create a more effective response for those areas.

I read that and feel positively giddy that modern technology makes this sort of thing possible. It’s even more exciting when you consider that the City of Chicago didn’t have to engage in protracted negotiations to use the technology. The City simply downloaded MongoDB and got started.

This is great for 10gen. My mom has never been prouder of me. But it’s so much more important for the City of Chicago and other users looking to leverage NoSQL technology like MongoDB to solve Big Data and other problems. As Cowen & Co. analyst Peter Goldmacher recently wrote:

We believe the biggest winners in the Big Data world aren't the Big Data technology vendors, but rather the companies that will leverage Big Data technology to create entirely new businesses or disrupt legacy businesses.

Cloudera CEO (and my good friend) Mike Olson followed up on this report by concluding, “You want to get rich on Big Data? Use it!” He’s absolutely right. The real riches in open source, Big Data, mobile, etc. will not go to the vendors who develop, sell, and support these technologies. Sure, some will do well from these activities, but that’s not really the point. No, the real riches go to those who embrace and implement these technologies, whether Hadoop, Linux, MongoDB, or Storm. Which is, when you think about it, exactly as it should be.

[Posted by Matt Asay, vice president of Corporate Strategy]