MongoDB Connector for Hadoop Now Certified with Top 3 Hadoop Vendors
We’re excited to announce that our MongoDB Connector for Hadoop has just been certified on MapR’s latest distribution, 4.0.1. The connector, which allows customers to use MongoDB as an input source and/or output destination for Hadoop deployments, is now certified on distributions from all of the leading vendors in the space, including MapR, Hortonworks, and Cloudera.

As an operational database for use cases such as Single View, Internet of Things, Real-Time Analytics, and more, MongoDB is the perfect technology complement to Hadoop. With the connector, live data from MongoDB can be brought into Hadoop, enriched through analytics (often with data from other sources), and then passed back into MongoDB to better serve user-facing applications.

Orbitz, the travel booking company, uses MongoDB and Hadoop together to deliver real-time pricing and compete for travel shoppers. MongoDB serves as the data collector while Hadoop is used to store and analyze the data.

The City of Chicago built a futuristic predictive analytics platform using MongoDB and Hadoop. Their WindyGrid system allows officials to access a real-time view into crime, public health, and other citizen issues. Data analysis allows the city to predict disease outbreaks and decide in real time where to place first responders.

Other Common Use Cases That Leverage MongoDB + Hadoop

Ecommerce

| MongoDB can be used to... | Hadoop can be used to... |
| --- | --- |
| Store products, inventory, customer profiles, and clickstream data; run real-time recommendations; manage sessions; detect fraud | Store complete transaction and clickstream history; build recommendation and fraud detection models |

Insurance

| MongoDB can be used to... | Hadoop can be used to... |
| --- | --- |
| Store insurance policies, customer web data, call center data, and demographic data; perform real-time churn detection | Conduct customer action analysis; create churn prediction algorithms |

Learn more about how Hadoop and MongoDB can work together [here](http://www.mongodb.com/hadoop-and-mongodb).

What’s next? Get started by checking out the documentation on the MongoDB Connector for Hadoop or learn more at one of our upcoming MongoDB Days: MongoDB London, 11/6; MongoDB Munich, 11/12; MongoDB Paris, 11/18; MongoDB Beijing, 11/22; and MongoDB SF, 12/3.
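To make the round-trip pattern concrete, here is a toy sketch of the "bring data into Hadoop, enrich it, pass it back" flow, with plain Python standing in for the map and reduce phases of a real Hadoop job. The collection and field names are illustrative, not from this post; a real deployment would configure the connector to read from and write to MongoDB rather than using in-memory lists.

```python
from collections import Counter, defaultdict

# Documents as they might be read from a hypothetical "events" collection
# in MongoDB (the connector would supply these to the job as input).
clickstream = [
    {"user": "u1", "product": "tent"},
    {"user": "u1", "product": "tent"},
    {"user": "u1", "product": "stove"},
    {"user": "u2", "product": "boots"},
]

def map_phase(docs):
    """Emit (user, product) pairs, as a Hadoop mapper would."""
    for doc in docs:
        yield doc["user"], doc["product"]

def reduce_phase(pairs):
    """Per user, pick the most-viewed product as a naive recommendation."""
    views = defaultdict(Counter)
    for user, product in pairs:
        views[user][product] += 1
    return [{"_id": user, "recommended": counts.most_common(1)[0][0]}
            for user, counts in sorted(views.items())]

# These result documents would be written back to MongoDB, where the
# user-facing application can serve them with ordinary queries.
recommendations = reduce_phase(map_phase(clickstream))
print(recommendations)
```

The point of the pattern is the division of labor: Hadoop does the batch computation, while MongoDB remains the operational store on both ends of the pipeline.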
Mapping the Industry's Tectonic Shift in Data Management
We are clearly in the early stages of a "tectonic shift" in the database market, as eWeek terms it. Not because any particular database vendor decided that the world was ripe for a change, but because the nature of data we're generating and processing has changed. Dramatically.

In a recent research note, Cowen & Co. analyst Peter Goldmacher clearly articulates this shift:

> It is well understood that the current database giants have written superb products to solve primarily one problem (automating standard business processes), but we no longer live in a one problem world. The proliferation of mobile devices is forcing an immense structural change as we increasingly overlay a digital existence on top of our analog existence. "If we can measure it, we can manage it" has transcended the world of business process automation and now has meaning in everything we do, as everything we do generates data. Driving, tweeting, gaming, friending, browsing, walking...it all generates data.

We can capture, analyze, and derive tremendous value from that data, but only if we can use low-cost, high-quality data management products. This is the challenge MongoDB is laying down, and it is the challenge all other data management players must rise to meet if Big Data is going to realize its potential.

I've called out before that NoSQL and Hadoop are the new normal in data management. This is why. And it's why, as much as the RDBMS establishment may wish it otherwise, the industry looks bright for NoSQL technologies like MongoDB.
The Changing Of The Technology Guard: NoSQL + Hadoop
Big Data truly is prompting a changing of the technology guard. In an excellent article today, The Wall Street Journal notes that Hadoop is "challenging tech heavyweights like Oracle and Teradata [whose] core database technology is too expensive and ill-suited for typical big data tasks." This follows my own observations that repeated earnings misses across the legacy technology vendor landscape indicate that real, tectonic shifts in the technology landscape are underway. In other words, NoSQL and Hadoop are the new normal.

What the Journal missed, however, was the right emphasis. As fantastic as Hadoop is, it's only one part of the Big Data story. And not necessarily the most significant part. For example, the Journal writes:

> Traditional databases organize easy-to-categorize information. Customer records or ATM transactions, for example, arrive in a predefined format that is easy to process and analyze. These so-called relational databases are the kind offered by Oracle and Teradata among others, and the market for them runs to an estimated $30 billion a year, according to IDC estimates. The Internet, though, is messy. Companies now also have to make sense of and store the mass of data being generated from tweets, Web-surfing logs and Internet-connected machines. Hadoop is a cheap technology to make that possible, and it was born of Google technologies detailed in academic papers.

The article is dead-on in most respects, except for the market that Hadoop truly tackles. Of the $30 billion database market, Hadoop addresses just a quarter of it: the OLAP market. The much larger market is the traditional OLTP market, and this is the home of NoSQL databases like MongoDB. Perhaps unsurprisingly, then, MongoDB has the fastest-growing Big Data community, and the second-hottest job trend after only HTML5. Big Data, after all, isn't merely about analytics. It's primarily about operational databases that can help enterprises put their data to work in real time.
Why Open Source Is Essential To Big Data
Gartner analyst Merv Adrian recently highlighted some of the recent movements in Hadoop Land, with several companies introducing products "intended to improve Hadoop speed." This seems odd, as that wouldn't be my top pick for how to improve Hadoop or, really, most of the Big Data technologies out there. By many accounts, the biggest need in Hadoop is improved ease of use, not improved performance, something Adrian himself confirms. Hadoop already delivers exceptional performance on commodity hardware, compared to its stodgy proprietary competition. Where it's still lacking is in ease of use.

Not that Hadoop is alone in this. As Mare Lucas asserts,

> Today, despite the information deluge, enterprise decision makers are often unable to access the data in a useful way. The tools are designed for those who speak the language of algorithms and statistical analysis. It’s simply too hard for the everyday user to “ask” the data any questions – from the routine to the insightful. The end result? The speed of big data moves at a slower pace … and the power is locked in the hands of the few.

Lucas goes on to argue that the solution to the data scientist shortage is to take the science out of data science; that is, consumerize Big Data technology such that non-PhD-wielding business people can query their data and get back meaningful results.

The Value Of Open Source To Deciphering Big Data

Perhaps. But there's actually an intermediate step before we reach the Promised Land of full consumerization of Big Data. It's called open source. Even with technology like Hadoop that is open source yet still too complex, the benefits of using Hadoop far outweigh the costs (financial and productivity-wise) associated with licensing an expensive data warehousing or analytics platform. As Alex Popescu writes, Hadoop "allows experimenting and trying out new ideas, while continuing to accumulate and storing your data. It removes the pressure from the developers. That’s agility."
But these benefits aren't unique to Hadoop. They're inherent in any open-source project. Now imagine we could get open-source software that fits our Big Data needs, is exceptionally easy to use, and is almost certainly already being used within our enterprises. That is the promise of MongoDB, consistently cited as one of the industry's top-two Big Data technologies. MongoDB makes it easy to get started with a Big Data project.

Using MongoDB To Innovate

Consider the City of Chicago. The Economist wrote recently about the City of Chicago's predictive analytics platform, WindyGrid. What The Economist didn't mention is that WindyGrid started as a pet project on chief data officer Brett Goldstein's laptop. Goldstein started with a single MongoDB node and iterated from there, turning it into one of the most exciting data-driven applications in the industry today. Given that we often don't know exactly which data to query, or how to query it, or how to put data to work in our applications, this is precisely how a Big Data project should work: start small, then iterate toward something big.

This kind of tinkering is simply difficult to impossible with a relational database, as The Economist's Kenneth Cukier points out in his book, Big Data: A Revolution That Will Transform How We Live, Work, and Think:

> Conventional, so-called relational, databases are designed for a world in which data is sparse, and thus can be and will be curated carefully. It is a world in which the questions one wants to answer using the data have to be clear at the outset, so that the database is designed to answer them - and only them - efficiently.

But with a flexible document database like MongoDB, it suddenly becomes much easier to iterate toward Big Data insights. We don't need to go out and hire data scientists.
Rather, we simply need to apply existing, open-source technology like MongoDB to our Big Data problems. This jibes perfectly with Gartner analyst Svetlana Sicular's mantra that it's easier to train existing employees on Big Data technologies than it is to train data scientists on one's business. Except, in the case of MongoDB, odds are that enterprises are already filled with people who understand MongoDB, as 451 Research's LinkedIn analysis suggests.

In sum, Big Data needn't be daunting or difficult. It's a download away.
The 'middle class' of Big Data
So much is written about Big Data that we tend to overlook a simple fact: most data isn’t big at all. As Bruno Aziza writes in Forbes, “it isn’t so” that “you have to be Big to be in the Big Data game,” echoing a similar sentiment from ReadWrite’s Brian Proffitt. Large enterprise adoption of Big Data technologies may steal the headlines, but it’s the “middle class” of enterprise data where the vast majority of data, and money, is. There’s a lot of talk about zettabytes and petabytes of data, but as EMA Research highlights in a new study, “Big Data’s sweet spot starts at 110GB and the most common customer data situation is between 10 to 30TB.” Small? Not exactly. But big? No, not really.

Couple this with the fact that most businesses fall into the 20-500-employee range, as Intuit CEO Brad Smith points out, and it’s clear that the biggest market opportunity for Big Data is within the big pool of relatively small enterprises with relatively small data sets. Call it the vast middle class of enterprise Big Data. Call it whatever you want. But it’s where most enterprise data sits. The trick is to first gather that data, and then to put it to work.

A new breed of “data-science-as-a-service” companies like Metamarkets and Infochimps has arisen to lower the bar to culling insights from one’s data. While these tools can be used by enterprises of any size, I suspect they’ll be particularly appetizing to small-to-medium-sized enterprises, those that don’t have the budget or inclination to hire a data scientist. (This might be the right way to go, anyway, as Gartner highlights: “Organizations already have people who know their own data better than mystical data scientists.” What they really need is access to the data and tools to process it.)

Intriguingly, here at 10gen we’ve seen a wide range of companies, large and small, adopt MongoDB as they build out data-centric applications, but not always with Big Data in mind.
In fact, while MongoDB and Hadoop are top-of-mind for data scientists and other IT professionals, as Wikibon has illustrated, many of 10gen’s smaller customers and users aren’t thinking about Big Data at all. Such users are looking for an easy-to-use, highly flexible data store for their applications. The fact that MongoDB also has their scalability needs covered is a bonus, one that many will unlock later into their deployment when they discover they’ve been storing data that could be put to use.

In the RDBMS world, scale is a burden, not least in terms of cost (bigger scale = bigger hardware = bigger license fees). Today, with NoSQL, scale is a given, allowing NoSQL vendors like 10gen to accentuate scalability with other benefits. It’s a remarkable turn of events for technology that emerged from the needs of the web giants to manage distributed systems at scale. We’re all the beneficiaries. Including SMBs.

We don’t normally think about small-to-medium-sized businesses when we think of Big Data, but we should. SMBs are the workhorse of the world’s economies, and they’re quietly, collectively storing massive quantities of data. The race is on to help these companies put their comparatively small quantities of data to big use. It’s a race that NoSQL technologies like MongoDB are very well-positioned to win.

Tagged with: MongoDB, big data, SMB, Hadoop, rdbms, Infochimps, Metamarkets, Gartner, Wikibon, data scientist
Top Big Data skills? MongoDB and Hadoop
According to new research from the UK’s Sector Skills Council for Business and Information Technology, the organization responsible for managing IT standards and qualifications, Big Data is a big deal in the UK, and MongoDB is one of the top Big Data skills in demand. This meshes with SiliconAngle Wikibon research I highlighted earlier, detailing Hadoop and MongoDB as the top-two Big Data technologies. It also jibes with JasperSoft data that shows MongoDB as one of its top Big Data connectors.

MongoDB is a fantastic operational data store. As soon as one remembers that Big Data is a question of both storage and processing, it makes sense that the top operational data store would be MongoDB, given its flexibility and scalability. Foursquare is a great example of a customer using MongoDB in this way. On the data processing side, a growing number of enterprises use MongoDB both to store and process log data, among other data analytics workloads. Some use MongoDB with its built-in MapReduce functionality, while others choose to use the Hadoop connector or MongoDB’s Aggregation Framework to avoid MapReduce.

Whatever the method or use case, the great thing about Big Data technologies like MongoDB and Hadoop is that they’re open source, so the barriers to download, learn, and adopt them are negligible. Given the huge demand for Big Data skills, both in the UK and globally, according to data from Dice and Indeed.com, it’s time to download MongoDB and get started on your next Big Data project.

Tagged with: MongoDB, Hadoop, Big Data, open source, operational database, Foursquare, IT jobs, jobs
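As a concrete illustration of the "avoid MapReduce" option mentioned above, here is a hedged sketch of an aggregation pipeline that counts error responses in web logs. The collection and field names are hypothetical; with pymongo the pipeline would run via `db.logs.aggregate(pipeline)`, and the pure-Python mirror below simply applies the same two stages to sample documents so the example runs without a server.

```python
# Sample documents as they might appear in a hypothetical "logs" collection.
logs = [
    {"path": "/", "status": 200},
    {"path": "/checkout", "status": 500},
    {"path": "/login", "status": 404},
    {"path": "/checkout", "status": 500},
]

# Aggregation Framework pipeline: filter to errors, then tally by status code.
pipeline = [
    {"$match": {"status": {"$gte": 400}}},
    {"$group": {"_id": "$status", "count": {"$sum": 1}}},
]

# Pure-Python equivalent of the two stages above, for illustration:
matched = [d for d in logs if d["status"] >= 400]      # $match
counts = {}
for d in matched:                                      # $group with $sum
    counts[d["status"]] = counts.get(d["status"], 0) + 1
result = [{"_id": status, "count": n} for status, n in sorted(counts.items())]
print(result)  # [{'_id': 404, 'count': 1}, {'_id': 500, 'count': 2}]
```

The declarative pipeline expresses the same computation a MapReduce job would, without writing mapper and reducer functions by hand.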
Chicago looks to cut crime with MongoDB
The true value of a technology isn’t how rich it makes a vendor, but rather how productive and happy it makes a user. Of course it’s cool for 10gen to be selected from a pool of 5,900 U.S. startups and ranked as the top U.S. software startup, if for no other reason than to earn me bragging rights with my mom. But while that may be interesting for 10gen employees and investors (and my mom), it’s nowhere near as cool as having the City of Chicago, for example, build an exceptionally innovative crime prevention tool using MongoDB, as The Wall Street Journal recently reported.

The City of Chicago simply could not create its analytics platform with a relational database. Not easily, anyway. The City needed the ability to marry structured and unstructured data, allowing City employees to combine disparate sources and types of data in order to glean insights. What does this mean? As but one example:

> [C]ity officials might look at a high crime area, while also mapping out the number of liquor permits for a neighborhood, along with the amount of nearby abandoned buildings. Using transcripts from resident complaints or 911 calls [or data from any number of 30 different City agencies or departments], officials could also see trending concerns for the area, like broken lights or stolen garbage cans, and the times the incidents are occurring. If the high crime area also has a high number of liquor permits, for example, officials could then see if other neighborhoods also faced both issues, allowing them to create a more effective response for those areas.

I read that and feel positively giddy that modern technology makes this sort of thing possible. It’s even more exciting when you consider that the City of Chicago didn’t have to engage in protracted negotiations to use the technology. The City simply downloaded MongoDB and got started. This is great for 10gen. My mom has never been prouder of me.
But it’s so much more important for the City of Chicago and other users looking to leverage NoSQL technology like MongoDB to solve Big Data and other problems. As Cowen & Co. analyst Peter Goldmacher recently wrote,

> We believe the biggest winners in the Big Data world aren't the Big Data technology vendors, but rather the companies that will leverage Big Data technology to create entirely new businesses or disrupt legacy businesses.

Cloudera CEO (and my good friend) Mike Olson followed up on this report by concluding, “You want to get rich on Big Data? Use it!” He’s absolutely right. The real riches in open source, Big Data, mobile, etc. will not go to the vendors who develop, sell, and support these technologies. Sure, some will do well from these activities, but that’s not really the point. No, the real riches go to those who embrace and implement these technologies, whether Hadoop, Linux, MongoDB, or Storm. Which is, when you think about it, exactly as it should be.

[Posted by Matt Asay, vice president of Corporate Strategy]

Tagged with: City of Chicago, MongoDB, big data, analytics, hadoop, Cowen & Co., open source, nosql, Wall Street Journal
MongoDB at the heart of Big Data
Open source is the heart of Big Data, driving the state of the art in both data storage and data processing. As recent research suggests, when IT professionals and data scientists get serious about building Big Data applications, they overwhelmingly turn to MongoDB and Hadoop. MongoDB is increasingly the industry’s default data store for Big Data-type applications, and it also works for data processing. Hadoop, for its part, is deployed for deep, computationally intensive data processing. The two technologies are highly complementary.

Small wonder, then, that many companies use the two together. To meet this demand, 10gen built a MongoDB Hadoop Adapter, which Orbitz, Foursquare, and others deploy to store and crunch massive quantities of data. There are also integrations with Storm and other tools for real-time processing, but MongoDB-plus-Hadoop is easily the most widely used integration.

Where does this leave traditional RDBMS solutions? As 10gen president Max Schireson explains in a recent interview, the cost of building out a Big Data solution with, for example, Oracle technology is cost-prohibitive:

> In the relational world, when you need real processing power you might go out and buy a big [Oracle] Exadata box for $10m. But in our world the way to get more power is just to buy more cheap commodity servers. One $10m server will typically have less processing power than a rack full of 50 cheap commodity servers that cost $5,000 each or $250,000 in total.

And while open-source SQL technologies like MySQL aren’t likely to break an IT budget, they also aren’t well-suited for large quantities of unstructured, complex data.

Not everyone will need to turn to Hadoop to process the data stored in MongoDB. As the City of Chicago has done for its Data Portal, MongoDB can be a highly efficient way to both store and process data in real time.
Schireson explains:

> If you're storing data in a relational database and you want to run it through Hadoop, you need to take the data out of the database, put it into HDFS [Hadoop File System], do the analytics in Hadoop, take the result of that and put it back into the database. With Mongo you can do those operations in real time while it's still in the operational database. You can also mix and match database-style queries with Hadoop-style MapReduce analytics.

If you’re interested in hearing how enterprises are embracing MongoDB for Big Data analysis, using it alone or in concert with Hadoop, 10gen’s chief evangelist Steve Francia, along with select MongoDB enterprise users, will be presenting on this topic at Strata Conference’s Bridge to Big Data track on October 23 in New York. We’d love to see you there.

Tagged with: Hadoop, big data, MongoDB, Max Schireson, MongoDB Hadoop Adapter
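Schireson's point about analytics running where the data already lives can be sketched as follows. The collection and field names are hypothetical; in a real deployment the pipeline would run as `db.requests.aggregate(pipeline)` against the live operational data, with no export to HDFS and re-import. The pure-Python mirror below computes the same thing so the sketch runs standalone.

```python
# Sample documents as they might appear in a hypothetical "requests"
# collection in the operational database.
requests = [
    {"path": "/search", "ms": 120},
    {"path": "/search", "ms": 80},
    {"path": "/book", "ms": 300},
]

# In-place analytics: average latency per path, computed where the data
# lives rather than after an export/import cycle through HDFS.
pipeline = [
    {"$group": {"_id": "$path", "avg_ms": {"$avg": "$ms"}}},
    {"$sort": {"_id": 1}},
]

# What the $group/$avg and $sort stages compute, in plain Python:
totals = {}
for r in requests:
    n, s = totals.get(r["path"], (0, 0))
    totals[r["path"]] = (n + 1, s + r["ms"])
avg_latency = [{"_id": p, "avg_ms": s / n} for p, (n, s) in sorted(totals.items())]
print(avg_latency)
```

Because the aggregation runs inside the operational database, the same collection can simultaneously serve ordinary document queries, which is the "mix and match" Schireson describes.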
Alteryx and 10gen: Using MongoDB to Humanize Big Data
Alteryx, a provider of strategic analytics, announced deeper integration with MongoDB in its Strategic Analytics 7.1 platform to provide new predictive analytics tools and deliver deeper customer insight with the packaging of new business critical data. Alteryx Strategic Analytics 7.1 features new native integration with MongoDB along with more robust Hadoop integration, offering the ability to compare data sets from various sources such as spreadsheets, data warehouses, cloud data sources, and packaged and syndicated market data.

“Alteryx is Humanizing Big Data by allowing customers to integrate any type of data, including unique packaged data, and then putting the power of predictive analytics in the hands of the people who drive decisions in organizations,” said George Mathew, president and chief operating officer, Alteryx. “Humanizing Big Data is the next evolution of Strategic Analytics, and Alteryx will ensure customers excel here.”

“We are excited to have a deeper level of integration between MongoDB and the Alteryx platform to drive better Big Data analytics and more value for our joint customers,” said Max Schireson, president, 10gen/MongoDB. “From a business standpoint, this connector with Alteryx will allow easier access for the data analyst to bring even more advanced capabilities to the cloud, and enable a unified view of what data is readily available to make the best analytic decision possible.”

For more information on using Alteryx Strategic Analytics read the full announcement on their blog.

Tagged with: data, big data, analytics, platforms, data driven, hadoop, mongodb big data, MongoDB, Mongo, NoSQL, Polyglot persistence, 10gen