Top Big Data skills? MongoDB and Hadoop

According to new research from the UK’s Sector Skills Council for Business and Information Technology, the organization responsible for managing IT standards and qualifications, Big Data is a big deal in the UK, and MongoDB is one of the top Big Data skills in demand. This meshes with the SiliconAngle Wikibon research I highlighted earlier, which identified Hadoop and MongoDB as the top two Big Data technologies. It also jibes with JasperSoft data showing MongoDB as one of its top Big Data connectors.

MongoDB is a fantastic operational data store. Once you remember that Big Data is a question of both storage and processing, it makes sense that the top operational data store would be MongoDB, given its flexibility and scalability. Foursquare is a great example of a customer using MongoDB in this way.

On the data processing side, a growing number of enterprises use MongoDB both to store and to process log data, among other data analytics workloads. Some use MongoDB’s built-in MapReduce functionality, others use the Hadoop connector, and still others use MongoDB’s Aggregation Framework to avoid MapReduce altogether.

Whatever the method or use case, the great thing about Big Data technologies like MongoDB and Hadoop is that they’re open source, so the barriers to downloading, learning, and adopting them are negligible. Given the huge demand for Big Data skills, both in the UK and globally, according to data from Dice and Indeed.com, it’s time to download MongoDB and get started on your next Big Data project.

Tagged with: MongoDB, Hadoop, Big Data, open source, operational database, Foursquare, IT jobs, jobs

January 8, 2013

Yottaa and Pingdom run MongoDB for real-time analytics

There are many reasons to use a NoSQL database like MongoDB, but Pierre DeBois homes in on one that doesn’t always get the attention it deserves: analytics. As DeBois, founder of Zimana, writes, “NoSQL databases are gaining in popularity because they offer the scalability required for real-time processing of complex datasets.”

As I’ve noted before, “Big Data” is sometimes the reason enterprises look to NoSQL, but they sometimes focus narrowly on MongoDB or another NoSQL database as an operational data store and neglect its utility for analytics. But as InformationWeek recently highlighted in identifying 10gen as one of the top Big Data vendors to watch in 2013, new functionality in MongoDB - specifically, the aggregation framework - makes it a useful tool for a variety of analytics workloads:

“[MongoDB’s] data aggregation framework fills an analytics void by letting users directly query data within MongoDB without using complicated batch-oriented MapReduce jobs.”

This isn’t to suggest that the aggregation framework is a like-for-like replacement for MapReduce. It’s not. But for averaging field values or calculating totals, it’s super-fast and convenient. And for the page load optimization work that Yottaa and Pingdom have engineered, MongoDB is an excellent fit, as DeBois describes.

I’d love to hear examples of other companies using MongoDB for real-time analytics, and what your experience has been.

Posted by Matt Asay, vice president of Corporate Strategy

Tagged with: Yottaa, Pingdom, real-time analytics, page load optimization, nosql, MongoDB, InformationWeek, analytics, operational database, case study
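A footnote on the aggregation framework point in the post above: the sketch below shows roughly what "averaging field values or calculating totals" looks like as an aggregation pipeline. The collection name (`pageloads`), the field names (`url`, `load_ms`), and the sample documents are all hypothetical, not taken from Yottaa or Pingdom. The pipeline uses the `$group`, `$avg`, `$sum`, and `$sort` operators; the pure-Python function beside it only illustrates what the `$group` stage computes, so the example runs without a database.

```python
from collections import defaultdict

# Hypothetical page-load samples; the field names are illustrative only.
samples = [
    {"url": "/home", "load_ms": 120},
    {"url": "/home", "load_ms": 180},
    {"url": "/cart", "load_ms": 300},
]

# The pipeline as it might be handed to MongoDB, for example with a driver
# call such as db.pageloads.aggregate(pipeline) (collection name assumed).
pipeline = [
    {"$group": {
        "_id": "$url",                          # one result per distinct URL
        "avg_load_ms": {"$avg": "$load_ms"},    # average of a field
        "total_load_ms": {"$sum": "$load_ms"},  # total of a field
        "hits": {"$sum": 1},                    # count of documents
    }},
    {"$sort": {"_id": 1}},
]


def group_in_python(docs):
    """Pure-Python equivalent of the $group and $sort stages, for illustration."""
    buckets = defaultdict(lambda: {"total": 0, "hits": 0})
    for doc in docs:
        bucket = buckets[doc["url"]]
        bucket["total"] += doc["load_ms"]
        bucket["hits"] += 1
    return [
        {
            "_id": url,
            "avg_load_ms": b["total"] / b["hits"],
            "total_load_ms": b["total"],
            "hits": b["hits"],
        }
        for url, b in sorted(buckets.items())
    ]


results = group_in_python(samples)
for row in results:
    print(row)
```

Compared with a MapReduce job, nothing here needs custom map or reduce functions: the grouping key and the accumulators are declared as data, which is why this style suits simple averages and totals.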

December 13, 2012

Living in the post-transactional database future

Given that we’ve spent decades building applications around relational databases, it’s not surprising that the first response to the introduction of NoSQL databases like MongoDB is sometimes “Why?” Developers aren’t usually the ones asking this question, because they love the approachability and flexibility MongoDB gives them. But DBAs who have built their careers on managing heavy RDBMS infrastructure? They’re harder to please.

10gen president Max Schireson estimates that 60 percent of the world’s databases are operational in nature, which is MongoDB’s market. Most of those use cases are ripe for a non-relational approach. The database market is rapidly changing, and very much up for grabs. Or as Redmonk analyst James Governor puts it, “The idea that everything is relational? Those days are gone.”

As useful as relational databases are (and they’re very useful for a certain class of application), they are losing relevance in a world where complex transactions are more the exception, less the rule. In fact, I’d argue that over time, the majority of application software that developers write will be in use cases that are better fits for MongoDB and other NoSQL technology, not legacy RDBMS.

That’s the future. What about now? Arguably, many of the applications being built today are already post-transactional, ripe for MongoDB and poor fits for RDBMS. Consider:

Amazon: its systems that process order transactions (RDBMS) are largely “done” and “stable”. Amazon’s current development largely focuses on how to provide better search and recommendations or how to adapt prices on the fly (NoSQL).

Netflix: the vast majority of its engineering focuses on recommending better movies to you (NoSQL), not processing your monthly bill (RDBMS).

Square: the easy part is processing the credit card (RDBMS). The hard part is making it location aware, so it knows where you are and what you’re buying (NoSQL).

It’s easy, but erroneous, to pigeon-hole these examples as representative of an anomalous minority of enterprises. Yes, these companies represent the cutting edge of both business and technology. But no, they are not alone in building these sorts of applications. For every early-adopter Netflix there’s a sizable, growing population of mainstream companies in media (e.g., The Guardian), finance (e.g., Intuit), or other verticals that are looking to turn technology into a revenue-driving asset, and not simply something that helps keep the lights on and payrolls running.

When what we built were websites, RDBMS worked great. But today, we’re building applications that are mobile, social, involve high-volume data feeds, incorporate predictive analytics, etc. These modern applications? They don’t fit RDBMS. Andy Oliver lists 10 things never to do with a relational database, but the list is much longer, and growing.

MongoDB is empowering the next generation of applications: post-transactional applications that rely on bigger data sets that move much faster than an RDBMS can handle. Yes, there will remain a relatively small sphere of applications unsuitable for MongoDB (including applications with a heavy emphasis on complex transactions), but the big needs going forward like search, log analysis, media repositories, recommendation engines, high-frequency trading, etc.? Those functions that really help a company innovate and grow revenue? They’re best done with MongoDB.

Of course, given RDBMS’ multi-decade legacy, it’s natural for developers to try to force RDBMS to work for a given business problem. Take log analysis, for example. Oliver writes:

“Log analysis: …[T]urn on the log analysis features of Hadoop or RHQ/JBossON for a small cluster of servers. Set the log level and log capture to anything other than ERROR. Do something more complex and life will be very bad. See, this kind of somewhat unstructured data analysis is exactly what MapReduce à la Hadoop and languages like PIG are for. It’s unfortunate that the major monitoring tools are RDBMS-specific - they really don’t need transactions, and low latency is job No. 1.”

Forward-looking organizations already realize that MongoDB is an excellent fit for log management, which is why we see more and more enterprises turning to MongoDB for this purpose. I expect this to continue. As MongoDB continues to enrich its functionality, the universe of applications for which it is not merely applicable, but also better, will continue to expand, even as the universe of applications for which RDBMS is optimal declines.

Indeed, we’re already living in a post-transactional world. Some people just don’t know it yet. (Or, as William Gibson would say, “The future is already here – it’s just not very evenly distributed.”)

Posted by Matt Asay, vice president of Corporate Strategy, with significant help from my inestimable colleague, Jared Rosoff.

Tagged with: NoSQL, MongoDB, RDBMS, relational, James Governor, Redmonk, log analysis, Andy Oliver, transactions, Netflix, Amazon, Square, operational database, DBA

November 20, 2012