The 'middle class' of Big Data

So much is written about Big Data that we tend to overlook a simple fact: most data isn’t big at all. As Bruno Aziza writes in Forbes, “it isn’t so” that “you have to be Big to be in the Big Data game,” echoing a similar sentiment from ReadWrite’s Brian Proffitt. Large enterprise adoption of Big Data technologies may steal the headlines, but it’s the “middle class” of enterprise data where the vast majority of data, and money, is.

There’s a lot of talk about zettabytes and petabytes of data, but as EMA Research highlights in a new study, “Big Data’s sweet spot starts at 110GB and the most common customer data situation is between 10 to 30TB.”

Small? Not exactly. But Big? No, not really.

Couple this with the fact that most businesses fall into the 20-500-employee range, as Intuit CEO Brad Smith points out, and it’s clear that the biggest market opportunity for Big Data is within the big pool of relatively small enterprises with relatively small data sets. Call it the vast middle class of enterprise Big Data. Call it whatever you want. But it’s where most enterprise data sits.

The trick is to first gather that data, and then to put it to work.

A new breed of “data-science-as-a-service” companies like Metamarkets and Infochimps has arisen to lower the bar to culling insights from one’s data. While these tools can be used by enterprises of any size, I suspect they’ll be particularly appetizing to small-to-medium-sized enterprises, those that don’t have the budget or inclination to hire a data scientist. (This might be the right way to go, anyway, as Gartner highlights: “Organizations already have people who know their own data better than mystical data scientists.” What they really need is access to the data and tools to process it.)

Intriguingly, here at 10gen we’ve seen a wide range of companies, large and small, adopt MongoDB as they build out data-centric applications, but not always with Big Data in mind. In fact, while MongoDB and Hadoop are top-of-mind for data scientists and other IT professionals, as Wikibon has illustrated, many of 10gen’s smaller customers and users aren’t thinking about Big Data at all.

Such users are looking for an easy-to-use, highly flexible data store for their applications. The fact that MongoDB also has their scalability needs covered is a bonus, one that many will unlock later in their deployment when they discover they’ve been storing data that could be put to use. In the RDBMS world, scale is a burden in terms of cost (bigger scale = bigger hardware = bigger license fees). Today, with NoSQL, scale is a given, allowing NoSQL vendors like 10gen to complement scalability with other benefits. It’s a remarkable turn of events for technology that emerged from the needs of the web giants to manage distributed systems at scale. We’re all the beneficiaries.

Including SMBs. We don’t normally think about small-to-medium-sized businesses when we think of Big Data, but we should. SMBs are the workhorses of the world’s economies, and they’re quietly, collectively storing massive quantities of data. The race is on to help these companies put their comparatively small quantities of data to big use. It’s a race that NoSQL technologies like MongoDB are very well-positioned to win.
