Big Data


Mike Olson On The Past And Future Of Data

Data today is very big, but not because any particular individual or company is creating lots and lots of data. Instead, we live in a new machine age, with a vast proliferation of machines emitting data in volumes and variety the world has never seen. As such, no single company will be big enough to tackle Big Data alone, declared Cloudera co-founder and chief strategy officer Mike Olson in his MongoDB World keynote this week in New York City.

There Is No Big Bang

Big Data isn't about Big Companies or other single sources of data. U.S. homes now hold over 500 million Internet-connected devices, an average of 5.7 per household, according to NPD. By 2017 each person will have 5 Internet-connected devices, with each one contributing to a torrent of data. In the past, Olson noted, we built big, centralized databases, which were good at managing data created at human scale. They were awesome for their generation. But they're simply not good enough for the world of machine-generated data, i.e., the world we live in now. These relational databases were designed for a world that didn't need to account for the incredible variety and petabyte scale of machine-generated data. Google introduced us to a new, small, flexible, incremental architecture, which gave us a new way to think about hardware and software and, really, a new way to think about data. Google also gave us a new way of thinking about how to capture, store and analyze data. That "new way" is the cloud. As Olson stressed, data tends to remain where it was generated. Given that the vast majority of new data is created in or for the cloud, modern databases must also live in the cloud.

One Database To Rule Them All?

While it's tempting to think history will repeat itself and one company will dominate Big Data, such a zero-sum, winner-takes-all outcome is unlikely. The reason? The power of a data hub or platform derives from its ability to collect data from small, disparate, loosely coupled systems, rather than from owning the Big Data "stack." Hence, while Olson at one time thought Cloudera, MongoDB and Teradata would fiercely compete to manage the same data, the reality is that the three companies now work closely together to take care of data at all points in the data lifecycle. Big Data is not zero sum. It's not created by any single entity, and it can't be controlled by any single entity. A community is required. As both Olson and MongoDB CEO Max Schireson insisted in their respective keynotes, that community comprises Cloudera and MongoDB working together to solve customers' biggest Big Data problems. To see all MongoDB World presentations, visit the [MongoDB World Presentations](https://www.mongodb.com/mongodb-world/presentations) page.

June 25, 2014

MongoDB And Teradata Join Forces To Make Big Data Smart

As enterprises increasingly depend on MongoDB to build and run modern applications, they need high-quality analytics solutions to match MongoDB's powerful data model. With the partnership Teradata and MongoDB just announced, they just got one. And it's exceptionally cool. With data analytics leader Teradata we've built a bi-directional connector that gives organizations interactive data processing at extremely fast speeds. Teradata's bi-directional QueryGrid connector allows Teradata customers to integrate massive volumes of JSON with cross-organizational data in the data warehouse for high-performance analytics. Through the connector, MongoDB customers will have access to JSON that has been enriched by Teradata to support rapidly evolving applications for mobile, Internet of Things, eCommerce, social media and more. In other words, users will soon be able to easily connect MongoDB applications with analytics running on Teradata.

The Future Is JSON

For the past 40 years, enterprises have stored their data in the tidy-but-rigid tables and joins of relational databases. Given the explosion of unstructured data, however, enterprises need a more expressive, flexible way of describing and storing data. Enter JSON. MongoDB stores data in JSON documents, which we serialize to BSON. JSON provides a rich data model that seamlessly maps to native programming language types, and the dynamic schema makes it easier to evolve one's data model than with a schema-enforcing system like a relational database (RDBMS). Marrying MongoDB's operational database with Teradata's analytics platform is a great way to bring together all of an enterprise's data.

A Virtuous Cycle

One way of thinking about the interaction between MongoDB and Teradata is to picture a crowd of people. MongoDB interacts with individuals within the crowd in real time, while Teradata looks for patterns within the crowd. With this connector, organizations can push their MongoDB data (website clicks, purchases, etc.) into Teradata, which runs queries against the data, looking for patterns. This intelligence is then pushed back to MongoDB, enriching the interaction with individual eCommerce buyers, mobile users, etc. It's a virtuous cycle, as Teradata describes on its blog. For an eCommerce application, bringing the two together means the vendor's interactions with its customers continuously improve as its MongoDB-based application gets smarter and more tailored through Teradata analytics. Importantly, for enterprises that expect to use both relational databases and MongoDB, Teradata's JSON integration unifies relational and MongoDB data analysis.

And, Not Or

This last point is worth repeating. As much as enterprises might wish to shed their IT investments and start over, the reality is that they can't and won't, as a 2012 Gartner analysis found. By giving organizations an easy way to connect MongoDB's operational data with Teradata's enterprise data warehouse, the two organizations ensure existing and new data sources can coexist. By working closely together, MongoDB and Teradata give enterprises the best of a modern operational database combined with a powerful analytics platform.
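To make the JSON point above concrete, here is a minimal, hypothetical sketch of the document model using PyMongo. The database, collection and field names (including the enrichment field) are illustrative only, not taken from the announcement:

```python
# A minimal, hypothetical sketch (PyMongo). The "store" database, the "orders"
# collection and every field below are illustrative, not from the announcement.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["store"]["orders"]

# A JSON-style document maps directly onto a native Python dict,
# nested objects and arrays included -- no table/join mapping required.
orders.insert_one({
    "customer": {"name": "Ada", "segment": "mobile"},
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-7", "qty": 1}],
    "channel": "eCommerce",
})

# A later document can carry extra fields (say, a score pushed back from an
# analytics platform) with no schema migration at all.
orders.insert_one({
    "customer": {"name": "Grace", "segment": "web"},
    "items": [{"sku": "C-3", "qty": 4}],
    "channel": "mobile",
    "propensity_score": 0.87,  # enriched field, added on the fly
})
```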

June 18, 2014

You Know What's Cool? 1 Trillion Is Cool

A million used to be cool. Then Facebook upped the ante to one billion. But in our world of Big Data, even a billion is no longer the upper end of scale, or cool. As I learned last night, at least one MongoDB customer now stores over 1 trillion documents in MongoDB. 1 trillion. That's cool. It's also far bigger than any other database deployment I've seen from any NoSQL or relational database, even from the simple key-value or columnar data stores that are designed to handle only simple workloads, but to scale them well. That's what makes MongoDB über cool: not only does it offer dramatic, superior scale, but it does so while also giving organizations the ability to build complex applications. MongoDB delivers the optimal balance between functionality and performance. Many systems are focused on nothing more than storing your data and letting you access it quickly, but in one and only one way. This simply isn't enough. A truly modern database must support rich queries, indexing, analysis, aggregation, geospatial access and search across multi-structured, rapidly changing data sets in real time. The database must not trap your data and hinder its use. It must unleash your data. All 1 trillion documents of it. Want to see how major Global 2000 organizations like Bosch, the U.S. Department of Veterans Affairs, Genentech, Facebook and many others scale with MongoDB? Easy. Just register to attend MongoDB World, June 24-25 in New York City. You can use my discount code to get 25% off: 25MattAsay.
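For a sense of what "rich queries" means in practice, here is a small, hypothetical sketch in PyMongo that combines a geospatial predicate with an ordinary field filter on the same documents; the collection and fields are illustrative only:

```python
# A minimal sketch of the "rich query" point above, using PyMongo.
# Database, collection and field names are illustrative only.
from pymongo import MongoClient, GEOSPHERE

client = MongoClient("mongodb://localhost:27017")
checkins = client["demo"]["checkins"]

# A secondary geospatial index enables location queries alongside
# ordinary field queries on the same documents.
checkins.create_index([("location", GEOSPHERE)])

checkins.insert_one({
    "user": "u123",
    "tags": ["coffee", "wifi"],
    "location": {"type": "Point", "coordinates": [-73.99, 40.73]},  # lon, lat
})

# Combine a geospatial predicate with a normal field filter in one query.
nearby = checkins.find({
    "tags": "coffee",
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [-73.98, 40.74]},
            "$maxDistance": 2000,  # metres
        }
    },
})
for doc in nearby:
    print(doc["user"])
```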

May 16, 2014

Big Data Is The New Normal

Now Big Data has even won a Gartner seal of approval, so to speak, with the publication of a new report that says Big Data is on the fast track to maturity and that by 2016 it will just be data. The big idea here is that as information goes cross-platform and vaults up in volume, very shortly Big Data will become another norm of how business gets done. By no means will all data be Big Data - that would make no sense, because of course there is real value in certain specific data warehouses that are quite small and structured. But there is also the fact that business suddenly realizes it is awash in data, and it knows that if only it can harness the insights that can be derived from the information already on hand (or soon to be), great value will ensue.

Big data transforming bricks + mortar retailing

Think about the recent New York Times story about how pioneering retailers are tracking customer behavior by monitoring cellphone signals - WiFi in particular. The story triggered significant teeth gnashing by privacy proponents, and these concerns are understandable. But the reality is that e-commerce players already have enormous tracking data on their customers, and it stands to reason that, finally, bricks-and-mortar retailers would want to level the playing field. Big data plus smartphones is an equation that works. Put the privacy debate aside. Think simply about the information flow, its volume and the insights it would give retailers. That is very big data indeed, and the goal would be mashing it up and then reinventing store layout and product placement - in effect, making it easier for consumers to find and buy what they came into the store for.

Really knowing the customer

Probably the gold standard for pursuing Big Data is Netflix, which is well known for gathering information not only about what its customers watch but also about how they watch it - where do they fast forward, where do they rewind, when do they simply turn off a film and never return? But then Netflix goes farther with its data, per reporting in SiliconANGLE: “[Netflix] actually [is] putting that data to use. Netflix has begun to produce its own original TV shows, and to do so its leveraging all of its data to do it. Netflix used its data to decide that the BBC’s ‘House of Cards’ was the best fit for a remake, and its data also correlated fans of the original to fans of actor Kevin Spacey and director David Fincher, which in turn was what led to them being hired.” Think about the power there: Big data is driving complex decisions and, apparently, it is helping get closer to what consumers really want.

The maturation of Big Data

A safe bet is that such stories - revolutionary as they sound in 2013 - will seem commonplace within a very few years because, right now, the ingredients are all coming together for a flowering of Big Data into an everyday business tool, and 2016, as Gartner predicts, is probably as good a guess as any. Certainly it will become more commonplace in many more businesses very soon - about now, in fact - as the first-generation pioneers, with their massive data stores and new analytical tools for rapidly making sense of them, start to enjoy differentially superior results. They are demonstrating that Big Data works, period. It's no longer a computer science project; it's becoming just plain business. The irony is that when Big Data becomes humdrum - when it loses its buzzword status - that is when it genuinely will have solidified a role as a transformational information utility.
By, say, 2020 we will look back and be puzzled at how organizations arrived at their marketing and design decisions without Big Data. It will seem every bit as puzzling as, say, how organizations maintained customer data before CRM (can you say 3” x 5” card?). From where we sit in 2013, that future seems distant indeed. But it also is closer than we think.

August 5, 2013

The Big Data Hoax That Wasn't

Welcome to the Age of Big Data. Or perhaps it's the Age of Big Data Agnosticism. In a Newtonian twist, what started as a wave of hype for data's transformational potential on organizations everywhere has turned into an equal and opposite backlash of big data naysaying. It is an understandable reaction to the great over-selling of big data as a kind of enterprise cure-all. Of course, in some companies, big data pilots have produced nothing but big piles of unfulfilled expectations. But the problem likely is not big data. Big data remains potentially the most powerful engine for business transformation to gain currency in the 21st century. The problem is that so much of what is sold as big data isn't. It's typically just lots of data. "Big data, that's just data mining with a fancy new name." How often have you heard that? It's flatly false. The size or volume of the data does not matter in genuine big data analytics. Instead, savvy organizations already understand that big data is really about working with a mix of data types - structured and unstructured, from inside the organization and outside. It is CRM forms, but it also is Tweets, Facebook posts, TripAdvisor rants, Gmails, Outlook entries, even voicemail. In most organizations this does not add up to petabytes of data, as I've written before. Terabytes is the usual quantity, even though that seems small by many measures. The complexity arises in the diversity of data. And that raises a problem. Not many databases have the flexibility to handle that many forms of data. And fewer databases have the agility to permit modifications on the fly - "Shouldn't we add SMS data in here, too?" The right answer is: done. A database that cannot - with little fuss - accommodate a new field is too rigid for use in true big data analysis, because the exciting - maybe maddening? - bit about big data today is that there is always new input that may enhance the overall result. Then there are the other questions: why are you collecting big data in the first place? What do you want from your analysis of it? This question is key, because without targeted analytics, big data is just hoarding. As an insightful story in The Guardian recently posited, "Companies need to focus on big answers not big data. Instead of focusing upon the concept of big data, organizations should concentrate on the intelligence data can offer." In other words, it's not about the data: it's about what intelligence can be drawn from it. The Guardian author calls himself a "big data sceptic" but, really, he isn't. He just shares the frustration over the many mislabeled big data projects - which never were about big data - and also about the data hoarding that some companies do when they say they are committing to big data. Such projects rarely end well. Real big data - unstructured, from multiple sources - coupled with real analytics is a game changer that gives forward-thinking organizations insight into what before was merely guesswork. One Texas city ran analyses to determine exactly what happened in parts of the city that experienced higher-than-anticipated growth and a resulting increase in value. This was true big data. In the mix were police reports, zoning violations, construction permits, parking tickets, you name it. If the data existed, it was fed into the analysis, and the city began to see what it did - and didn't do - to spur growth. Where could it get out of the way? Where could it proactively spur growth? It was real big data in action.
And it’s why big data remains a big deal, despite the hype.

June 28, 2013

Big Data Is A Matter Of Speed, Not Size

Finally the market is getting over its initial Big Data fixation. Unfortunately, in the process we may be inclined to throw away the Big Data signal in an attempt to rid ourselves of all the noise. The Guardian's John Burn-Murdoch highlights this today, asserting that "'small data' - or data of the volumes most regular analysts, researchers and statisticians are used to dealing with - is actually both more relevant and more useful to the vast majority of organisations than its big cousin." He concludes, "[I]t is speed, not size that is increasingly driving desire for software and hardware improvements at data-processing organisations." While we talk about Big Data, the reality is that there is a much more important trend going on in data generally, as Rufus Pollock, Founder and Co-Director of the Open Knowledge Foundation, captures: "[W]e risk overlooking the much more important story here, the real revolution, which is the mass democratisation of the means of access, storage and processing of data. This story isn't about large organisations running parallel software on tens of thousand of servers, but about more people than ever being able to collaborate effectively around a distributed ecosystem of information, an ecosystem of small data." Now if only we could get everyone else to recognize this essential truth, so we could stop admiring how very big all our data is and instead focus on actually putting it to work in time for it to be useful to us.

May 20, 2013

Data Scientist Shortage? There's An App For That

Big Data is all the rage, but apparently it will come to a crashing halt due to a shortage of data scientists. As I've argued elsewhere, this is mostly a sham. Context is critical for making use of a company's data, and the people with context already work for the enterprise. So it becomes a matter of training the people one has, rather than going off on a scouting trip for the mythical data scientist. Nor will the "science" of Big Data remain such for long, according to IBM's James Kobielus. As he notes, "core data scientist aptitudes -- curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature -- are widely distributed throughout workforces everywhere." He then points to a few key trends that will make data science less of a science: As more data discovery, acquisition, preparation, and modeling functions are automated through better tools, today's data scientists will have more time for the core of their jobs: statistical analysis, modeling, and interaction exploration. Data scientists are developing fewer models from scratch. That's because more and more big data projects run on application-embedded analytic models integrated into commercial solutions.... Open source communities and tools will greatly expand the pool of knowledgeable, empowered data scientists at your disposal, either as employees or partners. This jibes with Cloudera CEO Mike Olson's contention that "There will be enormous Hadoop adoption, but you'll get it by virtue of the applications you run." But whether an organization interprets its data through applications or directly using open-source technologies, one thing remains true in all this: people are critical to making sense of Big Data. The data won't speak for itself. It's therefore critical to find people inside one's organization who can help make sense of the organization's data. The good news? They're already available and on the payroll.

May 17, 2013

Why Open Source Is Essential To Big Data

Gartner analyst Merv Adrian recently highlighted some of the latest movements in Hadoop Land, with several companies introducing products "intended to improve Hadoop speed." This seems odd, as that wouldn't be my top pick for how to improve Hadoop or, really, most of the Big Data technologies out there. By many accounts, the biggest need in Hadoop is improved ease of use, not improved performance, something Adrian himself confirms: Hadoop already delivers exceptional performance on commodity hardware, compared to its stodgy proprietary competition. Where it's still lacking is in ease of use. Not that Hadoop is alone in this. As Mare Lucas asserts, "Today, despite the information deluge, enterprise decision makers are often unable to access the data in a useful way. The tools are designed for those who speak the language of algorithms and statistical analysis. It’s simply too hard for the everyday user to “ask” the data any questions – from the routine to the insightful. The end result? The speed of big data moves at a slower pace … and the power is locked in the hands of the few." Lucas goes on to argue that the solution to the data scientist shortage is to take the science out of data science; that is, to consumerize Big Data technology such that non-PhD-wielding business people can query their data and get back meaningful results.

The Value Of Open Source To Deciphering Big Data

Perhaps. But there's actually an intermediate step before we reach the Promised Land of full consumerization of Big Data. It's called open source. Even with a technology like Hadoop that is open source yet still too complex, the benefits of using Hadoop far outweigh the costs (financial and productivity-wise) associated with licensing an expensive data warehousing or analytics platform. As Alex Popescu writes, Hadoop "allows experimenting and trying out new ideas, while continuing to accumulate and storing your data. It removes the pressure from the developers. That’s agility." But these benefits aren't unique to Hadoop. They're inherent in any open-source project. Now imagine open-source software that fits our Big Data needs, is exceptionally easy to use, and is almost certainly already being used within our enterprises. That is the promise of MongoDB, consistently cited as one of the industry's top-two Big Data technologies. MongoDB makes it easy to get started with a Big Data project.

Using MongoDB To Innovate

Consider the City of Chicago. The Economist wrote recently about the City of Chicago's predictive analytics platform, WindyGrid. What The Economist didn't mention is that WindyGrid started as a pet project on chief data officer Brett Goldstein's laptop. Goldstein started with a single MongoDB node and iterated from there, turning it into one of the most exciting data-driven applications in the industry today. Given that we often don't know exactly which data to query, or how to query it, or how to put data to work in our applications, this is precisely how a Big Data project should work. Start small, then iterate toward something big. This kind of tinkering is difficult to impossible with a relational database, as The Economist's Kenneth Cukier points out in his book, Big Data: A Revolution That Will Transform How We Live, Work, and Think: "Conventional, so-called relational, databases are designed for a world in which data is sparse, and thus can be and will be curated carefully. It is a world in which the questions one wants to answer using the data have to be clear at the outset, so that the database is designed to answer them - and only them - efficiently." But with a flexible document database like MongoDB, it suddenly becomes much easier to iterate toward Big Data insights. We don't need to go out and hire data scientists. Rather, we simply need to apply existing, open-source technology like MongoDB to our Big Data problems, which jibes perfectly with Gartner analyst Svetlana Sicular's mantra that it's easier to train existing employees on Big Data technologies than it is to train data scientists on one's business. Except, in the case of MongoDB, odds are that enterprises are already filled with people who understand MongoDB, as 451 Research's LinkedIn analysis suggests. In sum, Big Data needn't be daunting or difficult. It's a download away.
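In the spirit of the WindyGrid anecdote above, here is a minimal, hypothetical sketch of what "start small, then iterate" can look like in PyMongo; every database, collection and field name is illustrative only:

```python
# A minimal sketch of "start small, then iterate" in the spirit of the
# WindyGrid anecdote above; every name and field here is hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # a single local node is enough to begin
events = client["city"]["events"]

# Load whatever records exist -- 311 calls, permits, inspections -- as-is.
events.insert_many([
    {"kind": "311_call", "ward": 5, "category": "pothole"},
    {"kind": "permit", "ward": 5, "type": "construction"},
    {"kind": "311_call", "ward": 12, "category": "streetlight"},
])

# First question: what kinds of records do we even have?
print(events.distinct("kind"))

# Next question, asked five minutes later, with no schema work in between:
print(events.count_documents({"kind": "311_call", "ward": 5}))
```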

May 2, 2013

Guest post: Developing MongoDB-based archival solutions for your long-term storage strategy

This is a guest post by Yash Badiani, Practice Head - Big Data, CIGNEX Datamatics.

Record keeping and document archiving are such common practices within enterprises that their importance often goes unrecognized. An efficient archivist was the person who preserved records with such systematic finesse and structured patterns that archives filed decades ago could be retrieved in a matter of minutes. But when enterprise transactions took an innovative leap, with computers playing an important role in operations, the volume of data to be managed by an archivist went beyond human scope. The digital data explosion has paved the way for content archival applications that can seamlessly manage operational data.

The Evolution of Data Archival Solutions

As a first step towards data archival solutions, enterprises used turnkey applications to archive emails, legal documents, invoices and other important documents, leveraging raw disk capacity and the security footprint of those applications. But the world of data archiving was set for another innovative leap, as the data to be archived grew not only in VOLUME but also in VARIETY: enterprises realized that their data archiving policies aren't limited to storing emails and invoices, but also include log management, enterprise videos, audio, images on the web, social media feeds, audit trails, data from online transactions, and so on. In addition, there was immense metadata associated with all this content that wasn't fully leveraged, leading to difficulties in content retrieval. Suddenly, the resident enterprise application was under scrutiny, much like our pedantic data archivist, because the data flowing in was beyond the boundaries of what it could handle. And the challenge was not limited to massive volume and variety: compliance with regulatory document retention practices; the ability to retrieve required information instantaneously through complex search queries; and the fact that enterprise applications focused on Information Lifecycle Management often carry heavy licensing and acquisition costs, at times tied to scalability requirements, making them difficult to tailor to end-user needs.

Not Yet Enough for Big Data

The advent of Big Data has given us a new outlook on these challenges. The ability of Big Data technologies to store large volumes of structured and unstructured data, arriving at high rates, all at low cost, makes them the most suitable candidates for the role of data archival solution. Here are the key requirements of a data archival solution: scalable to large volumes and variety of data; tiered storage; high availability and wide accessibility (web, mobile); support for analytical and content applications; support for workflow automation; integration with legacy applications; the ability to run on public, private and hybrid cloud environments; and the ability to self-heal without customer intervention.

MongoDB as the Foundation for a Scalable Data Archival Solution

Going through the wish list above, it doesn't take much time to recognize that MongoDB passes the litmus test.
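Before walking through the full design, here is a minimal, hypothetical sketch of the persistence pattern at its core - the archived binary in GridFS, the searchable metadata in a regular collection - using PyMongo and its gridfs module; all file names, collection names and fields below are illustrative only:

```python
# A minimal, hypothetical sketch of the persistence layer described in the
# design below: the archived binary goes into GridFS, the searchable
# metadata into an ordinary collection. All names are illustrative only.
from datetime import datetime, timezone
import gridfs
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["archive"]
fs = gridfs.GridFS(db)

# Store the raw document (an invoice PDF, a video, a log bundle...).
with open("invoice-2013-0042.pdf", "rb") as f:  # hypothetical file
    file_id = fs.put(f, filename="invoice-2013-0042.pdf")

# Store its metadata separately so it can be indexed and queried.
db.archive_metadata.insert_one({
    "file_id": file_id,
    "type": "invoice",
    "customer": "ACME Corp",
    "retention_until": datetime(2020, 12, 31, tzinfo=timezone.utc),
    "archived_at": datetime.now(timezone.utc),
})
db.archive_metadata.create_index([("type", 1), ("customer", 1)])

# Retrieval: find the metadata, then stream the file back out of GridFS.
meta = db.archive_metadata.find_one({"type": "invoice", "customer": "ACME Corp"})
data = fs.get(meta["file_id"]).read()
```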
Below is one proposed design we architected for leveraging MongoDB as a scalable back-end for an enterprise-class data archival solution:

Scalable service layer - a RESTful web service API layer enabling enterprises to integrate with the front-end application of their choice, scalable to handle high throughputs and request volumes (ingestion of petabytes of data per day).

Data persistence layer - leveraging GridFS to store large binary files and MongoDB collections for the associated metadata. We can also use sharding for better write distribution, thereby maximizing the solution's performance.

Indexing/searching layer - while MongoDB offers secondary indexing, we can integrate Solr to add features like fast response times, full-text search, faceted and range search, hit highlighting, etc.

Synchronization layer - a controller synchronizing persistence of files and indexing of metadata while queuing incoming requests.

Beyond these design features, MongoDB offers numerous advantages in designing applications and integrating them with front-end technologies thanks to MongoDB's rich driver support. Its replicated setup allows us to keep systems up to date with no downtime. The application is deployable in the cloud as SaaS, and allows analytics on stored objects.

Benefits of a MongoDB-based Archival Solution

Among other things, a MongoDB-based data archival solution offers the following benefits: an extendable solution designed to meet long-term storage needs; fast and effective search of content by name, keywords or even full text; and cost effectiveness, since it runs on commodity hardware. A data archival solution leveraging MongoDB would offer tremendous value for a variety of enterprise use cases. For example, consider the media and publishing market. A news website might produce a huge amount of content each day, including news articles, feeds for readers, related videos and audio content, images, logs, user comments and chat transcripts. Not only would such an organization produce varied content, it would also need to archive that content for long-term retention and future reference. In addition, archival of articles is becoming standard procedure for compliance, auditability and litigation support purposes. By designing a data archival solution leveraging MongoDB, the data archivist not only gains business agility but also benefits from a broader scope for analysis and independence from IT for the organization of her files. The data archival space has come a long way. With an enterprise data archival solution leveraging MongoDB, we can be assured that the challenges around VOLUME, VARIETY & VELOCITY of data can be handled in an agile and elegant way.

About CIGNEX Datamatics

CIGNEX Datamatics Inc. (a subsidiary of Datamatics Global Services Ltd.) is the global leader in Commercial Open Source Enterprise solutions and a global partner of 10gen (MongoDB), offering advisory consulting, implementation and support services around MongoDB applications. Since 2000, CIGNEX Datamatics has implemented over 400 Open Source enterprise solutions addressing enterprise requirements from portals and content to Big Data solutions. For more details, contact Yash Badiani at yash dot badiani at cignex dot com. Tagged with: Data Archival, Big Data, Cignex

February 7, 2013

Top Big Data skills? MongoDB and Hadoop

According to new research from the UK's Sector Skills Council for Business and Information Technology, the organization responsible for managing IT standards and qualifications, Big Data is a big deal in the UK, and MongoDB is one of the top Big Data skills in demand. This meshes with SiliconAngle Wikibon research I highlighted earlier, detailing Hadoop and MongoDB as the top-two Big Data technologies. It also jibes with JasperSoft data that shows MongoDB as one of its top Big Data connectors. MongoDB is a fantastic operational data store. As soon as one remembers that Big Data is a question of both storage and processing, it makes sense that the top operational data store would be MongoDB, given its flexibility and scalability. Foursquare is a great example of a customer using MongoDB in this way. On the data processing side, a growing number of enterprises use MongoDB both to store and process log data, among other data analytics workloads. Some use MongoDB with its built-in MapReduce functionality, while others choose the Hadoop connector or MongoDB's Aggregation Framework to avoid MapReduce. Whatever the method or use case, the great thing about Big Data technologies like MongoDB and Hadoop is that they're open source, so the barriers to downloading, learning and adopting them are negligible. Given the huge demand for Big Data skills, both in the UK and globally, according to data from Dice and Indeed.com, it's time to download MongoDB and get started on your next Big Data project. Tagged with: MongoDB, Hadoop, Big Data, open source, operational database, Foursquare, IT jobs, jobs
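As a small, hypothetical illustration of the log-processing pattern mentioned above, here is what a simple Aggregation Framework query looks like with PyMongo; the collection and field names are illustrative only:

```python
# A minimal sketch of the log-processing pattern mentioned above, using the
# Aggregation Framework rather than MapReduce. Collection and field names
# are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["ops"]["access_logs"]

# Count server-error responses per path and report the ten worst offenders.
pipeline = [
    {"$match": {"status": {"$gte": 500}}},
    {"$group": {"_id": "$path", "errors": {"$sum": 1}}},
    {"$sort": {"errors": -1}},
    {"$limit": 10},
]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["errors"])
```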

January 8, 2013