Beyond NoSQL: A Modern Database Manifesto
There is no such thing as NoSQL. Not as we tend to think of it, anyway. While NoSQL was born as a movement away from rigid relational data models so web giants could embrace Big Data with scale-out architectures, the term has come to categorize a set of databases that are more different than they are the same. This broad categorization doesn’t work. It’s not helpful. While we at MongoDB still sometimes refer to NoSQL, we try to do it sparingly, given its propensity to confuse rather than enlighten.

Deconstructing NoSQL

Today the NoSQL category includes a cacophony of over 100 document, key-value, wide-column and graph databases. Each of these database types comes with its own strengths and limits. Each differs markedly from the others, with disparate models and capabilities relative to data storage, querying, consistency, scalability and high availability. Comparing a document database to a key-value store, for example, is like comparing a smartphone to a beeper. A beeper is exceptionally useful for getting a simple message from Point A to Point B. It’s fast. It’s reliable. But it’s nowhere near as functional as a smartphone, which can quickly and reliably transmit messages, but can also do so much more. Both are useful, but the smartphone fits a far broader range of applications than the more limited beeper. As such, organizations searching for a database to tackle Gartner’s three V’s of Big Data (volume, velocity and variety) won’t find an immediate answer in “NoSQL.” Instead, they need to probe deeper for a modern database that can handle all of their Big Data application requirements.

Modern Databases For Modern Data

One of these requirements is, of course, the ability to handle large volumes of data, the original impetus behind the NoSQL movement. But the ability to handle volume, or scale, is something all databases categorized as “NoSQL” share.
MongoDB, for example, counts among its users organizations that regularly store petabytes of data, perform over 1,000,000 operations per second and run clusters that exceed 1,000 nodes. A modern database, however, must do more than scale. Scalability is table stakes. It must also enable agility to accelerate development and time to market. It must allow organizations to iterate as they embrace new business requirements. And a modern database must, above all, enable enterprises to take advantage of rapidly growing data variety. Indeed, the “greatest challenge and opportunity” for enterprises, as Forrester notes, is managing a “variety of data sources,” including data types and sources that may not even exist today. In general, all so-called NoSQL databases are much more helpful than relational databases at storing a wide variety of data types and sources, including mobile device, geospatial, social and sensor data. But the hallmark of a modern database is its ability to allow organizations to do useful things with their data.

Defining The Modern Database

To count as a modern database, then, a database must meet three requirements. While relational databases are able to manage some of these requirements, and newer so-called “NoSQL” key-value or wide-column data stores meet others, only MongoDB meets all three requirements.

The database MUST scale. As data volume and velocity grow, so must the database. It should scale horizontally and elegantly, without doing unnatural things to your application, in the cloud or on commodity hardware. Meeting the base requirements, like having enough capacity to serve your customers, should be a given.

The database MUST adapt to change. The speed of business accelerates and your database must keep pace, enabling iteration. This means you must be able to process and mine new data sources and data types without the database breaking a sweat (or you breaking your back or budget).
Your schema must flow from your application requirements, rather than forcing your application to fit a predefined, rigid schema.

The database MUST unleash your data. Just storing data isn’t enough. You must be able to exploit the data, which in particular means being able to ask significant questions of it. In part this means that the database must support rich queries, indexing, aggregation and search across multi-structured, rapidly changing data sets in real time. But it also means that it must support data for modern use cases including mobile, social, Internet of Things and other systems of engagement.

Some relational databases can handle a few of these requirements, yet fail in the essential need to deliver scale and adaptability. Some newer databases, including so-called “NoSQL” key-value or wide-column data stores, meet still other requirements, yet don’t give organizations the latitude to unleash their data. In fact, they constrain you to look up data by the key with which it was written unless you integrate external search engines and analytics nodes, which can create other problems.

MongoDB: A Modern Database For Today's Business Needs

But only one database today can deliver on each of these critical components of a modern database. Only one database offers orders of magnitude more productivity for developers and operations teams alike, while still delivering petabyte scale and lightning-fast performance. Only MongoDB, the modern database that tens of thousands of organizations depend upon to build and run today’s most demanding applications. To learn more about how MongoDB has enabled some of the world’s largest and most innovative companies to deliver applications and outcomes that were previously impossible, download our new whitepaper.
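The difference between looking data up only by the key it was written under and querying any field of documents whose shapes vary can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration only: the classes `KeyValueStore` and `DocumentStore` and their methods are invented for this example and are not the MongoDB driver API or any product's API.

```python
# Hypothetical in-memory sketch; NOT MongoDB or any real key-value product.
# It contrasts lookup-by-write-key with filtering on arbitrary fields.
from __future__ import annotations

from collections import defaultdict
from typing import Any


class KeyValueStore:
    """Values are opaque to the store: the only access path is the key."""

    def __init__(self) -> None:
        self._data: dict[str, dict[str, Any]] = {}

    def put(self, key: str, value: dict[str, Any]) -> None:
        self._data[key] = value

    def get(self, key: str) -> dict[str, Any] | None:
        return self._data.get(key)


class DocumentStore:
    """Documents may have different fields; any field can be queried or indexed."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []
        # Secondary indexes: field name -> field value -> positions in _docs.
        self._indexes: dict[str, dict[Any, list[int]]] = {}

    def create_index(self, field: str) -> None:
        index: dict[Any, list[int]] = defaultdict(list)
        for pos, doc in enumerate(self._docs):
            if field in doc:
                index[doc[field]].append(pos)
        self._indexes[field] = index

    def insert(self, doc: dict[str, Any]) -> None:
        pos = len(self._docs)
        self._docs.append(doc)
        # Every secondary index is updated on each write, so over-indexing slows inserts.
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].append(pos)

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        indexed = [f for f in criteria if f in self._indexes]
        if indexed:
            # Use one index to narrow the candidates instead of scanning everything.
            field = indexed[0]
            positions = self._indexes[field].get(criteria[field], [])
            candidates = [self._docs[p] for p in positions]
        else:
            candidates = self._docs  # no usable index: full scan
        # Apply every criterion; documents missing a field simply don't match.
        return [d for d in candidates
                if all(k in d and d[k] == v for k, v in criteria.items())]


kv = KeyValueStore()
kv.put("student:1", {"name": "Ana", "state": "UT"})
print(kv.get("student:1"))  # works only because we already know the key

docs = DocumentStore()
docs.create_index("state")
docs.insert({"name": "Ana", "state": "UT", "gpa": 3.9})
docs.insert({"name": "Ben", "state": "CA"})  # no gpa field: shapes may differ
docs.insert({"name": "Cy", "state": "UT", "gpa": 3.1, "sat": 1400})
print([d["name"] for d in docs.find(state="UT")])  # ['Ana', 'Cy']
print([d["name"] for d in docs.find(state="UT", gpa=3.1)])  # ['Cy']
```

With only the key-value store, a question like "all students in UT" has no access path short of reading every value or maintaining an external index, which is the gap the paragraph above describes.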
Looking beyond labels like relational and NoSQL
According to a new Dice.com salary survey, MongoDB ranks as one of the top-10 most highly compensated technology skills. Indeed.com rates MongoDB as the second-hottest job trend. And according to DB-Engines.com, which ranks over 200 databases on their relative popularity, MongoDB is now the fifth-most popular database in the world, this month surpassing IBM's DB2. All great, right? Maybe.

Buried in the Dice.com data, as well as the Indeed.com data, is evidence of real confusion. For example, one of the top-10 most highly compensated skills in Dice.com's survey is "NoSQL." NoSQL is not a technology. It's not really something a developer can "know" in any real sense. NoSQL is a movement that describes a different way of modeling data but, as Basho founder Justin Sheehy correctly noted, there are as many differences among so-called NoSQL databases as there are similarities. As such, knowing Basho's Riak won't really help you understand MongoDB. Perhaps it will at a high, conceptual level, but expertise in one doesn't really translate into familiarity with another. They are different databases with different approaches.

Employers looking for generic NoSQL skills need to think more deeply about what their application requirements are. Looking beyond relational databases for modern application requirements is a good start, but looking to generic "NoSQL" is not sufficient. Organizations should be looking for a modern database that dramatically improves developer productivity, encourages application iteration and enables a new wave of transformational applications in areas like Big Data, Internet of Things, mobile and more. That database is MongoDB. Is MongoDB "NoSQL"? Sure. But it's much bigger than that (based on what people search for on Google, many organizations already seem to understand this).
MongoDB is the fastest-growing database in the world, not because it fits the NoSQL category, but because it significantly improves the productivity of developers and the organizations for which they work. So if you're looking to hire technology talent, you're far more likely to be successful hiring an experienced MongoDB engineer than a "NoSQL engineer." MongoDB, after all, is an actual database. NoSQL simply describes an important movement.
MongoDB's $150 Million Funding Round: It's about the Customer Experience
Today MongoDB announced that we raised $150 million from a variety of investors both new (Salesforce.com, T. Rowe Price, EMC and others) and old (Sequoia, Red Hat, NEA, Flybridge, etc.). It's a great day for MongoDB, both the company and the project. But mostly it's a great day for our customers and the MongoDB community in which they participate.

Hip With The Hackers

Over the last few years MongoDB has solidified its position as the industry's leading NoSQL database and the fastest-growing Big Data community. With this funding round, MongoDB is also the best-funded Big Data technology. As enterprises invest in Big Data, they turn to the two dominant Big Data technologies, MongoDB and Hadoop, as Wikibon analysis has shown. Importantly, as an analysis of LinkedIn profiles by 451 Research shows, enterprises very often discover that they already have MongoDB expertise within their organizations.

Much of this success derives from MongoDB giving developers a better way to create applications. Rather than commoditizing the legacy relational database (RDBMS) market, as other open-source RDBMSs have done, MongoDB significantly increases developer productivity by offering a flexible data model. MongoDB is a significant part of what Cowen & Co. analyst Peter Goldmacher calls a "fundamental shift in the technology landscape away from legacy systems towards a new breed of better products at a lower cost for Data Management, Apps and in other areas." In other words, MongoDB is empowering the next generation of applications: post-transactional applications that rely on bigger data sets that move much faster than an RDBMS can handle. Developers have responded, voting with their apps, a considerable number of which are backed by MongoDB.

A Means, Not An End

Given the opportunity ahead of us, MongoDB would be irresponsible to raise less.
While most of our funding comes from rapidly growing revenues, the MongoDB board of directors determined that it would be advantageous to the project and, hence, to our customers, to accelerate growth. After all, our relational database competitors have a 30-year head start. As Max Schireson, MongoDB's CEO, articulated on his blog:

"We are in a market dominated by technologies with over 30 years of engineering in them. Their designs may not be as well suited to modern applications, but they are very mature, very feature rich, and have huge partner ecosystems and big companies that understand the needs of their enterprise customers behind them. They have way more tooling – and decades of refinement of operational tools. This is why we are raising $150 million. We know that it will take a large and sustained effort to build the maturity that many users expect in this market. Building out our management suite and enhancing the core product will be a ton of work. We have made great progress on security, management, stability, and scalability but we still have so much to do."

For next-generation workloads in the cloud, MongoDB is already taking a lead, as Amazon Web Services data from Stackdriver seems to suggest. But MongoDB isn't intended to be a cloud-only database. It's a general-purpose database, designed to be a great fit for the vast majority of workloads. We want to make it easy to run on a single node or at massive scale, in the cloud or on premises. Whatever the customer needs. This funding will help.

Helping Ops Fall In Love With MongoDB

Some of that work will be done by MongoDB's exceptional community of developers and business partners. Among other things, the MongoDB community has contributed over 20 drivers, tripling the language compatibility of MongoDB and making it much more approachable for developers, whatever their preferred programming language. But some of it will necessarily be done by MongoDB, Inc.
From Linux to JBoss to Drupal, much of the best tooling has had to be developed by a focused, highly incentivized company. MongoDB is no different. We believe we have built the world's best database for developers. Now we need to make sure it is also the world's best database for Operations professionals. That means an improved and expanded management suite. We recently added Backup, but there are other areas that will help Operations professionals more easily manage MongoDB at the scale at which we increasingly see enterprises run the database. Outside of tooling, we also recognize that we need to keep improving MongoDB's concurrency, further optimize performance and more. We don't by any stretch think we're done.

The Path Forward

But we're making excellent progress. In the year since I joined MongoDB, I've seen the company double its headcount and dramatically expand sales. This funding not only lets us make significant investments in improving MongoDB for both developers and Operations, but also helps us fund geographic expansion. We're already growing 300% or more year-over-year in Europe, and expect much the same in Asia-Pacific. We need to help support our customers wherever they may be.

Given the historic opportunity before MongoDB, it's time to step on the accelerator. Hard.

--

If you're interested, please find more coverage of the funding at BusinessWeek, GigaOm, TechCrunch, VentureBeat, and ZDNet.
Pearson National Transcript Center runs MongoDB
High school students only have to worry about one transcript: their own. But for Pearson, a multi-billion dollar learning company that operates in over 70 countries and employs some 36,000 people, the transcript management problem is much bigger. Pearson Education manages the transcripts for over 14 million students from more than 25,000 institutions, and allows National Transcript Center (NTC) member institutions to securely send records and transcripts to any of over 137,000 academic institutions, not to mention employers, licensure agencies, and scholarship organizations. To manage this big data problem, Pearson turned to MongoDB as the underlying database for its National Transcript Center.

Pearson’s National Transcript Center isn’t merely a data store for student transcripts. Pearson stores student data and also transforms it from one standard format to another, including PESC High School Transcript XML, PESC College Transcript XML, SPEEDE EDI, SIF Student Record Exchange, and others. Pearson also generates PDF copies of a student’s records, and provides print copies when electronic delivery is not available.

The impetus to use MongoDB was a request to archive student data at the end of each year, rather than deleting it. If the student had graduated, why keep her records around? As it turned out, there were plenty of reasons, including the potential need to transfer records between higher educational institutions or on to employers. But how best to store and manage this student data?

Pearson had been using an open-source relational database (RDBMS) to store the student records. However, Pearson ran into performance problems with this RDBMS, problems that would compound each year. Taking a year’s worth of student records and sticking it in a separate table, then sharding over and over as the years passed, was going to make performance even worse. So Pearson turned to a key-value NoSQL database. Unfortunately, this, too, posed problems.
Pearson had no idea what a student record would look like in the future and so needed a dynamic schema; the company did not want to keep creating new tables as fields changed. Another problem with this key-value data store was that its filtering mechanism was hard to work with, as Pearson runs very complicated queries that search different fields at the same time. It proved too difficult to marshal all that query data with a key-value database.

At this point, Pearson decided to give MongoDB a try. Pearson’s development team immediately appreciated the ease of working with MongoDB’s flexible and dynamic data model. But it was perhaps MongoDB’s query mechanism that sold the team on using the NoSQL database. Pearson’s queries were converted automatically from Hibernate into MongoDB: Pearson had written Hibernate criteria calls, which allowed the team to avoid building SQL queries by hand, and this work mapped directly to MongoDB, saving Pearson time and trouble.

Other benefits became apparent over time. With its original RDBMS approach, Pearson would have been forced to search gigantic tables when querying student records. With MongoDB, if Pearson starts putting too much data in a namespace, it can easily shard that namespace, for example by district, so that a search covers a single district rather than an entire state. And instead of storing student data in a blob, as it did with the RDBMS, Pearson is able to use MongoDB’s GridFS, which keeps files and metadata automatically synced and deployed across a number of systems and facilities.

For students looking to get into a good college or employer, their transcript is their passport. By using MongoDB, Pearson has been able to boost performance for its end users, all while improving ease of use and productivity for its developers.
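Why sharding by district narrows a search can be sketched as range-based query routing. In a sharded MongoDB cluster, the `mongos` router can send a query only to the shards holding the relevant chunks when the query includes the shard key; a query without the shard key is broadcast to every shard. The sketch below is hypothetical: it assumes a numeric `district_id` shard key, and the split points and shard names are invented. It is not MongoDB's router code, and nothing here describes Pearson's actual deployment.

```python
# Hypothetical sketch of range-based shard targeting; not MongoDB's mongos code.
# Assumes the transcript collection is sharded on a numeric "district_id" field.
from __future__ import annotations

from bisect import bisect_right
from typing import Any

# Each chunk owns [its lower bound, the next chunk's lower bound); the last is open-ended.
CHUNK_LOWER_BOUNDS = [0, 200, 500]              # hypothetical split points
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]   # shard owning each chunk


def shard_for(district_id: int) -> str:
    """Return the shard whose chunk range contains district_id."""
    if district_id < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("district_id is below the first chunk's lower bound")
    idx = bisect_right(CHUNK_LOWER_BOUNDS, district_id) - 1
    return CHUNK_SHARDS[idx]


def shards_for_query(query: dict[str, Any]) -> list[str]:
    """Targeted routing when the shard key is in the query; otherwise scatter-gather."""
    if "district_id" in query:
        return [shard_for(query["district_id"])]
    return sorted(set(CHUNK_SHARDS))


# A district-level search touches one shard; a state-wide search touches all of them.
print(shards_for_query({"district_id": 350, "last_name": "Lee"}))  # ['shardB']
print(shards_for_query({"state": "UT"}))  # ['shardA', 'shardB', 'shardC']
```

The design choice that matters is the shard key: a field that appears in the most common queries lets most reads hit a single shard, while queries that omit it still work but fan out across the cluster.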
Considerations before moving from RDBMS to MongoDB
There are a variety of reasons for moving from a relational database (RDBMS) to MongoDB. Perhaps, like FamilySearch, the family history division of The Church of Jesus Christ of Latter-day Saints, a company wants to improve response times from 3 seconds (RDBMS) to under 15 milliseconds (MongoDB). Or perhaps, like Apollo Group, the private education giant behind University of Phoenix, an enterprise is hoping to store unstructured data and scale to support anticipated growth in the number of users and the volume of content.

Whatever the reason for moving from a relational database to MongoDB, it’s important to plan appropriately. In my role I get to work directly with MongoDB users like Telefonica, nearly all of whom come from a relational database background. Sometimes when people are fed up with using SQL, or see MongoDB as a way to scale, they decide to migrate an application designed for a relational database directly to MongoDB…without rethinking the data model and architecture of their application. There are good ways to map SQL statements to MongoDB, but this isn’t one of them. Another error-prone migration “strategy” is driven by blind use of Object Document Mappers (ODMs) and Object Relational Mappers (ORMs), which shield developers from much of the complexity of manipulating a database but can also contribute to poor data model design.

So when considering a direct migration from RDBMS to MongoDB, it’s important to watch for a few issues:

- Too many collections (10+): this will lead to poor performance and questions like “How can I do a join in Mongo?” (Answer: you can’t, but there are ways to accomplish the same thing in MongoDB.)
- Many indexes: indexing all the fields in a document is a bad approach and leads to poor insert performance. I've seen cases where, for a given database, customers end up with more space occupied by indexes than by data. This is not a good practice anywhere.
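One common way to "accomplish the same thing" as a join is to embed related rows inside the parent document, so the frequent read needs no join at all. The sketch below is a hypothetical migration step in plain Python: the `students` and `enrollments` tables and their fields are invented for illustration, and the output is a list of dicts shaped like MongoDB documents rather than a call to any driver.

```python
# Hypothetical sketch: reshape normalized relational rows into one embedded
# document per student, so the common read ("show a transcript") needs no join.
from __future__ import annotations

from collections import defaultdict
from typing import Any

Row = dict[str, Any]

students: list[Row] = [
    {"student_id": 1, "name": "Ana"},
    {"student_id": 2, "name": "Ben"},
]
enrollments: list[Row] = [
    {"student_id": 1, "course": "Algebra", "grade": "A"},
    {"student_id": 1, "course": "Biology", "grade": "B+"},
    {"student_id": 2, "course": "Chemistry", "grade": "A-"},
]


def embed(parents: list[Row], children: list[Row]) -> list[Row]:
    # Group child rows by foreign key once, i.e. a hash join done at migration time.
    courses_by_student: dict[Any, list[Row]] = defaultdict(list)
    for row in children:
        courses_by_student[row["student_id"]].append(
            {"course": row["course"], "grade": row["grade"]}
        )
    # One document per student, with its courses embedded as an array.
    return [
        {"_id": s["student_id"], "name": s["name"],
         "courses": courses_by_student.get(s["student_id"], [])}
        for s in parents
    ]


transcripts = embed(students, enrollments)
print(transcripts[1])
```

Embedding fits data that is read together and grows within known limits; data that grows without bound or is shared across many parents is usually better kept in its own collection and referenced, keeping in mind MongoDB's 16 MB maximum document size. Which choice is right depends on the access pattern, which is exactly the question raised next.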
The first question to ask, then, when moving from a relational database to MongoDB is, ‘How will this data be accessed?’ Other important questions include: What is the access pattern? What are you hoping to show your customers/users? How are you going to write this data? These should be the first questions people ask themselves before migrating data from an RDBMS into MongoDB. Indeed, they should be the main questions anyone asks before adopting any persistence layer.

As stated, there are a lot of great reasons to use MongoDB instead of a relational database, but careful planning is required to pull off a successful migration.

Posted by Norberto Leite, Senior Solutions Architect for EMEA, 10gen.
Living in the post-transactional database future
Given that we’ve spent decades building applications around relational databases, it’s not surprising that the first response to the introduction of NoSQL databases like MongoDB is sometimes “Why?” Developers aren’t usually the ones asking this question, because they love the approachability and flexibility MongoDB gives them. But DBAs who have built their careers on managing heavy RDBMS infrastructure? They’re harder to please.

10gen president Max Schireson estimates that 60 percent of the world’s databases are operational in nature, which is MongoDB’s market. Most of those use cases are ripe for a non-relational approach. The database market is rapidly changing, and very much up for grabs. Or, as Redmonk analyst James Governor puts it, “The idea that everything is relational? Those days are gone.”

As useful as relational databases are (and they’re very useful for a certain class of application), they are losing relevance in a world where complex transactions are more the exception than the rule. In fact, I’d argue that over time, the majority of application software that developers write will be in use cases that are better fits for MongoDB and other NoSQL technology than for legacy RDBMS. That’s the future.

What about now? Arguably, many of the applications being built today are already post-transactional, ripe for MongoDB and poor fits for RDBMS. Consider:

- Amazon: its systems that process order transactions (RDBMS) are largely “done” and “stable”. Amazon’s current development largely focuses on how to provide better search and recommendations or how to adapt prices on the fly (NoSQL).
- Netflix: the vast majority of its engineering focuses on recommending better movies to you (NoSQL), not processing your monthly bill (RDBMS).
- Square: the easy part is processing the credit card (RDBMS). The hard part is making it location-aware, so it knows where you are and what you’re buying (NoSQL).
It’s easy, but erroneous, to pigeonhole these examples as representative of an anomalous minority of enterprises. Yes, these companies represent the cutting edge of both business and technology. But no, they are not alone in building these sorts of applications. For every early-adopter Netflix there’s a sizable, growing population of mainstream companies in media (e.g., The Guardian), finance (e.g., Intuit), or other verticals that are looking to turn technology into a revenue-driving asset, not simply something that helps keep the lights on and payrolls running.

When what we built were websites, RDBMS worked great. But today, we’re building applications that are mobile, social, involve high-volume data feeds, incorporate predictive analytics, etc. These modern applications? They don’t fit RDBMS. Andy Oliver lists 10 things never to do with a relational database, but the list is much longer, and growing.

MongoDB is empowering the next generation of applications: post-transactional applications that rely on bigger data sets that move much faster than an RDBMS can handle. Yes, there will remain a relatively small sphere of applications unsuitable for MongoDB (including applications with a heavy emphasis on complex transactions), but the big needs going forward, like search, log analysis, media repositories, recommendation engines, high-frequency trading, etc.? Those functions that really help a company innovate and grow revenue? They’re best done with MongoDB.

Of course, given RDBMS’ multi-decade legacy, it’s natural for developers to try to force RDBMS to work for a given business problem. Take log analysis, for example. Oliver writes:

Log analysis: …[T]urn on the log analysis features of Hadoop or RHQ/JBossON for a small cluster of servers. Set the log level and log capture to anything other than ERROR. Do something more complex and life will be very bad.
See, this kind of somewhat unstructured data analysis is exactly what MapReduce à la Hadoop and languages like PIG are for. It’s unfortunate that the major monitoring tools are RDBMS-specific — they really don’t need transactions, and low latency is job No. 1.

Forward-looking organizations already realize that MongoDB is an excellent fit for log management, which is why we see more and more enterprises turning to MongoDB for this purpose. I expect this to continue. As MongoDB continues to enrich its functionality, the universe of applications for which it is not merely applicable, but also better, will continue to expand, even as the universe of applications for which RDBMS is optimal declines.

Indeed, we’re already living in a post-transactional world. Some people just don’t know it yet. (Or, as William Gibson would say, “The future is already here – it’s just not very evenly distributed.”)

Posted by Matt Asay, vice president of Corporate Strategy, with significant help from my inestimable colleague, Jared Rosoff.
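The MapReduce style of log analysis mentioned in the quote can be sketched with a map phase that emits `(key, 1)` pairs and a reduce phase that sums them per key. This is a minimal illustration under an assumed log format: the "LEVEL message" lines below are hypothetical. It is not Hadoop's API, nor is it how any particular monitoring tool works.

```python
# Minimal map/reduce sketch over log lines: count entries per log level.
# The "LEVEL message" line format is a hypothetical assumption.
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Emit (level, 1) for every non-empty line.
    for line in lines:
        level, _, _message = line.partition(" ")
        if level:
            yield level, 1


def reduce_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    # Sum the emitted counts per key.
    totals: dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


logs = [
    "INFO request served in 12ms",
    "WARN slow query on transcripts",
    "INFO request served in 9ms",
    "ERROR connection reset",
    "INFO cache hit",
]
counts = reduce_phase(map_phase(logs))
print(counts)  # {'INFO': 3, 'WARN': 1, 'ERROR': 1}
```

If each log entry were stored in MongoDB as a document with a `level` field (an assumption for this example), the same count could be expressed with the aggregation framework's `$group` stage, e.g. `{"$group": {"_id": "$level", "count": {"$sum": 1}}}`, with no transaction needed at any point.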