January 20, 2010 by MongoDB
As an alternative to legacy SQL systems, technologies that are scalable and can better handle non-partitioned data have emerged. It appears that the "NoSQL" moniker has stuck for what can simply be defined as non-relational operational databases. Non-relational operational databases, or operational data stores, tend to share two key attributes. First, they're non-relational: they don't do joins on the server. Second, they have light transactional semantics, so complex, long-running, serialized transactions are not part of any of these NoSQL products. Those two differences, taken together, allow a very different approach to how databases are built, which means you can make horizontally scalable databases: the kind that run across large clusters of machines.
The list of NoSQL products is long. For the sake of this post, I want to clarify at the highest level how some of these technologies (namely MongoDB, CouchDB, and Cassandra) differ within the NoSQL group. Today, there are two main themes emphasized in the NoSQL space: scale (as in Google-level scale) and ease of development. I think there's some general agreement that Cassandra-type products deal with scale, while CouchDB and MongoDB deal with ease of development. That said, you can't completely separate these concerns, because as the space matures, all of the products are going to scale very well. I guess the Holy Grail, or vision, for NoSQL is to provide solutions that make it easier for developers to build web applications, or any other applications that require a data store behind them.
One reason why NoSQL, or some iteration of it, is here to stay is that the way computer architectures are heading, having systems that can run across multiple machines is going to be an absolute requirement. The limitations of vertical scaling are going to get worse and worse. New chips will have more and more CPU cores, but clock speeds aren't getting much higher. Hardware is also getting cheaper, so you can buy more computers, but you're not going to be able to buy one computer that's really fast at any price. You will, however, be able to get 1,000 computers that are not terribly fast, really cheaply. So the question is: at the data storage layer, can you leverage that? With the traditional approach, the answer is no, not without a lot of manual effort. But changing computer architectures, as well as the growth of cloud computing, necessitate a better set of database systems built to achieve scale. These new solutions are going to solve that, and it's going to be critical. We want a new set of tools for the data storage layer that work well with cloud principles: things like near-infinite scalability, low-to-zero configuration, and frictionless ease of development.
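The idea of leveraging 1,000 cheap machines at the data storage layer can be sketched with a toy hash-partitioning scheme. This is an illustration only, not any product's actual algorithm: the node count, key format, and in-memory "nodes" are all stand-ins.

```python
import hashlib

NUM_NODES = 4  # stand-in for a large cluster of cheap machines

def node_for_key(key: str) -> int:
    """Map a record's key to a node by hashing, so data (and load)
    spreads across the cluster with no manual placement decisions."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Each "node" is just a dict here; in a real system each would be a
# separate machine holding only its own slice of the data.
nodes = [dict() for _ in range(NUM_NODES)]

def put(key, value):
    nodes[node_for_key(key)][key] = value

def get(key):
    return nodes[node_for_key(key)].get(key)

for i in range(1000):
    put(f"user:{i}", {"name": f"user-{i}"})

# Every key routes to exactly one node, and each node holds only its
# share, so adding machines adds capacity: horizontal scaling.
assert get("user:42") == {"name": "user-42"}
assert sum(len(n) for n in nodes) == 1000
```

Note that this only works cleanly because reads and writes are keyed to a single record; it is the absence of cross-node joins and long-running transactions, described above, that makes such partitioning practical.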