March 9, 2010 by MongoDB
Document-oriented and object-oriented databases are philosophically more different than one might at first expect.
In both we have a somewhat standardized document/object representation: typically JSON in currently popular document-oriented stores, perhaps ODL in an ODBMS. The nice thing about JSON is that, at least for web developers, it is already a technology they use and are familiar with. We are not adding something new for the web developer to learn.
In a document store, we really are thinking of “documents”, not objects. Objects have methods, predefined schema, inheritance hierarchies. These are not present in a document database; code is not part of the database.
While some relationships between documents may exist, pointers between documents are deemphasized. The document store does not persist “graphs” of objects; it is not a graph database. (Graph databases/stores are another new NoSQL category. What is the difference between a graph database and an ODBMS? An interesting question.)
Schema design is important in document databases. One doesn’t think in terms of “just persist what I work with in RAM from my program”. We still define a schema, and this schema may vary from the internal “code schema” of the application. For example, in MongoDB we have collections (analogous to tables) of JSON documents, and explicit declaration of indexes on specific fields for the collection. We think this approach has some merits: a decoupling of data and code. Code tends to change fast.
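As a rough illustration of the gap between a “code schema” and a stored schema, here is a minimal Python sketch. The `BlogPost` class, its fields, and the `to_document` helper are hypothetical names invented for this example; nothing here is MongoDB's API. The point is only that the stored document is plain data, chosen deliberately, rather than a dump of whatever the object holds in RAM.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BlogPost:
    """Hypothetical application-side class: behavior plus internal-only state."""
    title: str
    author: str
    tags: list = field(default_factory=list)
    _render_cache: str = ""  # internal detail of the code, never persisted

    def slug(self) -> str:
        # A method lives in the application, not in the stored document.
        return self.title.lower().replace(" ", "-")


def to_document(post: BlogPost) -> dict:
    """Map the in-memory object to the stored schema (plain data only)."""
    return {
        "title": post.title,
        "author": post.author,
        "tags": list(post.tags),
        "slug": post.slug(),  # a derived value we chose to store
    }


post = BlogPost("Documents vs Objects", "alice", ["nosql", "schema"])
doc = to_document(post)
serialized = json.dumps(doc, sort_keys=True)  # what a JSON-based store would hold
```

If the class later gains methods or loses its cache field, the stored schema does not have to change with it, which is the decoupling described above.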
Embedding is an important concept in document stores. It is much more common to nest data within documents than to have references between documents.
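To make the contrast concrete, here is a small Python sketch of the same data in a referenced layout and in an embedded layout. The post and comment fields, the identifiers, and the `embed` helper are all assumptions made up for the example; they only illustrate the shape of the data, not any particular database's behavior.

```python
# Referenced layout: the post only holds identifiers of comments stored elsewhere.
posts_referenced = {
    "p1": {"title": "Hello", "comment_ids": ["c1", "c2"]},
}
comments = {
    "c1": {"author": "ann", "text": "Nice post"},
    "c2": {"author": "bob", "text": "Thanks"},
}

# Embedded layout: the comments are nested inside the post document itself.
post_embedded = {
    "_id": "p1",
    "title": "Hello",
    "comments": [
        {"author": "ann", "text": "Nice post"},
        {"author": "bob", "text": "Thanks"},
    ],
}


def embed(post_id: str, posts: dict, comment_store: dict) -> dict:
    """Build the embedded form of a post from the referenced form."""
    post = posts[post_id]
    return {
        "_id": post_id,
        "title": post["title"],
        "comments": [dict(comment_store[cid]) for cid in post["comment_ids"]],
    }


rebuilt = embed("p1", posts_referenced, comments)
```

In the embedded form, reading the post also yields its comments, with no second lookup.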
Why the deemphasis of relationships? A couple of reasons. First, with arbitrary graphs of objects, it is difficult to process the graph from a client without many client/server turnarounds. A goal with document databases is to maintain the client/server paradigm and keep code biased to the client (albeit with some exceptions such as map/reduce). Second, a key goal in the “NoSQL” space is horizontal scalability. Arbitrary graphs of objects would be difficult to partition among servers in a guaranteed performant manner.
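The first reason can be sketched with a toy simulation. The `FakeServer` class below is a hypothetical in-memory stand-in for a database server that simply counts fetches; the document names and the `refs` field are likewise invented. It assumes a client that can only ask for one document per request, which is a simplification.

```python
class FakeServer:
    """Hypothetical stand-in for a server; counts one round trip per fetch."""

    def __init__(self, docs: dict):
        self._docs = docs
        self.round_trips = 0

    def fetch(self, doc_id: str) -> dict:
        self.round_trips += 1
        return self._docs[doc_id]


def load_graph(server: FakeServer, root_id: str) -> dict:
    """Follow references from the client, issuing one fetch per reachable node."""
    seen = {}
    pending = [root_id]
    while pending:
        doc_id = pending.pop()
        if doc_id in seen:
            continue  # already fetched; avoids loops in cyclic graphs
        doc = server.fetch(doc_id)
        seen[doc_id] = doc
        pending.extend(doc.get("refs", []))
    return seen


# Graph of linked documents: every node costs a separate turnaround.
linked = FakeServer({
    "order": {"refs": ["customer", "item1", "item2"]},
    "customer": {"refs": ["address"]},
    "address": {"refs": []},
    "item1": {"refs": []},
    "item2": {"refs": []},
})
load_graph(linked, "order")
linked_trips = linked.round_trips

# Same information embedded in a single document: one turnaround.
embedded = FakeServer({
    "order": {"customer": {"address": {}}, "items": [{}, {}]},
})
embedded.fetch("order")
embedded_trips = embedded.round_trips
```

Under these assumptions the linked layout needs one request per node in the graph, while the embedded layout needs a single request regardless of how much is nested inside the document.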