Schemaless Database



MongoDB is a JSON-style data store.  The documents stored in the database can have varying sets of fields, and the same field can hold values of different types in different documents.  One could have the following objects in a single collection:

{ name : "Joe", x : 3.3, y : [1,2,3] }

{ name : "Kate", x : "abc" }

{ q : 456 }

Of course, when using the database for real problems, the data does have a fairly consistent structure.  Something like the following would be more common:

{ name : "Joe", age : 30, interests : "football" }

{ name : "Kate", age : 25 }

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages.  Constructs such as those above are easy to represent in PHP, Python and Ruby.  What we are trying to do here is make this mapping to the database natural.
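
As a minimal sketch of that analogy, the three documents from the first example can be written as ordinary Python dictionaries held in a list.  The list only stands in for a collection; no database driver is involved, and the names collection and field_types are illustrative, not part of any MongoDB API.

```python
# Plain dicts standing in for documents; the list stands in for a collection.
# No MongoDB driver is used here; this only illustrates the data shapes.
collection = [
    {"name": "Joe", "x": 3.3, "y": [1, 2, 3]},
    {"name": "Kate", "x": "abc"},
    {"q": 456},
]


def field_types(docs, field):
    """Return the set of Python type names seen for `field` across docs."""
    return {type(doc[field]).__name__ for doc in docs if field in doc}


# The same field can hold a float in one document and a str in another.
print(field_types(collection, "x"))
# "name" is present in only two of the three documents.
print(field_types(collection, "name"))
```

Nothing in the language forces the three dictionaries to share keys or value types, which is the same freedom the database gives to documents in one collection.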

Note that the database does have some structure.  The system namespace contains explicit lists of our collections and indexes.  Collections may be implicitly or explicitly created, while indexes are explicitly declared (except for the predefined _id index).

One of the great benefits of these dynamic objects is that schema migrations become very easy.  With a traditional RDBMS, releases of code might contain data migration scripts.  Further, each release should have a reverse migration script in case a rollback is necessary.  ALTER TABLE operations can be very slow and result in scheduled downtime.

With a schemaless database, adjustments to the database become transparent and automatic 90% of the time.  For example, if we wish to add GPA to the student objects, we add the attribute, resave the object, and all is well.  If we look up an existing student and reference GPA, we just get back null.  Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.
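
The GPA case can be sketched in Python with plain dictionaries.  This assumes documents are loaded as dicts and that application code reads optional fields with dict.get, which returns None for a missing key (the Python counterpart of the null mentioned above).  The students list and the helpers gpa_of and old_summary are hypothetical names used only for illustration.

```python
# Hypothetical student documents, modelled as plain dicts (no driver involved).
students = [
    {"name": "Joe", "age": 30},               # saved before the GPA field existed
    {"name": "Kate", "age": 25, "gpa": 3.7},  # saved by the newer release
]


def gpa_of(student):
    """Newer code: read GPA, getting None for documents that lack the field."""
    return student.get("gpa")


def old_summary(student):
    """Older code after a rollback: reads only the fields it knows about."""
    return f"{student['name']} ({student['age']})"


for s in students:
    # Both code versions work on both documents without a migration step.
    print(old_summary(s), gpa_of(s))
```

The rollback stays safe here only because the older function ignores keys it does not recognize; code that rejected unknown fields would not have this property.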