Meet Variety, a Schema Analyzer for MongoDB

Variety is a lightweight tool that gives you a feel for an application’s schema and surfaces any schema outliers. It is particularly useful for:

• quickly learning how data is structured, for example when you inherit a codebase along with a production data dump

• finding all rare keys in a given collection

An Easy Example

We’ll create a collection in the MongoDB shell:

db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"});
db.users.insert({name: "Dick", bio: "I swordfight."}); 
db.users.insert({name: "Harry", pets: "egret"});
db.users.insert({name: "Geneviève", bio: "Ça va?"}); 

Let’s use Variety on this collection, and see what it can tell us:

$ mongo test --eval "var collection = 'users'" variety.js

The command above is run from the terminal. “test” is the database containing the collection we are analyzing.

Variety’s output:

{ "_id" : { "key" : "_id" }, "value" : { "types" : [ "object" ] }, "totalOccurrences" : 4, "percentContaining" : 100 }
{ "_id" : { "key" : "name" }, "value" : { "types" : [ "string" ] }, "totalOccurrences" : 4, "percentContaining" : 100 }
{ "_id" : { "key" : "bio" }, "value" : { "types" : [ "string" ] }, "totalOccurrences" : 3, "percentContaining" : 75 }
{ "_id" : { "key" : "pets" }, "value" : { "types" : [ "string", "array" ] }, "totalOccurrences" : 2, "percentContaining" : 50 }
{ "_id" : { "key" : "someWeirdLegacyKey" }, "value" : { "type" : "string" }, "totalOccurrences" : 1, "percentContaining" : 25 }

Every document in the “users” collection has a “name” and an “_id”. Most, but not all, have a “bio”. Interestingly, it looks like “pets” can be either an array or a string, even though the application code only expects arrays of pets. Have we discovered a bug, or a remnant of a previous schema?
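To make those numbers concrete, here is a minimal sketch in plain JavaScript of the kind of tally that yields output like the above. It is not Variety’s actual implementation: the helpers typeOf and summarizeKeys are hypothetical names invented for illustration, the sketch only looks at top-level keys, and it leaves out “_id”, which the shell adds on insert.

```javascript
// Hypothetical helper (not part of Variety): label a value's type,
// telling arrays and null apart from plain objects.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value; // "string", "number", "object", ...
}

// Hypothetical helper: for each top-level key, count how many documents
// contain it and collect every type seen under that key.
function summarizeKeys(docs) {
  const stats = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!stats.has(key)) {
        stats.set(key, { types: new Set(), totalOccurrences: 0 });
      }
      const entry = stats.get(key);
      entry.types.add(typeOf(value));
      entry.totalOccurrences += 1;
    }
  }
  return [...stats.entries()]
    .map(([key, s]) => ({
      key,
      types: [...s.types],
      totalOccurrences: s.totalOccurrences,
      percentContaining: (s.totalOccurrences / docs.length) * 100,
    }))
    .sort((a, b) => b.totalOccurrences - a.totalOccurrences);
}

// The four example documents from this post, without the auto-assigned _id.
const users = [
  { name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!" },
  { name: "Dick", bio: "I swordfight." },
  { name: "Harry", pets: "egret" },
  { name: "Geneviève", bio: "Ça va?" },
];

const summary = summarizeKeys(users);
console.log(JSON.stringify(summary, null, 2));
```

Run with Node, this reports “pets” in 2 of 4 documents with both an array and a string type, which is the same mismatch Variety flagged.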

The first document created has a weird legacy key I’ve never seen before; the people who built the prototype didn’t clean up after themselves. Rare keys like this, whose contents are never used, have a strong potential to confuse developers, and could be removed once we verify our findings.
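Picking out rare keys from a longer result set can be automated. The sketch below is plain JavaScript run over the output rows listed earlier; rareKeys is a hypothetical helper, and the 25 percent cutoff is an arbitrary assumption you would tune for your own data.

```javascript
// Variety's output for the users collection, copied from the listing above.
const results = [
  { _id: { key: "_id" }, value: { types: ["object"] }, totalOccurrences: 4, percentContaining: 100 },
  { _id: { key: "name" }, value: { types: ["string"] }, totalOccurrences: 4, percentContaining: 100 },
  { _id: { key: "bio" }, value: { types: ["string"] }, totalOccurrences: 3, percentContaining: 75 },
  { _id: { key: "pets" }, value: { types: ["string", "array"] }, totalOccurrences: 2, percentContaining: 50 },
  { _id: { key: "someWeirdLegacyKey" }, value: { type: "string" }, totalOccurrences: 1, percentContaining: 25 },
];

// Hypothetical helper: names of keys that appear in at most
// `threshold` percent of the analyzed documents.
function rareKeys(rows, threshold) {
  return rows
    .filter((row) => row.percentContaining <= threshold)
    .map((row) => row._id.key);
}

const rare = rareKeys(results, 25); // assumed cutoff: 25 percent
console.log(rare); // prints [ 'someWeirdLegacyKey' ]
```

A list like this is only a starting point: each flagged key still needs a check against the application code before anything is deleted.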

For future use, results are also stored in a varietyResults database.

Learn More!

Learn more about Variety now, including:

• How to download Variety

• How to set a limit on the number of documents analyzed from a collection

• How to contribute, and report issues

Variety is free, open source, and written in 100% JavaScript. Check it out on GitHub.

-by James Cropcho