Meet Variety, a Schema Analyzer for MongoDB
Variety is a lightweight tool that gives you a feel for an application’s schema and surfaces any schema outliers. It is particularly useful for
• quickly learning how data is structured when you inherit a codebase along with a production data dump
• finding all rare keys in a given collection
An Easy Example
We’ll make a collection, within the MongoDB shell:
db.users.insert({name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!"});
db.users.insert({name: "Dick", bio: "I swordfight."});
db.users.insert({name: "Harry", pets: "egret"});
db.users.insert({name: "Geneviève", bio: "Ça va?"});
Let’s use Variety on this collection, and see what it can tell us:
$ mongo test --eval "var collection = 'users'" variety.js
The above is executed from the terminal. “test” is the database containing the collection we are analyzing.
Variety’s output:
{ "_id" : { "key" : "_id" }, "value" : { "types" : [ "object" ] }, "totalOccurrences" : 4, "percentContaining" : 100 }
{ "_id" : { "key" : "name" }, "value" : { "types" : [ "string" ] }, "totalOccurrences" : 4, "percentContaining" : 100 }
{ "_id" : { "key" : "bio" }, "value" : { "types" : [ "string" ] }, "totalOccurrences" : 3, "percentContaining" : 75 }
{ "_id" : { "key" : "pets" }, "value" : { "types" : [ "string", "array" ] }, "totalOccurrences" : 2, "percentContaining" : 50 }
{ "_id" : { "key" : "someWeirdLegacyKey" }, "value" : { "type" : "string" }, "totalOccurrences" : 1, "percentContaining" : 25 }
Every document in the “users” collection has a “name” and “_id”. Most, but not all, have a “bio”. Interestingly, it looks like “pets” can be either an array or a string. The application code really only expects arrays of pets. Have we discovered a bug, or a remnant of a previous schema?
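To make the numbers above less magical, here is a minimal sketch of the same kind of tally done in plain JavaScript over in-memory copies of the four sample documents. It is not Variety’s actual implementation; the names typeOf and summarizeKeys are made up for illustration, and because these plain objects never pass through MongoDB, no “_id” row appears.

```javascript
// Hypothetical in-memory sketch; this is NOT how Variety itself is implemented.
const users = [
  { name: "Tom", bio: "A nice guy.", pets: ["monkey", "fish"], someWeirdLegacyKey: "I like Ike!" },
  { name: "Dick", bio: "I swordfight." },
  { name: "Harry", pets: "egret" },
  { name: "Geneviève", bio: "Ça va?" },
];

// Report "array" for arrays; otherwise fall back to typeof.
function typeOf(value) {
  return Array.isArray(value) ? "array" : typeof value;
}

// For each top-level key, count the documents containing it and collect the types seen.
function summarizeKeys(docs) {
  const tally = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      if (!tally.has(key)) {
        tally.set(key, { types: new Set(), totalOccurrences: 0 });
      }
      const entry = tally.get(key);
      entry.types.add(typeOf(value));
      entry.totalOccurrences += 1;
    }
  }
  // Most common keys first, mirroring the ordering of the output above.
  return [...tally.entries()]
    .map(([key, entry]) => ({
      key,
      types: [...entry.types].sort(),
      totalOccurrences: entry.totalOccurrences,
      percentContaining: (entry.totalOccurrences / docs.length) * 100,
    }))
    .sort((a, b) => b.totalOccurrences - a.totalOccurrences);
}

const summary = summarizeKeys(users);
console.log(summary);
```

The “pets” entry ends up with two types, which is exactly the kind of mismatch worth chasing down in the application code.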
The first document created has a weird legacy key I’ve never seen before; the people who built the prototype didn’t clean up after themselves. These rare keys, whose contents are never used, have a strong potential to confuse developers, and could be removed once we verify our findings.
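Once you have result rows like the ones Variety printed above, picking out the rare keys is a simple filter. The sketch below assumes the rows have been copied into a plain JavaScript array (trimmed to the fields it needs); rareKeys and its 25 percent cutoff are hypothetical choices for illustration, not part of Variety.

```javascript
// Rows copied from the Variety output above, trimmed to the fields used here.
const results = [
  { _id: { key: "_id" }, totalOccurrences: 4, percentContaining: 100 },
  { _id: { key: "name" }, totalOccurrences: 4, percentContaining: 100 },
  { _id: { key: "bio" }, totalOccurrences: 3, percentContaining: 75 },
  { _id: { key: "pets" }, totalOccurrences: 2, percentContaining: 50 },
  { _id: { key: "someWeirdLegacyKey" }, totalOccurrences: 1, percentContaining: 25 },
];

// Hypothetical helper: keep keys found in at most `maxPercent` percent of documents.
function rareKeys(rows, maxPercent) {
  return rows
    .filter((row) => row.percentContaining <= maxPercent)
    .map((row) => row._id.key);
}

const rare = rareKeys(results, 25);
console.log(rare);
```

The right cutoff depends on the collection; on a large production dataset a much smaller percentage is likely a better starting point.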
For future use, results are also stored in a varietyResults database.
Learn More!
Learn more about Variety now, including
• How to download Variety
• How to set a limit on the number of documents analyzed from a collection
• How to contribute, and report issues
Variety is free, open source, and written in 100% JavaScript. Check it out on GitHub.
-by James Cropcho