Document Validation – Part 2: Putting it all Together, a Tutorial
Introduction
This is the second and final post in a series looking at document validation in MongoDB 3.2; if you haven’t already read the first blog in this series then you should read it now.
The intent of this post is to step you through exactly how document validation can be introduced into an existing production deployment in such a way that there is no impact to your users. It covers:
- Setting up some test data (not needed for a real deployment)
- Using MongoDB Compass and the `mongo` shell to reverse engineer the de facto data model and identify anomalies in the existing documents
- Defining the appropriate document validation rules
- Preventing new documents being added which don’t follow the new rules
- Bringing existing documents “up to spec” against the new rules

Tutorial
This section looks at taking an existing, deployed database which currently has no document validations defined. It steps through understanding what the current document structure looks like; deciding on what rules to add and then rolling out those new rules.
As a pre-step add some data to the database (obviously, this isn't needed if working with your real deployment).
use clusterdb;
db.dropDatabase();
use clusterdb;
db.inventory.insert({ "_id" : 1, "sku" : "abc", 
 "description" : "product 1", "instock" : 120 });
db.inventory.insert({ "_id" : 2, "sku" : "def", 
 "description" : "product 2", "instock" : 80 });
db.inventory.insert({ "_id" : 3, "sku" : "ijk", 
 "description" : "product 3", "instock" : 60 });
db.inventory.insert({ "_id" : 4, "sku" : "jkl", 
 "description" : "product 4", "instock" : 70 });
db.inventory.insert({ "_id" : 5, "sku" : null, 
 "description" : "Incomplete" });
db.inventory.insert({ "_id" : 6 });
for (var i = 1000; i < 2000; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: i % 50,
    quantity: i % 5
  });
}

for (var i = 2000; i < 3000; i++) {
  db.orders.insert({
    _id: i,
    item: "jkl",
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
}

for (var i = 3000; i < 3200; i++) {
  db.orders.insert({
    _id: i,
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
}

for (var i = 3200; i < 3500; i++) {
  db.orders.insert({
    _id: i,
    item: null,
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
}

for (var i = 3500; i < 4000; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: "free",
    quantity: Math.floor(10 * Math.random()) + 1
  });
}

for (var i = 4000; i < 4250; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: "if you have to ask....",
    quantity: Math.floor(10 * Math.random()) + 1
  });
}

The easiest way to start understanding the de facto schema for your database is to use MongoDB Compass. Simply connect Compass to your `mongod` (or `mongos` if you're using sharding) and select the database/collection you'd like to look into. To see MongoDB Compass in action, view this demo video.
As shown in Figure 2, there are typically four keys in each document from the `clusterdb.orders` collection:
- `_id` is always present and is a number
- `item` is normally present and is a string (either "abc" or "jkl") but is occasionally `null` or missing altogether (undefined)
- `price` is always present and is in most cases a number (the histogram shows how the values are distributed between 0 and 49) but in some cases it's a string
- `quantity` is always present and is a number
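
As a rough illustration of the per-field summary that Compass presents, the sketch below tallies the JavaScript type of each key across a handful of in-memory documents. It is plain JavaScript (runnable with Node.js) rather than anything Compass actually runs; `tallyFieldTypes` and `sampleOrders` are hypothetical names introduced for this example, and the sample documents simply mirror the shapes created in the test data above.

```javascript
// Hypothetical helper: count how often each key appears with each type.
// "null" is reported separately because typeof null is "object".
function tallyFieldTypes(docs) {
  var tally = {};
  docs.forEach(function (doc) {
    Object.keys(doc).forEach(function (key) {
      var value = doc[key];
      var type = (value === null) ? "null" : typeof value;
      tally[key] = tally[key] || {};
      tally[key][type] = (tally[key][type] || 0) + 1;
    });
  });
  return tally;
}

// A small sample mirroring the shapes of the orders test data
var sampleOrders = [
  { _id: 1000, item: "abc", price: 0, quantity: 0 },
  { _id: 2000, item: "jkl", price: 20, quantity: 3 },
  { _id: 3000, price: 0, quantity: 7 },
  { _id: 3200, item: null, price: 20, quantity: 2 },
  { _id: 3500, item: "abc", price: "free", quantity: 5 }
];

var summary = tallyFieldTypes(sampleOrders);
console.log(JSON.stringify(summary, null, 2));
```

In this sample, `item` shows up as a string in three documents, as `null` in one, and is absent from another, while `price` is a string in one document; these are the kinds of anomalies the Compass view makes visible.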

For this tutorial, we'll focus on `price`. Clicking on the `string` label makes Compass show more information about the string content for `price`, as shown in Figure 3.

Compass shows us that:
- For those instances of `price` which are strings, the common values are "free" and "if you have to ask....".
- If you click on one of those values, a query expression is formed; clicking "Apply" runs that query, and Compass then shows information only for that subset of documents. For example, where `price == "if you have to ask...."` (see Figure 4).
- By selecting multiple attributes, you can build up fairly complex queries.
- The query you build visually is printed at the top so you can easily copy/paste into other contexts like the shell.

If applications are to work with the `price` from these documents then it would be simpler if it were always set to a numerical value, and so this is something that should be fixed.
Before cleaning up the existing documents, the application should be updated to ensure numerical values are stored in the `price` field. We can enforce this by adding a new validation rule to the collection. We want to:
- Allow changes to existing invalid documents
- Prevent inserts of new documents which violate validation rules
- Set up a very simple document validation rule that checks that `price` exists and contains a `double` (see the enumeration of MongoDB BSON types)
These steps should be run from the `mongo` shell:
db.orders.runCommand("collMod", 
 {validationLevel: "moderate", 
 validationAction: "error"});
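
// $type: 1 (double) only matches when the field is present, so a
// separate $exists clause is not needed (a second "price" key in the
// same object literal would silently replace the first one anyway)
db.runCommand({collMod: "orders",
  validator: {
    price: {$type: 1}
  }
});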
The validation rules for this collection can now be checked:

<pre><code>db.getCollectionInfos({name:"orders"})
[
 {
 "name": "orders",
 "options": {
 "validator": {
 "price": {
 "$type": 1
 }
 },
 "validationLevel": "moderate",
 "validationAction": "error"
 }
 }
]
</code></pre>
Now that this has been set up, it's possible to check that we can't add a new document that breaks the rule:

<pre><code>db.orders.insert({
 "_id": 6666, 
 "item": "jkl", 
 "price": "rogue",
 "quantity": 1 });

Document failed validation
WriteResult({
 "nInserted": 0,
 "writeError": {
 "code": 121,
 "errmsg": "Document failed validation"
 }
})
</code></pre>
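
To make the rejection above concrete, here is a minimal sketch that approximates the `{price: {$type: 1}}` check in plain JavaScript (runnable with Node.js). `passesPriceRule` is a hypothetical helper, not a MongoDB API, and the `_id` values 6667 and 6668 are made up for the example. It assumes documents are written with ordinary JavaScript number literals, which the `mongo` shell stores as doubles; values created with `NumberInt()` or `NumberLong()` have different BSON types and are not modelled here.

```javascript
// Hypothetical stand-in for the validator {price: {$type: 1}}.
// A missing, null or string price fails; a plain JavaScript number passes.
function passesPriceRule(doc) {
  return typeof doc.price === "number";
}

var rogueOrder = { _id: 6666, item: "jkl", price: "rogue", quantity: 1 };
var goodOrder  = { _id: 6667, item: "jkl", price: 12, quantity: 1 };
var noPrice    = { _id: 6668, item: "jkl", quantity: 1 };

console.log(passesPriceRule(rogueOrder)); // false
console.log(passesPriceRule(goodOrder));  // true
console.log(passesPriceRule(noPrice));    // false
```

The last case shows why the single `$type` condition also covers the "must exist" requirement: a document without `price` has nothing of type double to match.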
But it's OK to modify an existing document that does break the rule:

<pre><code>db.orders.findOne({price: {$type: 2}});

{
 "_id": 3500,
 "item": "abc",
 "price": "free",
 "quantity": 5
}

> db.orders.update(
 {_id: 3500},
 {$set: {quantity: 12}});

Updated 1 existing record(s) in 5ms
WriteResult({
 "nMatched": 1,
 "nUpserted": 0,
 "nModified": 1
})</code></pre>
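
The behaviour seen in the last two examples follows from how `validationLevel` and `validationAction` combine. The sketch below models that decision in plain JavaScript (runnable with Node.js), based on the documented semantics of the `off`, `moderate` and `strict` levels and the `error` and `warn` actions. `isWriteRejected` and its parameter names are hypothetical and exist only for this illustration; it is a simplified model, not the server's implementation.

```javascript
// Hypothetical model of when a write is rejected.
//   level:         "off" | "moderate" | "strict"
//   action:        "error" | "warn"
//   oldDocIsValid: null for an insert, otherwise whether the stored
//                  document satisfied the validator before the update
//   newDocIsValid: whether the document being written satisfies it
function isWriteRejected(level, action, oldDocIsValid, newDocIsValid) {
  if (level === "off" || newDocIsValid) {
    return false;                 // nothing to enforce
  }
  var isInsert = (oldDocIsValid === null);
  if (level === "moderate" && !isInsert && !oldDocIsValid) {
    return false;                 // legacy invalid documents are not checked
  }
  return action === "error";      // "warn" only logs the violation
}

// Insert of a document with price "rogue" under moderate/error: rejected
console.log(isWriteRejected("moderate", "error", null, false));  // true
// Update of _id 3500 (price "free" before and after): allowed
console.log(isWriteRejected("moderate", "error", false, false)); // false
// The same update under strict/error: rejected
console.log(isWriteRejected("strict", "error", false, false));   // true
```

This is why `moderate` is a convenient intermediate setting: new data is held to the rule while existing non-conforming documents remain editable until they are cleaned up.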
Now that the application is no longer able to store new documents that break the new rule, it's time to clean up the "legacy" documents. At this point, it's important to point out that Compass works on a random sample of the documents in a collection (this is what allows it to be so quick). To make sure that we're fixing **all** of the documents, we check from the `mongo` shell. As the following commands could consume significant resources, it may make sense to run them on a secondary:

<pre><code>secondary> db.orders.aggregate([
 {$match: {
 price: {$type: 2}}},
 {$group: {
 _id: "$price", 
 count: {$sum:1}}}
 ])

{ "_id" : "if you have to ask....", "count" : 250 }
{ "_id" : "free", "count" : 500 }
</code></pre>
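
For readers less familiar with the aggregation framework, the following plain JavaScript sketch (runnable with Node.js) performs the same two steps as the `$match` and `$group` stages above over an in-memory array. `countStringPrices` and the `sample` documents are hypothetical; on a real collection the aggregation pipeline itself remains the right tool.

```javascript
// Hypothetical in-memory equivalent of:
//   {$match: {price: {$type: 2}}},
//   {$group: {_id: "$price", count: {$sum: 1}}}
function countStringPrices(docs) {
  var counts = {};
  docs
    .filter(function (doc) { return typeof doc.price === "string"; })
    .forEach(function (doc) {
      counts[doc.price] = (counts[doc.price] || 0) + 1;
    });
  // Reshape to match the {_id, count} documents returned by $group
  return Object.keys(counts).map(function (price) {
    return { _id: price, count: counts[price] };
  });
}

var sample = [
  { _id: 1001, price: 1 },
  { _id: 3500, price: "free" },
  { _id: 3501, price: "free" },
  { _id: 4000, price: "if you have to ask...." }
];

console.log(countStringPrices(sample));
```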
The number of exceptions isn't too high and so it is safe to go ahead and fix up the data without consuming too many resources:

<pre><code>db.orders.update(
 {price:"free"},
 {$set: {price: 0}},
 {multi: true});

db.orders.update(
 {price:"if you have to ask...."},
 {$set: {price: 1000000}},
 {multi: true});</code></pre>
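
If there were more than a couple of legacy values, the same clean-up could be driven from a lookup table. The sketch below shows the idea in plain JavaScript (runnable with Node.js) on an in-memory array. `replacements` reuses the two substitutions chosen above, while `normalizePrice` and the `legacy` sample are hypothetical. In the `mongo` shell, each entry in the table would translate into one multi-document `update` like those above.

```javascript
// Replacement values taken from the two updates above
var replacements = {
  "free": 0,
  "if you have to ask....": 1000000
};

// Hypothetical helper: return a copy of the document with a numeric price.
// Unknown strings are left untouched so they can be reviewed by hand.
function normalizePrice(doc) {
  var copy = Object.assign({}, doc);
  if (typeof copy.price === "string" &&
      Object.prototype.hasOwnProperty.call(replacements, copy.price)) {
    copy.price = replacements[copy.price];
  }
  return copy;
}

var legacy = [
  { _id: 3500, item: "abc", price: "free", quantity: 5 },
  { _id: 4000, item: "abc", price: "if you have to ask....", quantity: 2 },
  { _id: 1001, item: "abc", price: 1, quantity: 1 }
];

var cleaned = legacy.map(normalizePrice);
console.log(cleaned.map(function (doc) { return doc.price; })); // [ 0, 1000000, 1 ]
```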
It's now safe to switch to strict mode, where any insert or update causes an error if the document being stored doesn't follow the rules:

<pre><code>db.orders.runCommand("collMod", 
 {validationLevel: "strict", 
 validationAction: "error"});</code></pre>

<h3 id="next-steps">Next Steps</h3>
<p>Hopefully this has given you a sense of what the Document Validation functionality offers and started you thinking about how it could be applied to your application and database. I'd encourage you to read up more on the topic; these are some great resources:</p>
<ul>
<li>If you haven’t already read the <a href="https://www.mongodb.com/blog/post/document-validation-part-1-adding-just-the-right-amount-of-control-over-your-documents" title="Document Validation - Part 1: Adding Just the Right Amount of Control Over Your Documents">first blog in this series</a> then you should read it now</li>
<li><a href="https://docs.mongodb.org/master/release-notes/3.2/#document-validation" title="MongoDB 3.2 documentation for Document Validation">MongoDB 3.2 documentation for Document Validation</a></li>
<li>The best way to really get a feel for the functionality is to try it out for yourself: <a href="https://www.mongodb.org/downloads#development" title="Download MongoDB 3.2">Download MongoDB 3.2</a></li>
<li>Feedback is welcomed and we’d encourage you to join the <a href="https://www.mongodb.com/blog/post/announcing-the-mongodb-3-2-bug-hunt">MongoDB 3.2 bug hunt</a></li>
<li><a href="http://www.eliothorowitz.com/blog/2015/09/11/document-validation-and-what-dynamic-schema-means/" title="Document Validation and What Dynamic Schema Means">Document Validation and What Dynamic Schema Means</a> – Eliot Horowitz. This blog post adds context to why this functionality is being introduced now.</li>
<li><a href="https://www.mongodb.com/presentations/data-management-3-bulletproof-data-management" title="Bulletproof Data Management">Bulletproof Data Management</a> – Buzz Moschetti. Great presentation on how to look after your data - including in earlier versions of MongoDB</li>
<li>Register for our upcoming webinar covering <a href="https://www.mongodb.com/webinar/whats-new-in-mongodb-3-2">what's new in MongoDB 3.2</a></li>
</ul>
<hr>
<p>Watch Andrew's webinar covering document validation in 3.2.</p>
<center><a class="btn btn-primary" href="https://www.mongodb.com/presentations/webinar-document-validation-in-mongodb-3-2?jmp=blog" target="_BLANK">Document Validation in MongoDB 3.2</a></center>
<hr>

<p><em>About the Author - Andrew Morgan</em></p>
<p><em>Andrew is a Principal Product Marketing Manager working for MongoDB. He joined at the start of this summer from Oracle where he’d spent 6+ years in product management, focussed on High Availability.</em></p>