Document Validation – Part 2: Putting it all Together, a Tutorial

Andrew Morgan

Technical
Facebook ShareLinkedin ShareReddit ShareTwitter Share

Introduction

This is the second and final post in a series looking at document validation in MongoDB 3.2; if you haven’t already read the first blog in this series then you should read it now.

The intent of this post is to step you through exactly how document validation can be introduced into an existing production deployment in such a way that there is no impact to your users. It covers:

  • Setting up some test data (not needed for a real deployment)
  • Using MongoDB Compass and the mongo shell to reverse engineer the de facto data model and identify anomalies in the existing documents
  • Defining the appropriate document validation rules
  • Preventing new documents being added which don’t follow the new rules
  • Bring existing documents “up to spec” against the new rules

Figure 1: Aligning document validation with application lifecycle

Tutorial

This section looks at taking an existing, deployed database which currently has no document validations defined. It steps through understanding what the current document structure looks like; deciding on what rules to add and then rolling out those new rules.

As a pre-step add some data to the database (obviously, this isn't needed if working with your real deployment).

use clusterdb;
db.dropDatabase();
use clusterdb();
db.inventory.insert({ "_id" : 1, "sku" : "abc", 
    "description" : "product 1", "instock" : 120 });
db.inventory.insert({ "_id" : 2, "sku" : "def", 
    "description" : "product 2", "instock" : 80 });
db.inventory.insert({ "_id" : 3, "sku" : "ijk", 
    "description" : "product 3", "instock" : 60 });
db.inventory.insert({ "_id" : 4, "sku" : "jkl", 
    "description" : "product 4", "instock" : 70 });
db.inventory.insert({ "_id" : 5, "sku" : null, 
    "description" : "Incomplete" });
db.inventory.insert({ "_id" : 6 });

for (i=1000; i<2000; i++) {
  db.orders.insert({
    _id: i,
    item: "abc", 
    price: i % 50,
    quantity: i % 5
  });
};

for (i=2000; i<3000; i++) {
  db.orders.insert({
    _id: i,
    item: "jkl", 
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=3000; i<3200; i++) {
  db.orders.insert({
    _id: i,
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=3200; i<3500; i++) {
  db.orders.insert({
    _id: i,
    item: null,
    price: i % 30,
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=3500; i<4000; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: "free",
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

for (i=4000; i<4250; i++) {
  db.orders.insert({
    _id: i,
    item: "abc",
    price: "if you have to ask....",
    quantity: Math.floor(10 * Math.random()) + 1
  });
};

The easiest way to start understanding the de facto schema for your database is to use MongoDB Compass. Simply connect Compass to your mongod (or mongos if you're using sharding) and select the database/collection you'd like to look into. To see MongoDB Compass in action – view this demo video.

As shown in Figure 2, there are typically four keys in each document from the clusterdb.orders table:

  • _id is always present and is a number
  • item is normally present and is a string (either "abc" or "jkl") but is occasionally null or missing altogether (undefined)
  • price is always present and is in most cases a number (the histogram shows how the values are distributed between 0 and 49) but in some cases it's a string
  • quantity is always present and is a number

Figure 2: Viewing the Document Schema using MongoDB Compass

For this tutorial, we'll focus on the price. By clicking on the string label, Compass will show us more information about the string content for price - this is shown in Figure 3.

Figure 3: Drilling Down into string Values

Compass shows us that:

  • For those instances of price which are strings, the common values are "free" and "if you have to ask....".
  • If you click on one of those values, a query expression is formed and clicking "Apply" runs that query and now Compass will show you information only for that subset of documents. For example, where price == "if you have to ask...." (see Figure 4).
  • By selecting multiple attributes, you can build up fairly complex queries.
  • The query you build visually is printed at the top so you can easily copy/paste into other contexts like the shell.

Figure 4: Formulating Search Expressions with MongoDB Compass

If applications are to work with the price from these documents then it would be simpler it it was always set to a numerical value, and so this is something that should be fixed.

Before cleaning up the existing documents, the application should be updated to ensure numerical values are stored in the price field. We can do this by adding a new validation rule to the collection. We want this rule to:

  • Allow changes to existing invalid documents
  • Prevent inserts of new documents which violate validation rules
  • Set up a very simple document validation rule that checks that price exists and contains a double – see the enumeration of MongoDB BSON types

These steps should be run from the mongo shell:

db.orders.runCommand("collMod", 
                   {validationLevel: "moderate", 
                    validationAction: "error"});

db.runCommand({collMod: "orders", 
               validator: {
                  price: {$exists: true},
                  price: {$type: 1}
                }
              });
```

The validation rules for this collection can now be checked:
```javascript
db.getCollectionInfos({name:"orders"})
[
  {
    "name": "orders",
    "options": {
      "validator": {
        "price": {
          "$type": 1
        }
      },
      "validationLevel": "moderate",
      "validationAction": "error"
    }
  }
]
Now that this has been set up, it's possible to check that we can't add a new document that breaks the rule:
db.orders.insert({
    "_id": 6666, 
    "item": "jkl", 
    "price": "rogue",
    "quantity": 1 });

Document failed validation
WriteResult({
  "nInserted": 0,
  "writeError": {
    "code": 121,
    "errmsg": "Document failed validation"
  }
})

But it's OK to modify an existing document that does break the rule:

db.orders.findOne({price: {$type: 2}});

{
  "_id": 3500,
  "item": "abc",
  "price": "free",
  "quantity": 5
}

> db.orders.update(
    {_id: 3500},
    {$set: {quantity: 12}});

Updated 1 existing record(s) in 5ms
WriteResult({
  "nMatched": 1,
  "nUpserted": 0,
  "nModified": 1
})

Now that the application is no longer able to store new documents that break the new rule, it's time to clean up the "legacy" documents. At this point, it's important to point out that Compass works on a random sample of the documents in a collection (this is what allows it to be so quick). To make sure that we're fixing all of the documents, we check from the mongo shell. As the following commands could consume significant resources, it may make sense to run them on a secondary):

secondary> db.orders.aggregate([
    {$match: {
      price: {$type: 2}}},
    {$group: {
      _id: "$price", 
      count: {$sum:1}}}
  ])

{ "_id" : "if you have to ask....", "count" : 250 }
{ "_id" : "free", "count" : 500 }

The number of exceptions isn't too high and so it is safe to go ahead and fix up the data without consuming too many resources:

db.orders.update(
    {price:"free"},
    {$set: {price: 0}},
    {multi: true});

db.orders.update(
    {price:"if you have to ask...."},
    {$set: {price: 1000000}},
    {multi: true});

At this point it's now safe to enter the strict mode where any inserts or updates will cause an error if the document being stored doesn't follow the rules:

db.orders.runCommand("collMod", 
                   {validationLevel: "strict", 
                    validationAction: "error"});

Next Steps

Hopefully this has given you a sense for what the Document Validation functionality offers and started you thinking about how it could be applied to your application and database. I'd encourage you to read up more on the topic and these are some great resources:


Watch Andrew's webinar covering document validation in 3.2.

Document Validation in MongoDB 3.2


About the Author - Andrew Morgan

Andrew is a Principal Product Marketing Manager working for MongoDB. He joined at the start of this summer from Oracle where he’d spent 6+ years in product management, focussed on High Availability.