JSON Schema Validation - Locking down your model the smart way

If you've read through the Building with Patterns series, you've likely seen the flexibility of MongoDB's document data model. That flexibility helps us build applications quickly, because we aren't locked into a rigid data structure as we are in a legacy, tabular database. Once your schema design is settled, however, it is often useful to "lock" it into place. In MongoDB, we can use JSON Schema validation to accomplish this task. Since MongoDB 3.6, we have supported schema validation based on draft 4 of the JSON Schema specification.

This ability to lock down the document model with a strictly designed schema means you can, for example, introduce concrete milestones in the evolution of your data model which you can test against. One potential scenario is an application that has gone through the development cycle and whose data structure has become more rigid. At this point, defining a structure for the data may be desirable to ensure there are no unintended changes to the schema and no unexpected data being put into a specific field. For example, someone storing an image in a password field is not a desirable experience.

Schema validation can be set up in MongoDB both when a collection is created and on existing collections. The validation process occurs during document updates and inserts. Therefore, when rules are applied to an existing collection, the documents already in it undergo validation only when they are modified. The syntax for implementing validation is similar in either case:

New Collection

db.createCollection("recipes", {
    validator: { $jsonSchema: {
         <<Validation Rules>>
        }
    }
} )

Existing Collection

db.runCommand( {
    collMod: "recipes",
    validator: { $jsonSchema: {
         <<Validation Rules>>
        }
    }
} )
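
Because rules added to an existing collection are not applied retroactively, it can help to check which stored documents would fail them. The following is a minimal sketch under a few assumptions: recipeSchema and its single rule are placeholders for illustration, not the schema built later in this article. It relies on $jsonSchema also being usable as a query operator; wrapping it in $nor selects the documents that do not match. The typeof db guard is only there so the snippet also loads outside the shell.

```javascript
// Illustrative schema only: requires a string "name" field.
const recipeSchema = {
  bsonType: "object",
  required: ["name"],
  properties: {
    name: { bsonType: "string" }
  }
};

// $jsonSchema works as a query operator; $nor inverts it, so this
// filter matches documents that do NOT satisfy the schema.
const nonConformingFilter = { $nor: [{ $jsonSchema: recipeSchema }] };

// `db` is only defined inside the MongoDB shell; skip the query elsewhere.
if (typeof db !== "undefined") {
  db.recipes.find(nonConformingFilter).forEach((doc) => printjson(doc));
}
```

Any documents this prints would be rejected the next time they are updated, unless they are fixed first.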

Inside the validator section of the document, we can explicitly state the fields and field types the document must have. We can define the values that a field may accept, the minimum and maximum number of items an array field may contain, and whether additional fields may be added to the document. An example will help clarify some of these features.

JSON Schema Validation

For the example schema, let's think about a collection of cooking recipes. The basic information we need in each recipe will be the name, the number of servings, and a list of ingredients. We'll make those required. We'll also allow an optional cooking_method field, should we want to be able to find all recipes for items that are sauteed, for example. We'll create a new collection and set up our validation rules.

db.createCollection("recipes", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "servings", "ingredients"],
      additionalProperties: false,
      properties: {
        _id: {},
        name: {
          bsonType: "string",
          description: "'name' is required and is a string"
        },
        servings: {
          bsonType: ["int", "double"],
          minimum: 0,
          description:
            "'servings' is required and must be an int or double with a minimum of zero."
        },
        cooking_method: {
          enum: [
            "broil",
            "grill",
            "roast",
            "bake",
            "saute",
            "pan-fry",
            "deep-fry",
            "poach",
            "simmer",
            "boil",
            "steam",
            "braise",
            "stew"
          ],
          description:
            "'cooking_method' is optional but, if used, must be one of the listed options."
        },
        ingredients: {
          bsonType: ["array"],
          minItems: 1,
          maxItems: 50,
          items: {
            bsonType: ["object"],
            required: ["quantity", "measure", "ingredient"],
            additionalProperties: false,
            description: "'ingredients' must contain the stated fields.",
            properties: {
              quantity: {
                bsonType: ["int", "double", "decimal"],
                description:
                  "'quantity' is required and is of int, double, or decimal type"
              },
              measure: {
                enum: ["tsp", "Tbsp", "cup", "ounce", "pound", "each"],
                description:
                  "'measure' is required and can only be one of the given enum values"
              },
              ingredient: {
                bsonType: "string",
                description: "'ingredient' is required and is a string"
              },
              format: {
                bsonType: "string",
                description:
                  "'format' is an optional field of type string, e.g. chopped or diced"
              }
            }
          }
        }
      }
    }
  }
});

If we look at what's been defined here, we have our required fields: name, servings, and ingredients. The additionalProperties: false rule prevents any field from being added other than those we explicitly list under properties in our validation rule. We've also allowed an _id field in our document, which is important. If we did not specify it in the schema, no document could be inserted: _id is autogenerated, and no document can exist in the database without one, as it is our primary key.

The name field has been set to a required string-valued field. The servings field is required and must be an integer or double no smaller than zero. Next, there is an optional cooking_method field. If it is included in the document, only the values listed are acceptable.

The ingredients field adds some complexity to the validation process. It has been defined as an array of 1 to 50 items that each have a required quantity, measure, and ingredient. There is an optional format field as well to handle descriptions such as whole, diced, chopped, etc. The accepted data types have been set for each field of an ingredient: int, double, or decimal for quantity, one of the predefined values for measure, and string values for ingredient and format.

With the validation rules in place, let's try to insert some sample documents into the collection and see what would happen:

Document 1

db.recipes.insertOne({
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    {
      quantity: 7,
      measure: "ounce",
      ingredient: "bittersweet chocolate",
      format: "chopped"
    },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ]
});

This insert works as all of the required fields in the validation requirements are present and in the correct format.

Document 2

db.recipes.insertOne({
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    {
      quantity: 7,
      measure: "ounce",
      ingredient_name: "bittersweet chocolate",
      format: "chopped"
    },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ],
  directions:
    "Boil cream and pour over chocolate. Stir until chocolate is melted."
});

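This insert would fail with a WriteError: Document failed validation error. The document contains an additional directions field that the schema does not allow, and its first ingredient uses ingredient_name in place of the required ingredient field.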
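
To see which fields in Document 2 run afoul of the additionalProperties: false rules, here is a simplified sketch in plain JavaScript. It is not MongoDB's validator. findExtraFields and the two field lists are hypothetical helpers written for this illustration, with the lists copied by hand from the schema above.

```javascript
// Simplified illustration of `additionalProperties: false`; this is NOT
// MongoDB's validator, only a way to see which keys would be rejected.
function findExtraFields(doc, allowedFields) {
  return Object.keys(doc).filter((key) => !allowedFields.includes(key));
}

// Field names permitted by the recipes schema at each level.
const topLevelFields = ["_id", "name", "servings", "cooking_method", "ingredients"];
const ingredientFields = ["quantity", "measure", "ingredient", "format"];

// The document from the failed insert (Document 2).
const document2 = {
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    {
      quantity: 7,
      measure: "ounce",
      ingredient_name: "bittersweet chocolate",
      format: "chopped"
    },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ],
  directions:
    "Boil cream and pour over chocolate. Stir until chocolate is melted."
};

// Unknown keys at the top level, then for each ingredient in turn.
const extraTopLevel = findExtraFields(document2, topLevelFields);
const extraPerIngredient = document2.ingredients.map((item) =>
  findExtraFields(item, ingredientFields)
);

console.log(JSON.stringify(extraTopLevel));      // ["directions"]
console.log(JSON.stringify(extraPerIngredient)); // [["ingredient_name"],[]]
```

Removing directions and renaming ingredient_name to ingredient would leave a document equivalent to Document 1, which passes validation.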

Other rules can be applied to the document as well. Schema validation can be carried out on arrays and on sub-documents, as we've seen with ingredients. Additionally, schema dependencies can be set to move application logic natively into the database. Validation can also be configured either to reject a non-conforming write operation outright or to allow it and only log a warning.
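
As a rough sketch of the last two options, the command below combines a dependencies rule with relaxed enforcement. Several things here are assumptions made for illustration: cook_time is an invented field, the schema is deliberately much smaller than the recipes schema above, and the variable names are placeholders. validationLevel and validationAction are the collMod options that control strictness. Note that collMod replaces the collection's existing validator, so this is not something to run against the recipes collection as-is.

```javascript
// Hypothetical rule: if `cooking_method` is present, `cook_time` must be too.
// `cook_time` is an invented field used only for this illustration.
const schemaWithDependency = {
  bsonType: "object",
  required: ["name"],
  properties: {
    name: { bsonType: "string" },
    cooking_method: { bsonType: "string" },
    cook_time: { bsonType: ["int", "double"], minimum: 0 }
  },
  dependencies: {
    cooking_method: ["cook_time"]
  }
};

// validationLevel "moderate": validate inserts, and updates to documents
// that already satisfy the schema; existing invalid documents are left alone.
// validationAction "warn": allow the write but log the violation
// (the default, "error", rejects the write).
const relaxedValidation = {
  collMod: "recipes",
  validator: { $jsonSchema: schemaWithDependency },
  validationLevel: "moderate",
  validationAction: "warn"
};

// `db` is only defined inside the MongoDB shell; skip the command elsewhere.
// Caution: collMod REPLACES the collection's current validator.
if (typeof db !== "undefined") {
  printjson(db.runCommand(relaxedValidation));
}
```

Switching validationAction back to "error" and validationLevel back to "strict" restores the default behavior, in which every insert and update is checked and rejected on failure.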

Conclusion

JSON Schema validation can be a powerful tool for maintaining your data structure, and it pairs well with the document model in MongoDB. We have the ability to rapidly try out different schema designs in an application and then, once the model has been solidified, enforce some standards. We get to take advantage of both the flexibility of the document model and the safety of data validation.