GIANT Stories at MongoDB

Building with Patterns: A Summary

As we wrap up the Building with Patterns series, it’s a good opportunity to recap the problems each of the covered patterns solves and to highlight the benefits and trade-offs each pattern carries. The most frequently asked question about schema design patterns is “I’m designing an application to do X, how do I model the data?” As we hope you’ve discovered over the course of this blog series, there are many things to take into consideration to answer that question.

Building with Patterns: The Schema Versioning Pattern

It has been said that the only thing constant in life is change. This holds true for database schemas as well. Information we once thought we wouldn’t need, we now want to capture. Or new services become available and need to be included in a database record. Regardless of the reason behind the change, after a while we inevitably need to make changes to the underlying schema design in our application. While this often poses challenges, and perhaps at least a few headaches, in a legacy tabular database system, in MongoDB we can use the Schema Versioning pattern to make the changes easier.
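
As a minimal sketch of the idea, imagine a hypothetical customers collection in which each document carries a schema_version field (the collection and field names here are assumptions for illustration); old and new document shapes can then live side by side while the application branches on the version:

// Original shape; the absence of schema_version is treated as version 1.
db.customers.insertOne({
  _id: 1,
  name: "Anakin Skywalker",
  home_phone: "503-555-0000"
});

// Revised shape: contact methods moved into an array, tagged as version 2.
db.customers.insertOne({
  _id: 2,
  schema_version: 2,
  name: "Leia Organa",
  contact_methods: [
    { type: "home_phone", value: "503-555-0001" },
    { type: "email", value: "leia@example.com" }
  ]
});

// The application inspects schema_version to decide how to read each document.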

Building with Patterns: The Document Versioning Pattern

Databases such as MongoDB are very good at querying lots of data and updating that data frequently. In most cases, however, we only perform queries on the latest state of the data. What about situations in which we need to query previous states of the data? What if we need version-control-like functionality for our documents? This is where we can use the Document Versioning Pattern.
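
As a rough sketch (the policies and policies_revisions collections here are hypothetical), the latest state lives in the main collection while each saved state, tagged with a revision number, is also copied into a history collection:

// Current documents live in "policies"; every saved state is also copied
// into "policies_revisions" with an incrementing revision number.
db.policies.insertOne({ _id: 1, policy_holder: "J. Doe", coverage: 100000, revision: 1 });
db.policies_revisions.insertOne({ policy_id: 1, policy_holder: "J. Doe", coverage: 100000, revision: 1 });

// On update, bump the revision in the main collection and append a copy to the history.
db.policies.updateOne({ _id: 1 }, { $set: { coverage: 150000 }, $inc: { revision: 1 } });
db.policies_revisions.insertOne({ policy_id: 1, policy_holder: "J. Doe", coverage: 150000, revision: 2 });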

Building with Patterns: The Preallocation Pattern

One of the great things about MongoDB is the document data model. It provides a lot of flexibility, not only in schema design but in the development cycle as well. Not knowing what fields will be required down the road is easily handled with MongoDB documents. However, there are times when the structure is known in advance, and being able to fill or grow that structure up front makes the design much simpler. This is where we can use the Preallocation Pattern.
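
One minimal sketch of the idea, assuming a hypothetical showings collection for theater seating: the full seat map is created up front with placeholder values, and individual cells are simply updated later.

// Preallocate a two-dimensional seat map for a showing, with every seat
// created up front and flipped to "booked: true" when it is sold.
function createShowing(showingId, rows, seatsPerRow) {
  const seatMap = [];
  for (let r = 0; r < rows; r++) {
    const row = [];
    for (let s = 0; s < seatsPerRow; s++) {
      row.push({ seat: s + 1, booked: false });
    }
    seatMap.push(row);
  }
  db.showings.insertOne({ _id: showingId, seats: seatMap });
}

createShowing("evening-show", 10, 20);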

JSON Schema Validation - Locking down your model the smart way

If you've read through the Building with Patterns series, you've likely seen the flexibility of MongoDB's document data model. This provides many advantages when it comes to building applications quickly, as we aren't locked into a rigid data structure like we are in a legacy, tabular database. Once your schema design is determined and settled upon, it is often useful to "lock" it into place. In MongoDB, we can use JSON Schema validation to accomplish this task. Since MongoDB 3.6, we have supported schema validation based on the JSON Schema draft specification.

This ability to lock down the document model with a strictly designed schema means you can, for example, introduce concrete milestones in the evolution of your data model which you can test against. One potential scenario is an application that has gone through the development cycle and whose data structure has become more rigid. At this point, defining a structure for the data may be desirable to ensure there are no unintended changes to the schema, and no unexpected data being put into a specific field. For example, someone storing an image in a password field is not a desirable experience.

Schema validation can be set up in MongoDB both when a collection is created and on an existing collection. Validation occurs during document inserts and updates, so when rules are applied to an existing collection, the documents already in it are validated only when they are modified. The syntax for implementing validation is similar in either case:

New Collection

db.createCollection("recipes",
    validator: { $jsonSchema: {
         <<Validation Rules>>
        }
    }
)

Existing Collection

db.runCommand( {
    collMod: "recipes",
    validator: { $jsonSchema: {
         <<Validation Rules>>
        }
    }
} )

Inside the validator section of the document, we can explicitly state the fields and field types the document must have. We can define the values a field may accept, a minimum and/or maximum number of items an array field may contain, and whether additional fields may be added to the document. An example using some of these features should help clarify them.

JSON Schema Validation

For the example schema, let's think about a collection of cooking recipes. The basic information we need in each recipe will be the name, the number of servings, and a list of ingredients. We'll make those required. We'll allow for an optional cooking_method field should we want to be able to find all recipes for items that are sauteed, for example. We'll create a new collection and set up our validation rules.

db.createCollection("recipes", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "servings", "ingredients"],
      additionalProperties: false,
      properties: {
        _id: {},
        name: {
          bsonType: "string",
          description: "'name' is required and is a string"
        },
        servings: {
          bsonType: ["int", "double"],
          minimum: 0,
          description:
            "'servings' is required and must be an int or double with a minimum of zero."
        },
        cooking_method: {
          enum: [
            "broil",
            "grill",
            "roast",
            "bake",
            "saute",
            "pan-fry",
            "deep-fry",
            "poach",
            "simmer",
            "boil",
            "steam",
            "braise",
            "stew"
          ],
          description:
            "'cooking_method' is optional but, if used, must be one of the listed options."
        },
        ingredients: {
          bsonType: ["array"],
          minItems: 1,
          maxItems: 50,
          items: {
            bsonType: ["object"],
            required: ["quantity", "measure", "ingredient"],
            additionalProperties: false,
            description: "'ingredients' must contain the stated fields.",
            properties: {
              quantity: {
                bsonType: ["int", "double", "decimal"],
                description:
                  "'quantity' is required and must be an int, double, or decimal"
              },
              measure: {
                enum: ["tsp", "Tbsp", "cup", "ounce", "pound", "each"],
                description:
                  "'measure' is required and can only be one of the given enum values"
              },
              ingredient: {
                bsonType: "string",
                description: "'ingredient' is required and is a string"
              },
              format: {
                bsonType: "string",
                description:
                  "'format' is an optional field of type string, e.g. chopped or diced"
              }
            }
          }
        }
      }
    }
  }
});

If we look at what's been defined here, we have our required fields: name, servings, and ingredients. The additionalProperties: false rule prevents any fields other than those we explicitly state in our validation rule from being added to the document. We've also allowed an _id field in our document, which is important. If we did not specify it in the schema, no document could be inserted at all: _id is autogenerated, and no document can exist in the database without it, as it is our primary key.

The name field has been defined as a required string. The servings field is required and must be an integer or a double. Next is an optional cooking_method field; if it is included in the document, only the listed values are acceptable.

The ingredients field adds some additional complexity to the validation. It has been defined as an array of items, each of which requires a quantity, measure, and ingredient. There is an optional format field as well to handle descriptions such as whole, diced, chopped, etc. The accepted data types for the various fields have been set for each ingredient during the schema validation process: integer, double, or decimal for quantity, one of the predefined values for measure, and string values for ingredient and format.

With the validation rules in place, let's try to insert some sample documents into the collection and see what would happen:

Document 1

db.recipes.insertOne({
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    {
      quantity: 7,
      measure: "ounce",
      ingredient: "bittersweet chocolate",
      format: "chopped"
    },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ]
});

This insert works because all of the fields required by the validation rules are present and in the correct format.

Document 2

db.recipes.insertOne({
  name: "Chocolate Sponge Cake Filling",
  servings: 4,
  ingredients: [
    {
      quantity: 7,
      measure: "ounce",
      ingredient_name: "bittersweet chocolate",
      format: "chopped"
    },
    { quantity: 2, measure: "cup", ingredient: "heavy cream" }
  ],
  directions:
    "Boil cream and pour over chocolate. Stir until chocolate is melted."
});

This insert would fail with a WriteError: Document failed validation error. The first ingredient uses an ingredient_name field in place of the required ingredient field, which also violates additionalProperties: false for the sub-document, and the top-level directions field is not among the allowed properties.

There are other rules that can be applied to the document as well, and as we've seen with ingredients, schema validation can be carried out on arrays and on the sub-documents they contain. Additionally, schema dependencies can be set to move application logic natively into the database. Validation strictness can also be configured, either rejecting the document write operation outright or simply logging a warning, as shown below.
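
For example, one way to relax the recipes collection so that violations are logged rather than rejected is to adjust the collection's validation options; the validationAction and validationLevel settings below are illustrative choices, not the only ones available:

db.runCommand({
  collMod: "recipes",
  validationAction: "warn",     // log a warning instead of rejecting the write
  validationLevel: "moderate"   // apply rules to inserts and to updates of documents that already pass them
});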

Conclusion

JSON Schema validation can be a powerful tool for maintaining your data structure, and it becomes even more powerful when combined with the document model in MongoDB. We have the ability to rapidly try out different schema designs in an application and then, once the model has been solidified, enforce some standards. We get to take advantage of both the flexibility of the document model and the benefits of data validation.

Building with Patterns: The Tree Pattern

Many of the schema design patterns we've covered so far have stressed that saving time on JOIN operations is a benefit. Data that's accessed together should be stored together, and some data duplication is okay. A schema design pattern like Extended Reference is a good example. However, what if the data to be joined is hierarchical? For example, what if you would like to identify the reporting chain from an employee up to the CEO? MongoDB provides the $graphLookup aggregation stage to navigate the data as a graph, and that could be one solution. However, if you need to run many queries against this hierarchical data structure, you may want to apply the same rule of storing together data that is accessed together. This is where we can use the Tree Pattern.
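
As a small, hypothetical illustration of the $graphLookup approach, assume an employees collection in which each document stores its manager's _id in a reports_to field (both names are assumptions); the stage can then walk the chain upward:

// Starting from one employee, follow reports_to links until the top of the
// hierarchy is reached, collecting the managers into reporting_chain.
db.employees.aggregate([
  { $match: { name: "Sally" } },
  { $graphLookup: {
      from: "employees",
      startWith: "$reports_to",
      connectFromField: "reports_to",
      connectToField: "_id",
      as: "reporting_chain"
  } }
]);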

Building with Patterns: The Approximation Pattern

Imagine a fairly decent-sized city of approximately 39,000 people. The exact number is fluid as people move in and out of the city, babies are born, and people die. We could spend our days trying to get an exact count of residents each day, but most of the time that 39,000 number is "good enough." Similarly, in many applications we develop, knowing a "good enough" number is sufficient. When a "good enough" number really is good enough, that's a great opportunity to put the Approximation Pattern to work in your schema design.
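
A minimal sketch of the idea, using a hypothetical cities collection and a one-percent write rate chosen purely for illustration: instead of updating the population on every event, we occasionally apply a larger increment, trading exactness for far fewer writes.

// Roughly one call in a hundred performs a write, incrementing by 100,
// so the stored population stays approximately correct.
function recordResident(cityId) {
  if (Math.random() < 0.01) {
    db.cities.updateOne({ _id: cityId }, { $inc: { population: 100 } });
  }
}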

Building with Patterns: The Extended Reference Pattern

Throughout this Building with Patterns series, I hope you've discovered that a driving force behind what your schema should look like is how that data will be accessed. If we have a number of similar fields, the Attribute Pattern may be a great choice. Does accommodating access to a small portion of our data vastly alter our application? Perhaps the Outlier Pattern is something to consider. Some patterns, such as the Subset Pattern, reference additional collections and rely on JOIN operations to bring every piece of data back together. What about instances when many JOIN operations are needed to bring together frequently accessed data? This is where we can use the Extended Reference pattern.
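
As a quick, hypothetical sketch of the pattern, an orders document might keep a reference to its customer while also embedding the handful of customer fields that are read with every order, trading a little duplication for fewer JOINs:

// The order still references the customers collection, but the fields needed
// to display or ship the order are copied in at write time.
db.orders.insertOne({
  _id: 1001,
  customer_id: 42,                               // reference back to the customers collection
  shipping_name: "Katrina Pope",                 // duplicated from the customer document
  shipping_address: "123 Main St, Anytown, USA", // duplicated from the customer document
  items: [ { sku: "bike-007", qty: 1, price: 329.99 } ],
  order_date: new Date()
});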

Building with Patterns: The Subset Pattern

Some years ago, the first PCs had a whopping 256KB of RAM and dual 5.25" floppy drives, and no hard drives, as those were incredibly expensive at the time. These limitations meant physically swapping floppy disks due to a lack of memory when working with large (for the time) amounts of data. If only there had been a way back then to bring into memory only the data I frequently used, that is, a subset of the overall data.

Modern applications aren't immune to exhausting resources. MongoDB keeps frequently accessed data, referred to as the working set, in RAM. When the working set of data and indexes grows beyond the physical RAM allotted, performance is reduced as disk accesses start to occur and data rolls out of RAM.
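
A minimal sketch of the Subset Pattern, using hypothetical products and reviews collections: keep only the small, frequently accessed slice of data in the main document and move the rest to a supporting collection.

// The product document holds just the most recent reviews (the working-set data);
// the complete review history lives in its own collection.
db.products.insertOne({
  _id: "bike-007",
  name: "Mountain Bike",
  recent_reviews: [ { user: "alice", stars: 5, comment: "Great ride" } ]
});

db.reviews.insertMany([
  { product_id: "bike-007", user: "alice", stars: 5, comment: "Great ride", date: new Date() }
  // ... the full review history continues here
]);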

Building with Patterns: The Computed Pattern

We've looked at various ways of optimally storing data in the Building with Patterns series. Now, we're going to look at a different aspect of schema design. Just storing data and having it available isn't, typically, all that useful. The usefulness of data becomes much more apparent when we can compute values from it. What's the total sales revenue of the latest Amazon Alexa? How many viewers watched the latest blockbuster movie? These types of questions can be answered from data stored in a database, but the answers must be computed.
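
As a small illustrative sketch (the screenings and movies collections and their fields are assumptions), a computed total can be produced with an aggregation and stored back on the parent document so it doesn't have to be recalculated on every read:

// Sum the viewers across all screenings of each movie, then write the
// computed total onto the corresponding movie document.
db.screenings.aggregate([
  { $group: { _id: "$movie_id", total_viewers: { $sum: "$viewers" } } }
]).forEach(function (result) {
  db.movies.updateOne(
    { _id: result._id },
    { $set: { total_viewers: result.total_viewers } }
  );
});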