Document Validation for Polymorphic Collections
In data modeling design reviews with customers, I often propose a schema where different documents in the same collection contain different types of data. This makes it efficient to fetch related documents in a single, indexed query. MongoDB's flexible data model is great for optimizing workloads in this way, but people can be concerned about losing control of what applications write to these collections.
Customers often want to ensure that only correctly formatted documents make it into a collection, and so I explain MongoDB's schema validation feature. The question that then comes up is: "How does that work with a polymorphic/single-collection schema?" This post is intended to answer that question, and it's simpler than you might think.
The application I'm working on manages customer and account details. There's a many-to-many relationship between customers and accounts. The app needs to be able to efficiently query customer data based on the customer id, and account data based on either the id of one of its customers or the account id.
Here's an example of customer and account documents where my wife and I share a checking account but each have our own savings account:
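Something like the following (the names, ids, and exact field choices are illustrative placeholders; note how the shared checking account lists both customer ids in its `customerId` array):

```javascript
db.Accounts.insertMany([
  // Two customer documents
  {
    docType: "customer",
    customerId: "CUST-101",
    name: { first: "Sam", middle: "J", last: "Smith" },
    customerSince: new Date("2005-05-20"),
  },
  {
    docType: "customer",
    customerId: "CUST-102",
    name: { first: "Anna", last: "Smith" }, // name.middle is optional
    customerSince: new Date("2005-05-20"),
  },
  // A checking account shared by both customers
  {
    docType: "account",
    accountId: "ACC-700",
    accountType: "checking",
    customerId: ["CUST-101", "CUST-102"],
    dateOpened: new Date("2010-01-21"),
    balance: 5230.5,
  },
  // Each customer also has their own savings account
  {
    docType: "account",
    accountId: "ACC-701",
    accountType: "savings",
    customerId: ["CUST-101"],
    dateOpened: new Date("2010-01-21"),
    balance: 1200.0,
  },
  {
    docType: "account",
    accountId: "ACC-702",
    accountType: "savings",
    customerId: ["CUST-102"],
    dateOpened: new Date("2014-09-02"),
    balance: 3400.0,
  },
]);
```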
As an aside, these are the indexes I added to make those frequent queries I referred to more efficient:
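Something along these lines (the exact index definitions are assumptions based on the queries described above):

```javascript
// customerId is a scalar in customer documents and an array in account
// documents, so this index is multikey for accounts. A single query such as
//   db.Accounts.find({ customerId: "CUST-101" })
// then returns a customer and all of their accounts in one indexed pass.
db.Accounts.createIndex({ customerId: 1 });

// Direct lookup of an account by its id
db.Accounts.createIndex({ accountId: 1 });
```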
Schema validation lets you create validation rules for your fields, such as allowed data types and value ranges.
MongoDB uses a flexible schema model, which means that documents in a collection do not need to have the same fields or data types by default. Once you've established an application schema, you can use schema validation to ensure there are no unintended schema changes or improper data types.
The validation rules are pretty simple to set up, and tools like Hackolade can make it simpler still — even reverse-engineering your existing documents.
It's simple to imagine setting up a JSON schema validation rule for a collection where all documents share the same attributes and types. But what about polymorphic collections? Even in polymorphic collections, there is structure to the documents. Fortunately, the syntax for setting up the validation rules allows for the required optionality.
I have two different types of documents that I want to store in my `Accounts` collection: `customer` and `account`. I included a `docType` attribute in each document to identify which type of entity it represents. I start by creating a JSON schema definition for each type of document:
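Sketched here against the illustrative documents above (the exact constraints are assumptions):

```javascript
const customerSchema = {
  bsonType: "object",
  required: ["docType", "customerId", "name", "customerSince"],
  properties: {
    docType: { enum: ["customer"] },
    customerId: { bsonType: "string" },
    name: {
      bsonType: "object",
      required: ["first", "last"], // middle is deliberately not required
      properties: {
        first: { bsonType: "string" },
        middle: { bsonType: "string" },
        last: { bsonType: "string" },
      },
    },
    customerSince: { bsonType: "date" },
  },
};

const accountSchema = {
  bsonType: "object",
  required: ["docType", "accountId", "accountType", "customerId", "dateOpened", "balance"],
  properties: {
    docType: { enum: ["account"] },
    accountId: { bsonType: "string" },
    accountType: { enum: ["checking", "savings"] },
    customerId: { bsonType: "array", items: { bsonType: "string" } },
    dateOpened: { bsonType: "date" },
    balance: { bsonType: "number" }, // "number" matches int, long, double, and decimal
  },
};
```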
Those definitions specify which attributes should be in the document and what types they should take. Note that fields can be optional, such as `name.middle` in the `customer` schema.

It's then a simple matter of using the `oneOf` JSON schema operator to allow documents that match either of the two schemas:
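In outline, assuming the `customerSchema` and `accountSchema` constants defined above and a collection that already exists:

```javascript
// Attach the validator to the existing Accounts collection
db.runCommand({
  collMod: "Accounts",
  validator: {
    $jsonSchema: {
      oneOf: [customerSchema, accountSchema],
    },
  },
});
```

`oneOf` requires a document to match exactly one of the listed schemas, which the distinct `docType` values guarantee.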
I wanted to go a stage further and add some extra, semantic validations:
- For `customer` documents, the `customerSince` value can't be any later than the current time.
- For `account` documents, the `dateOpened` value can't be any later than the current time.
- For savings accounts, the `balance` can't fall below zero.
This document represents these checks:
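Here's one way to express it: a single `$expr` using the `$$NOW` aggregation variable (this sketch assumes MongoDB 4.2+ and the illustrative field names from earlier):

```javascript
const semanticChecks = {
  $expr: {
    $and: [
      // customer documents: customerSince must not be in the future
      {
        $or: [
          { $ne: ["$docType", "customer"] },
          { $lte: ["$customerSince", "$$NOW"] },
        ],
      },
      // account documents: dateOpened must not be in the future
      {
        $or: [
          { $ne: ["$docType", "account"] },
          { $lte: ["$dateOpened", "$$NOW"] },
        ],
      },
      // savings accounts: balance must never be negative
      {
        $or: [
          { $ne: ["$accountType", "savings"] },
          { $gte: ["$balance", 0] },
        ],
      },
    ],
  },
};
```

Each `$or` reads as an implication: "if this is a customer document, then `customerSince` must not be later than now," and so on.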
I updated the collection validation rules to include these new checks:
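In outline, `$and`-ing the `$jsonSchema` rule together with the semantic checks:

```javascript
db.runCommand({
  collMod: "Accounts",
  validator: {
    $and: [
      { $jsonSchema: { oneOf: [customerSchema, accountSchema] } },
      semanticChecks,
    ],
  },
});
```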
If you want to recreate this in your own MongoDB database, then just paste this into your MongoDB playground in VS Code:
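A condensed sketch of such a playground script follows; it assumes the `customerSchema`, `accountSchema`, and `semanticChecks` definitions from above are pasted at the top, and `BankingDemo` is a placeholder database name:

```javascript
// Paste the customerSchema, accountSchema, and semanticChecks definitions
// from earlier in this post above this line.
use("BankingDemo"); // placeholder database name

// Start from a clean collection with the combined validation rules
db.Accounts.drop();
db.createCollection("Accounts", {
  validator: {
    $and: [
      { $jsonSchema: { oneOf: [customerSchema, accountSchema] } },
      semanticChecks,
    ],
  },
});

// Indexes supporting the frequent queries
db.Accounts.createIndex({ customerId: 1 });
db.Accounts.createIndex({ accountId: 1 });

// A well-formed customer document is accepted...
db.Accounts.insertOne({
  docType: "customer",
  customerId: "CUST-101",
  name: { first: "Sam", last: "Smith" },
  customerSince: new Date("2005-05-20"),
});

// ...while a savings account with a negative balance is rejected
try {
  db.Accounts.insertOne({
    docType: "account",
    accountId: "ACC-900",
    accountType: "savings",
    customerId: ["CUST-101"],
    dateOpened: new Date("2021-06-01"),
    balance: -50,
  });
} catch (e) {
  console.log("Rejected as expected:", e.message);
}
```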
I hope that this short article has shown how easy it is to use schema validation with MongoDB's polymorphic collections and the single-collection design pattern.
I didn't go into much detail about why I chose the data model used in this example. If you want to know more (and you should!), then here are some great resources on data modeling with MongoDB:
- Daniel Coupal and Ken Alger’s excellent series of blog posts on MongoDB schema patterns
- Daniel Coupal and Lauren Schaefer’s equally excellent series of blog posts on MongoDB anti-patterns
- MongoDB University Course, M320 - MongoDB Data Modeling