I’m creating an application which allows users define a schema for their documents and then create documents that will be validated against the schema. Document schemas can define “relations” between each other which means that when I query the documents I need to get related documents as well. Schemas also allow to set certain document properties as ‘sortable’ which means I’d need to create indexes for documents that have that schema.
All the validation is done on the application layer. I am however facing a dilemma of how to “model” my storage layer.
The approach I’m taking at the moment is inserting all documents into one collection no matter what schema they belong to. Which means that I’ll have a bunch of partial indexes that only index the documents where specific fields exist. I also have a relations collection with documents that have ‘from’ and ‘to’ ids (something akin to join tables).
It works well but I keep second-guessing this approach. Based on what I see in the wild the most common approach is to have a collection per schema. I’m not against it but it certainly makes for a more complex solution as I’d need to create more collections for the “join” tables and track more things overall.
My question is — am I missing something crucial in a the single collection approach that might render it as a bad one?