Basically what i’m trying to design is spreadsheets of data with MongoDB, or lets call it tabular data, where user can define arbitrary number of columns of rows for a table.
Names of the columns itself, their data types that they will hold and metadata like is column unique, default value for it, order in which column should appear, and similar, is not known in advance.
Table itself belongs to organisation/company entity (lets call it tenants)
Lets imagine for the sake of simplicity, we have user defined table 2 x 2, that looks something like following. Table name is “Parking spots” and table description is “List of our parking spots”.
|Parking type||Number od spots|
Over the time i could expect not that much created columns, but i can expect thousand(s) of rows. Lets say that above table from 2x2 could become 20 * 2000. Further more i could expect a lot of updates when it comes to updating specific cell of a table.
Being not that experienced with mongodb, im looking for some optimal solution how to design this having in mind query performances supporting regular operation what end user can due with tables (like updating individual cells, sorting rows based on some columns, etc)
My first guess would be to have single collection “tables” with embedded array of “columns” which itself would have embedded “cells” array. In above example of user defined table with parking data, with everything embedded, data in database would look like following: http://json-parser.com/96f5929e/1
My guess for embedding is based on that, that “cells” and “columns” can not exists on their own, and there is no use case for end-user for fetching cell without column, or fetching column without other table data.
But again, in that approach, i would end up with possibility of having thousands and thousands of embedded documents.
Still i would maintain objectId for each cell and each column, for being able to faster find/update individual cell. (Not even sure is it possible having unique _id per each nested document within the parent)
Another approach would be to use referencing instead of embedding, to perhaps have separate collection for columns and separate collection for cells. Or to use embedding for table data and columns, while having separated collection for individual cells. But somehow to me, with embedding everything seems like better educated guess.