Database design for tabular data (user defined columns with potentially lof of rows of data to maintain)

Basically what i’m trying to design is spreadsheets of data with MongoDB, or lets call it tabular data, where user can define arbitrary number of columns of rows for a table.

Names of the columns itself, their data types that they will hold and metadata like is column unique, default value for it, order in which column should appear, and similar, is not known in advance.
Table itself belongs to organisation/company entity (lets call it tenants)

Lets imagine for the sake of simplicity, we have user defined table 2 x 2, that looks something like following. Table name is “Parking spots” and table description is “List of our parking spots”.


Parking type Number od spots
garage 10
public 100

Over the time i could expect not that much created columns, but i can expect thousand(s) of rows. Lets say that above table from 2x2 could become 20 * 2000. Further more i could expect a lot of updates when it comes to updating specific cell of a table.

Being not that experienced with mongodb, im looking for some optimal solution how to design this having in mind query performances supporting regular operation what end user can due with tables (like updating individual cells, sorting rows based on some columns, etc)

My first guess would be to have single collection “tables” with embedded array of “columns” which itself would have embedded “cells” array. In above example of user defined table with parking data, with everything embedded, data in database would look like following: http://json-parser.com/96f5929e/1

My guess for embedding is based on that, that “cells” and “columns” can not exists on their own, and there is no use case for end-user for fetching cell without column, or fetching column without other table data.
But again, in that approach, i would end up with possibility of having thousands and thousands of embedded documents.
Still i would maintain objectId for each cell and each column, for being able to faster find/update individual cell. (Not even sure is it possible having unique _id per each nested document within the parent)

Another approach would be to use referencing instead of embedding, to perhaps have separate collection for columns and separate collection for cells. Or to use embedding for table data and columns, while having separated collection for individual cells. But somehow to me, with embedding everything seems like better educated guess.

Hey @Srdjan_Cengic,

Welcome to the MongoDB Community Forums! :leaves:

Your initial approach of embedding columns and cells within a single collection of “tables” is a reasonable one, given that columns and cells are tightly coupled to the table and have no meaning outside of it. However, as you pointed out correctly, this approach could lead to performance issues as the number of cells and columns grows. Also, MongoDB has a limit of 16MB on the BSON document size.

Yes, you can have different ids in each embedded document for referencing easily.

Additionally, I would like to point out one thing about data modeling in MongoDB. A general thumb rule to follow while schema designing in MongoDB is things that are queried together should stay together. Thus, it may be beneficial to work from the required queries first and let the schema design follow the query pattern.
Ultimately, the best approach depends on your specific use case and performance requirements. You may want to consider testing out different approaches and measuring their performance to determine which one works best for your needs. You can use mgeneratejs to create sample documents quickly in any number, so the design can be tested easily.

Additionally, since you’re new to MongoDB, I’m linking some more useful resources that you can explore:
Data Modelling Course
MongoDB Indexing

Please let us know if you have any additional questions. Feel free to reach out for anything else as well.

Regards,
Satyam

@Satyam thank you very much for your answer. Still im trying to find some approach to start with, that would, at least, be a solid candidate to start with.

I gave up from my solution, described in my previous answer, cause according to some mongodb articles, one should avoid unbounded arrays. That would definitelly happen in my solution with “cells” array.

Would be great if you could share any kind of approach you think it would be good to start with for described issue.

Besides regular operation that user can do with normal spreadsheet/table, very important requirement would be to sort data based on whatever columns, like in example above, if you sort by “numberOfSpots” rows would change order.

I was reading a lot about “attribute” pattern with mongodb, somehow im trying now two have 2 different collections.

One collection “tables” to hold meta data about table columns such as:

tables: [{
  _id: …,
  tenantId: 1,
  name:  “Parking spots”,
  description:  “List of our parking spots”.
  columns: [{
    _id: …,
    columnName: “parkingType”,
    order: 1
  }, {
     _id: …,
    columnName: “numberOfSpots”,
    order: 2
  }],
  …
]

And then to have separated collection “table_cells” to hold value of individual table cells.

Maybe something like:

table_cells: [{
    _id: …,
    columnId: first column _id in “tables” collection,
    rowIndex: 1,
    value: “garage”
  }, {
    _id: …,
    columnId: first column _id in “tables” collection,
    rowIndex: 2,
    value: “public”
  }, {
    _id: …,
    columnId: second column _id in “tables” collection,
    rowIndex: 1,
    value: 10
  }, {
    _id: …,
    columnId: second column _id in  “tables” collection,
    rowIndex: 2,
    value: 100
  }
]

With this at least i wouldn’t have unbounded arrays like in first solution. Now i need to think that this table can grow to like 20(columns)x 2000(rows) and to think how exactly to add indexes on design like this.

Third approach would be to somehow maintain “table_rows” collection instead of “table_cells”, where each document in “table_rows” would represent one row of the table. But i dont know how then to create schema for that cause meta data about columns (including column name) is created by user.

I would really really appreciate any opinion or solution you could advise.