Store the data or calculate it 'on the fly': how to choose?

Christophe_Conceicao · June 22, 2020, 1:09pm

Hello everyone!

In our project we always face the same question: should we just calculate the derived data ‘on the fly’, or should we store and maintain derived data (with callbacks and scripts)?

Let’s say I have the collections “Projects” and “Users”. I want to find all my projects, so I can either:
1. Create a field userId in Projects and query for all the projects with this userId (with an index on it) every time.
Or
2. Create an array myProjectIds in Users, plus the userId in Projects. myProjectIds would have to be maintained with callbacks and scripts that run at regular intervals.
In this case the first option seems easier and good enough.
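To make the trade-off between the two options concrete, here is a minimal sketch in plain Python. The dictionaries stand in for MongoDB documents, and the sample ids, names, and helper functions are all made up for illustration. In MongoDB itself, option 1 would be a find on Projects with the filter {"userId": ...}, backed by an index on userId.

```python
# Plain-Python stand-ins for the two collections; all values are made up.
projects = [
    {"_id": "p1", "name": "Website", "userId": "u1"},
    {"_id": "p2", "name": "Mobile app", "userId": "u2"},
    {"_id": "p3", "name": "API", "userId": "u1"},
]
users = {
    "u1": {"_id": "u1", "myProjectIds": []},
    "u2": {"_id": "u2", "myProjectIds": []},
}


def projects_on_the_fly(user_id):
    """Option 1: derive the list at read time by filtering on userId."""
    return [p["_id"] for p in projects if p["userId"] == user_id]


def rebuild_my_project_ids():
    """Option 2: the maintenance job that keeps the stored arrays in sync."""
    for user in users.values():
        user["myProjectIds"] = projects_on_the_fly(user["_id"])


def projects_stored(user_id):
    """Option 2: read the precomputed array straight from the user document."""
    return users[user_id]["myProjectIds"]


rebuild_my_project_ids()
# A project written after the last maintenance run is missing from option 2
# until the job (or a callback) runs again.
projects.append({"_id": "p4", "name": "Docs", "userId": "u1"})
```

At this point projects_on_the_fly("u1") already includes "p4", while projects_stored("u1") does not until rebuild_my_project_ids() runs again. That staleness window is the price of the stored approach.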

But what if we want to find all the projects I belong to in any way, and there are too many fields to query in each document (let’s say I have an array teamMembers with an object in each element)?
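As a rough sketch of what such an 'on the fly' query could look like, the helper below builds a MongoDB filter document. It assumes that each teamMembers object carries a userId field, which is a hypothetical name, and membership_filter is likewise a made-up helper. The $or operator and dot notation into an array of embedded documents are standard MongoDB query syntax.

```python
def membership_filter(user_id):
    """Build a filter matching projects the user owns or is a team member of.

    Assumes each element of teamMembers looks like {"userId": ..., ...};
    that inner field name is an assumption, not taken from the real schema.
    """
    return {
        "$or": [
            {"userId": user_id},              # projects the user owns
            {"teamMembers.userId": user_id},  # projects where the user is a member
        ]
    }
```

With PyMongo this would be passed as db.projects.find(membership_filter("u1")), where db is an assumed database handle. An index on teamMembers.userId is a multikey index, so this clause can be served from an index too. Every additional field that can link a user to a project adds another clause and another index.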

Of course we can measure the response time of the different queries, but is there any rule for knowing in advance whether it’s worth taking the second approach?

Thanks for your answers!

Lauren_Schaefer · September 25, 2020, 11:47am

Hi @Christophe_Conceicao - welcome to the community!

I’m curious what direction you went and how that is working out for you. Have you learned anything along the way?

The rule of thumb when modeling data in MongoDB is that data that is accessed together should be stored together. The way you model your data really depends on your use case, and on how the application will need to update and retrieve the data.

I’m thinking the Extended Reference Pattern could be a good option for your use case. This would allow you to store relevant information in the Projects collection as well as the Users collection.
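To illustrate the idea, here is a minimal sketch in plain Python, with dictionaries standing in for documents. It embeds a small copy of each project (its id plus the name) in the user document. The field names projects and projectId, the helper functions, and the sample values are assumptions made up for this example, and the same idea works in the other direction too.

```python
def extended_ref(project, fields=("name",)):
    """Copy the id plus a few frequently read fields from a project document."""
    ref = {"projectId": project["_id"]}
    ref.update({field: project[field] for field in fields})
    return ref


def rename_project(project, users, new_name):
    """Rename a project and refresh every embedded copy of its name.

    This extra write is the cost of duplicating the field.
    """
    project["name"] = new_name
    for user in users:
        for ref in user["projects"]:
            if ref["projectId"] == project["_id"]:
                ref["name"] = new_name


project = {"_id": "p1", "name": "Website", "description": "Company site", "userId": "u1"}
# The user document carries enough to list projects without a second query.
user = {"_id": "u1", "name": "Ana", "projects": [extended_ref(project)]}
```

Listing a user's projects then reads only the user document. The pattern tends to pay off when the copied fields are read often and change rarely, since every change has to be propagated to the copies.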

A few resources that can help you on your data modeling journey:
Blog series on schema design patterns:

Blog series on schema design anti-patterns
https://www.mongodb.com/article/schema-design-anti-pattern-summary

Free MongoDB University Course on Data Modeling