Doubts regarding Extended Reference Pattern with variable data

Hi all,

I’m new to data modelling in MongoDB. I have a query regarding the Extended Reference Pattern (ERP).

I’m building a task manager application. As part of modelling, I’ve identified two entities: Project and Task, which have a one-to-many relationship (i.e., one project has many tasks).

A Task document features many fields (e.g., cost/time estimate, title, description, comments, activity, …).

In the project board view, I visualise tasks as cards. The card previews show only a subset of the fields contained in a Task (i.e., title and cost/time estimates). The entire Task info is loaded only when the user clicks on a card.

To get the Task fields I need for the project board, instead of performing a join on Project and Task, I thought I would use ERP to speed up retrieval.

My idea is to embed the subset of the Task fields directly in Project, as follows:

Project:
{
_id: x;
title: bla;
…;
tasks: [
{task_id: x1; title: y; estimate: {cost: 10; time: 10} };
{task_id: x2; title: y; estimate: {cost: 10; time: 10} }
]
}

I have a couple of doubts regarding this design.

First, the estimate of a task may change sometimes. I’ve learnt that ERP is suggested in case the embedded data is stable. On the one hand, I’m tempted to embed the task data, because the project board query is the most important operation of the app and would like to optimise it. On the other, I feel I’ll need numerous writes to keep the duplicated information up-to-date over time.

Second, from the Data Modelling course on MongoDB University I seem to remember that when using ERP the embedding should happen on the many side. In my case, I embed data on the one side.

I’d really appreciate some expert advice on whether it makes sense to use ERP in this instance. Also, I’d like to ask if there is an actionable heuristic to decide when data is too “unstable” to be embedded using ERP?

Thanks a lot.
Valerio

Hello @Valerio_Velardo, welcome to the forum!

…because the project board query is the most important operation of the app and would like to optimise it.

This and the relationship between the Project and the Task entities makes me think that - you could just use one collection called as projects with all the tasks embedded in it. For example,

{
  _id: 123,
  title: 'The data design',
  ...,
  tasks: [
      { task_id: 1, title: 'Entities and relationships', estimate: {cost: 10, time: 9}, description: ..., otherDetails: {...} },
      { task_id: 2, title: 'CRUD operations', estimate: {cost: 20, time: 15}, description: ..., otherDetails: {...}  },
       ...
  ]
}

In the project board view - you retrieve the project with all tasks, and show what is requred in the initial preview. The remaining data for the project tasks can be stored in memory (e.g., an array). When the user clicks a card, the specific card detail can be fetched from the memory. This requires just a single query to the database, but needs some memory usage. But, the data retrieval is efficient. And, all the updates happen on one document in the collection.

The second option is what you already have - two collections, but with the subset of the task data (the cost and time estimates) stored in the project, and is changeable. The big question is how often? Is it a complex operation? In case of a change, you have to update both collections. This is possible.

The third option is that have two collections, similar to the one you have, but, the project stores only the reference (usually the id field, and is not changeable). The querying of the project and tasks requires a “join” operation (the aggregation $lookup) to retrieve the project and task data for the initial board view. With proper indexing this is not an inefficient operation, and not a complex one either.

In general, you can consider factors like the amount of data and the important operations - in your design/decision process. Creating some data samples and performing some queries on the data can reveal the possibilities.

1 Like

@Prasad_Saya thank you very much for your detailed explanation :slight_smile:

I initially considered the embed-all solution, but then discarded it because I felt a Project document may become too heavy with all the tasks embedded.

The heavier parts of a project would be tasks and relative comments. A project could have on average say 30 tasks, each of which could have say 10 comments.

Would this amount of data still be considered “acceptable” for the size of a document in MongoDB?

I know MongoDB can handle documents of up to 16MB, but often what’s possible is one thing, what’s ideal is another!

Looking forward to your reply, and thanks again.

@Valerio_Velardo, the amount of data doesn’t sound much in terms of bytes. As such you have a quite well defined boundaries for the tasks and comments. The rest is about your application functionality - the queries. So, these are my thoughts so far!

1 Like

@Prasad_Saya thank you for the prompt reply :slight_smile:

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.