Keeping Extended Reference Pattern up to date

Hello everyone, I'm a newbie to MongoDB and document schema modeling. I have a question about techniques for keeping extended references up to date. Let's say I have two collections: an aliment collection and a meal collection. Each aliment is created and updated by the user, and has a name and a list of nutritional values. Each meal is also created and updated by the user, and has a name, a list of extended references to aliments, and computed nutritional values derived from the nutritional values of each aliment in the meal.

Assume now that the user changes an aliment's name from PASTA to PASTA Barilla. What are the best techniques to propagate that change to all meals that reference the old PASTA aliment?

For the moment I can think of the following:

When the user updates the aliment, I check each meal and update the extended reference. In this case, I will have one transaction updating the aliment document and a second bulk operation to update all the meals.
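In shell terms, I imagine the second step as a single bulk update, something like this (a sketch only; the `aliments` array, `alimentId` field and `pastaId` variable are just my guesses at a schema):

```javascript
// Sketch: rename the aliment inside every meal that embeds it.
// "aliments", "alimentId" and pastaId are illustrative names, not a fixed schema.
db.meals.updateMany(
  { "aliments.alimentId": pastaId },                       // all meals referencing PASTA
  { $set: { "aliments.$[a].name": "PASTA Barilla" } },     // rewrite the embedded copy
  { arrayFilters: [ { "a.alimentId": pastaId } ] }         // only the matching array entries
)
```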

But what happens if, for some reason, one meal update fails because the user updated the meal in the meantime?

Thank you

Hi @Green,

That's an interesting design consideration.

For some designs, the extended reference should be kept very minimal precisely because of update considerations.

The idea is to keep user-specific data only on the user's documents, and to move data that changes across users/meals to another referenced collection. Keeping data in both places makes sense only if it benefits the queries and does not change often.

Now I have a few questions to better help you.

  1. How often are product names expected to change?
  2. What other fields are expected to change across meals? Do you expect ingredients to change their nutritional values?

In general, I think that to guarantee consistency across collections you will need to:

  1. Either use transactions to update the meals and the aliment, and finally commit or roll back.
  2. Or keep in the extended reference only data that needs to be shown by the meal query and does not need to be updated over time. If all fields can be updated, it may make more sense to store only the ids of those ingredients and run two queries to show them.
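A sketch of option 1 with an explicit session (the database name "mydb" and the field names are illustrative, not a prescribed schema):

```javascript
// Sketch: update the aliment and all meals that reference it in one transaction,
// so readers never see the aliment renamed but the meals still stale.
const session = db.getMongo().startSession();
session.startTransaction();
try {
  const aliments = session.getDatabase("mydb").aliments;  // "mydb" is illustrative
  const meals = session.getDatabase("mydb").meals;

  aliments.updateOne({ _id: alimentId }, { $set: { name: newName } });
  meals.updateMany(
    { "aliments.alimentId": alimentId },
    { $set: { "aliments.$[a].name": newName } },
    { arrayFilters: [ { "a.alimentId": alimentId } ] }
  );

  session.commitTransaction();   // both writes become visible together
} catch (e) {
  session.abortTransaction();    // or roll everything back on failure
  throw e;
}
```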

Thanks,
Pavel

Hi @Pavel_Duchovny, thank you again for your answer! To be honest, I suppose that an aliment's name and nutritional values are not going to change often. It would probably be better to keep a created aliment read-only and let the user create a new aliment from it. Thinking about it, since an aliment is not only its name but also its nutritional values, if you update it you are probably creating a new aliment anyway. Also, wouldn't it be weird if you had 10 meals with the same aliment and, when you change it, all the meals get updated without your knowledge?

Regarding your first general consistency concern, does that mean I need to implement compensating transactions in case subsequent ones fail?

Thank you

Hi @Green,

The transactional behaviour makes sense if you want users to see or update only committed data; if someone changes something outside of a transaction, it won't succeed until the transaction is committed or rolled back.

If your flows only update one independent document, I would use retryable writes as a retry mechanism rather than transactions.

Regarding the data model, it sounds like you could potentially create draft meals for the users to approve once there is a change.

Potentially, you can run the update by creating new versions of the user meals:

db.meals.updateMany({userid : ..., Aliments : previousAliment}, {$set : {status : "inDraft"}})
...
db.meals.updateOne({mealid : ...}, {$set : { Aliments : [ {...} ], status : "approved"}})

So the application only counts on approved meals and marks those being changed as inDraft. This way, even if the update fails, the affected meals will still be in draft and not yet approved.

Let me know if that makes sense.

Thanks
Pavel

Hi Pavel, I did not think about drafts, thank you!
For transactions, some of the flows will in fact touch multiple documents, mostly to notify that the data of a related document changed; but in general these changes are not going to trigger business logic on the other documents, only simple updates of the view data.
Since I'm new to Mongo and I don't really like retriable stuff, I would like to try to follow the DDD style and use a single transaction per document. If an action that touches a document triggers something on another document, I will start a transaction on that document and compensate on the first transaction if necessary (most of the failures will be due to versioning, and I'm expecting many more reads than writes) (it looks like a saga). Everything is synchronous for now; if it works well I will move to asynchronous communication between documents.

Thank you

Green

@Pavel_Duchovny I had to change my model. I realized that users may have personal aliments, and putting them all inside the same collection may lead to performance problems when querying that data. I was thinking of creating an AlimentCollection document with a nested array of aliments inside. Correct me if I'm wrong, but this way I can:

  1. Regulate the number of aliments created
  2. Reduce query overhead, since I will first fetch the collection using the userId index and then select on the nested array, projecting, for example, only the name of the aliment.

Now, I have some doubts:

  1. Is it possible to implement paging and text search on the nested aliment array?
  2. I saw that it is possible to update a single element in the array without having to extract the whole document. Does this also apply to insertion and deletion of an aliment?
  3. To implement versioning (though I'm not sure I need it, since only one user updates the AlimentCollection), can I add a condition to the update statement that checks the given version against the actual version in the db?

Thank you very much!

Green

Hi @Green,

If each aliment collection ends up as one document per user, you can store the aliments in an array.

Now, controlling the size of the array is possible and recommended, as we don't want large unbounded arrays; those are a known antipattern.

One option is to keep an array-size field that maintains the number of elements. We can then $inc it by +/-1 whenever we $pull or $push. A nice trick is to use $push with the $each, $sort and $slice modifiers:
https://docs.mongodb.com/manual/reference/operator/update/slice/#use-slice-with-other-push-modifiers
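Combining that trick with the counter field could look roughly like this (a sketch; the collection name, field names and the 1000-element cap are illustrative):

```javascript
// Sketch: push a new aliment, keep the array sorted by name and capped,
// and maintain a size counter in the same atomic update.
db.alimentCollections.updateOne(
  { userId: userId },
  {
    $push: {
      aliments: {
        $each: [ { name: "PASTA", kcal: 350 } ], // the new element(s)
        $sort: { name: 1 },                      // keep the array alphabetical
        $slice: 1000                             // cap the array size (illustrative limit)
      }
    },
    $inc: { alimentCount: 1 }                    // maintain the element count
  }
)
```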

If the array grows beyond what we need, we can extend it with a new document or inform the user accordingly.

Text searches are possible on nested fields, as you can define a text index on nested fields and even with a wildcard expression. Having said that, the best text search experience is with Atlas Search, which is one of the many benefits of using the Atlas managed service.

Updates can be applied to an array directly, including pull, push and add-to-set operations.

We have array filters for more complex array updates:

https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/
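For example (a sketch, with illustrative collection and field names), an array filter lets you modify one element in place, and $pull/$push handle deletion and insertion without rewriting the whole document:

```javascript
// Sketch: update a single embedded aliment via an array filter.
db.alimentCollections.updateOne(
  { userId: userId },
  { $set: { "aliments.$[a].kcal": 360 } },   // change only the matched element
  { arrayFilters: [ { "a.name": "PASTA" } ] }
)

// Sketch: delete one aliment from the array without extracting the document.
db.alimentCollections.updateOne(
  { userId: userId },
  { $pull: { aliments: { name: "PASTA" } } }
)
```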

I'm not certain about the version question. Do you mean you want to use findOneAndUpdate to only change documents of a specific version? Will you use a version field?
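If you do keep a version field, the usual optimistic-concurrency check could look like this (a sketch; `version`, `expectedVersion` and the collection name are illustrative):

```javascript
// Sketch: the filter matches only the version the client originally read,
// so a concurrent write turns this update into a detectable no-op.
const res = db.alimentCollections.updateOne(
  { userId: userId, version: expectedVersion },
  { $set: { aliments: newAliments }, $inc: { version: 1 } }
);
if (res.modifiedCount === 0) {
  // Someone else updated the document first: re-read and retry, or report a conflict.
}
```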

Thanks
Pavel

Hi @Pavel_Duchovny, thank you again. I realized I posted on the wrong question… Btw, after analyzing the one-to-many relation using an embedded array, I realized that I'm very limited in sorting. As you suggested, I can extend the document of aliments with a new document for that user (is that the outlier pattern?). But what happens if the user wants to sort by, let's say, amount of kcal?

The user will access his aliment “portfolio” in the aliment section and be presented with the list of aliments in alphabetical order (paged). The records shown are just a projection of the name and the id of the aliment; when the user clicks on one, the details (nutritional values) are shown (maybe I don't even need to project the name). Now, the user may want to sort by, let's say, the amount of protein, or filter based on tags the user can add to the aliment.
At this point, isn't it better to go back to one aliment per document? (I would just move the count of the aliments to a separate document and use that to regulate the number.) If that is the case, I was thinking of using the aggregation framework to do:

1. Match on userId
2. Project the needed values (if filter by tags, then tags id and name of aliment) (if filter by name, it will be a text search)
3. Match the values with the filter
4. Use the skip and limit for pagination
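The steps above might look roughly like this in the shell, assuming one document per aliment and an index on userId (collection name, field names and the tag value are my illustrative guesses; I also merged the two match conditions into the first stage so the index can serve both):

```javascript
// Sketch of the pipeline: filter, project, sort, page.
db.aliments.aggregate([
  { $match: { userId: userId, tags: "pasta" } },  // user filter first (uses the userId index)
  { $project: { name: 1, tags: 1 } },             // only what the list view needs
  { $sort: { name: 1 } },                         // default alphabetical order
  { $skip: pageIndex * pageSize },                // pagination
  { $limit: pageSize }
])
// Note: if the name filter is a $text search instead of a tag match,
// the $text expression must appear in the first $match stage of the pipeline.
```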

This aggregation should first filter using the userId, and only then apply the text search or tags.contains, right? (I'm trying to avoid scanning documents of other users.)
I know that skip is not efficient… and I know pagination can be done with a last key, but does that work only on indexed fields that are ordered?

About the version, I'm referring to a concurrency version, but, again, since aliments are private to a single user, I should not need this.

To be honest, embedding and extending the document makes sense, given the privacy of the aliments. But I'm concerned about read performance. Also, aliments will be used by other entities in the application, but only as copies.

At first I was also concerned about write performance, because with embedding I need to extract the whole document every time I make a change, but extending the array of aliments across multiple documents looks very nice.

Thank you!

Green

Hi @Green,

Again, if we are talking about a 1-to-many of 20-50 elements, they can still be embedded; but if it's 1-to-1000, then separating each one/group into its own document/collection makes more sense. You also need to consider the concurrency those docs will have. Locking is at the document level, so if your arrays are concurrently updated frequently, that's not good.

If the number of aliments is in the tens per user, you can use aggregation to page and sort them, using an $unwind stage followed by a $sort stage plus $skip/$limit or any other transformation.

Although this is done in memory, it should be ok for a low number of array elements.
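Such a pipeline could look like this (a sketch; the collection and field names are illustrative):

```javascript
// Sketch: sort and page the embedded array by a nutritional value.
// The $sort here runs in memory, so it is only suitable for small arrays.
db.alimentCollections.aggregate([
  { $match: { userId: userId } },                // one document per user
  { $unwind: "$aliments" },                      // one result document per aliment
  { $sort: { "aliments.kcal": -1 } },            // e.g. sort by kcal, descending
  { $skip: pageIndex * pageSize },               // pagination
  { $limit: pageSize },
  { $replaceRoot: { newRoot: "$aliments" } }     // return the aliments themselves
])
```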

Now, if it starts exceeding that, use the outlier pattern to keep the extra documents separate.

The beauty of MongoDB is that you can store metadata in documents to describe them, and maybe have different types for very large lists :wink:

Thanks
Pavel

Hi @Pavel_Duchovny, to be honest, we are talking about max 100-200 elements for most users, but for some users it can reach 1000 elements (which is the hard limit). Using the outlier pattern will probably cost me more code, so I'm not planning to go back to one aliment per document. The question is, would there be a difference in read performance between these cases:

  1. Embed with the outlier pattern and a default sorting, which should be alphabetical. This way, for the standard and probably most used sorting, we already have the data ready. For other kinds of sorting, we aggregate the aliment arrays from each outlier document and then sort/filter on the concatenated arrays.
  2. One document per aliment, so that I don't have to deal with “chunks” of data. Sorting and paging should be easier.

For the second option, I was wondering if the aggregation I described above would be performant. As far as I understand, the first match on the userId would be efficient since I have an index on it. At that point I project the needed data, so that I don't touch other details that are not needed. Next, I do a text search on the name (@TextIndexed from Spring Data Mongo) only on the documents that matched the userId in the first match stage. Finally, skip and paging will apply only to the subset of data that matched the first two match stages.

(Sorry for all these questions, but since I'm coming from SQL, I'm afraid of being biased towards it and not fully embracing the NoSQL world.)

Thank you!

Hi @Green,

Ok, in that case there might be a dilemma…

Having a compound index on userId plus a multikey text index on name sounds suboptimal; unwinding large arrays is also not my favourite.

Numbers like 200-300 are the upper limit for embedding, which leans me towards not embedding.

I would say separation sounds like the better way to go.

Regarding skip, that sounds right.
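If the skip cost ever becomes a problem, "last key" (range) pagination avoids it; a sketch, assuming a compound index on { userId: 1, name: 1, _id: 1 } and that lastName/lastId come from the final row of the previous page (all names illustrative):

```javascript
// Sketch: fetch the next page after (lastName, lastId) instead of skipping.
// The _id tie-breaker keeps the order stable when names repeat.
db.aliments.find({
  userId: userId,
  $or: [
    { name: { $gt: lastName } },
    { name: lastName, _id: { $gt: lastId } }
  ]
}).sort({ name: 1, _id: 1 }).limit(pageSize)
```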

I hope that covers your questions. If you want to update us with your test results, you are welcome to.

Thanks,
Pavel