Multiple document updates

Jo_Huang · October 12, 2020, 6:36pm

Hi,
I’m trying to design a simple social network data model.
I found this repo https://github.com/DimiMikadze/create-social-network/blob/master/api/resolvers/follow.js as a reference.

My question is about this node.js app, where a “Follow” action involves 3 document updates (1 relationship document and 2 user documents).
If the app crashes or a network issue occurs, the data might be left in an intermediate state (e.g. 1 document updated, 2 not updated).

I think this is a very common pattern, but I don’t know how to handle it. Is there a design pattern to avoid it?
Although transactions were introduced in 4.0, I would like to know the best practice for this common situation.

2-phase commit?
transaction?
or some other recovery mechanism?

Thanks,
Jo

Pavel_Duchovny · October 13, 2020, 5:08am

Hi @Jo_Huang,

You have run into one of the main concerns when designing relationships in MongoDB: if you have many collections to update or query, you end up with a schema that is not optimized and is harder to maintain reliably.

One of the rules is to use as few collections as possible, and to store together the data that is queried and updated together.

Therefore, I am not sure why you need a relationship document. I don’t see how a relationship would be queried without a user context (either follower or following). For this reason, I think there are 2 documents to update:

User document of the one who followed another user.
User document of the one who is being followed/subscribed.

Updating those 2 documents can be done in an ACID transaction if you have to ensure data consistency across documents. However, it could also be done asynchronously, where one of your services (or an Atlas trigger) listens for follow requests and updates the followed user along with other application logic, such as sending a push notification to that user.
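To make the transaction option concrete, here is a minimal sketch using the official Node.js driver. The database name, the `users` collection, and the `following`/`followers` array fields are assumptions for illustration, not taken from the linked repo, and `client` is assumed to be an already connected `MongoClient`. Transactions also need a replica set or sharded cluster, not a standalone server.

```javascript
// Sketch only: updates both user documents inside one transaction.
// Assumed names: database "social", collection "users",
// array fields "following" and "followers".
async function followUser(client, followerId, followeeId, dbName = "social") {
  const users = client.db(dbName).collection("users");
  const session = client.startSession();
  try {
    // withTransaction commits when the callback resolves and aborts when it
    // throws, so either both updates become visible or neither does.
    await session.withTransaction(async () => {
      await users.updateOne(
        { _id: followerId },
        { $addToSet: { following: followeeId } },
        { session } // every operation must be passed the session
      );
      await users.updateOne(
        { _id: followeeId },
        { $addToSet: { followers: followerId } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
```

If the app crashes between the two updates, the transaction never commits, so the first update is not left behind on its own.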

The remaining problem is keeping all followers in one array: you will have unbounded arrays, which is a known MongoDB anti-pattern. Therefore, you should look into the outlier pattern for heavy users.
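As a rough illustration of the outlier pattern, the sketch below keeps most followers inline in the user document and marks heavy users with a flag that points readers to overflow documents. The field names (`hasExtras`, `followers`, `userId`) and the idea of passing the overflow documents in as an array are assumptions made for this example; in a real app they would come from a separate query.

```javascript
// Sketch only: reads the follower list for a user under the outlier pattern.
// Assumed shapes:
//   userDoc:     { _id, followers: [...], hasExtras: boolean }
//   overflowDoc: { userId, followers: [...] }  (only exists for heavy users)
function listFollowers(userDoc, overflowDocs = []) {
  const ids = [...userDoc.followers];
  // Typical users never take this branch, so their reads stay a single document.
  if (userDoc.hasExtras) {
    for (const doc of overflowDocs) {
      if (doc.userId === userDoc._id) ids.push(...doc.followers);
    }
  }
  return ids;
}

// Example data: one typical user and one heavy user with an overflow document.
const typicalUser = { _id: "u1", followers: ["u2", "u3"], hasExtras: false };
const heavyUser = { _id: "u9", followers: ["u1", "u2"], hasExtras: true };
const overflow = [{ userId: "u9", followers: ["u3", "u4"] }];

console.log(listFollowers(typicalUser));
console.log(listFollowers(heavyUser, overflow));
```

The point of the design is that the extra lookup cost is paid only by the few outlier users, while the inline array stays bounded for everyone.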

Of course, I recommend using all the built-in mechanisms, like retryable writes and causal consistency, to handle failed writes better. Perhaps also add retry logic of your own and use $addToSet to push the relationship, so that repeating an operation has no effect on the data.
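A minimal sketch of the retry idea, assuming `users` is a driver `Collection` and that followers live in a `followers` array (both assumptions for illustration). `withRetry` is a hypothetical helper written for this example, not a driver API. Because `$addToSet` does nothing when the value is already in the array, repeating the write after an uncertain failure is safe.

```javascript
// Hypothetical helper: runs an async operation up to `attempts` times,
// waiting `delayMs` between tries, and rethrows the last error if all fail.
async function withRetry(operation, attempts = 3, delayMs = 100) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Sketch only: adds a follower idempotently.
// $addToSet is a no-op if followerId is already present, so a retry after a
// write that actually succeeded does not create a duplicate entry.
async function addFollowerWithRetry(users, followeeId, followerId) {
  return withRetry(() =>
    users.updateOne(
      { _id: followeeId },
      { $addToSet: { followers: followerId } }
    )
  );
}
```

With `$push` instead of `$addToSet`, the same retry could add the follower twice whenever the first attempt was applied but its acknowledgement was lost.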

https://www.mongodb.com/article/schema-design-anti-pattern-summary

Thanks
Pavel

Jo_Huang · October 13, 2020, 11:02am

Hi, Pavel
I found this reference

On page 18, it seems to suggest using a relationship collection (the edge collection on that page).

Could you help me understand more?
Is there any recommended social network data model design reference?

Thanks

Pavel_Duchovny · October 13, 2020, 11:58am

Hi @Jo_Huang,

I see this presentation is based on Socialite, our community project that mimics a social network to test MongoDB workloads.

I think the relationship collection is in a way an outlier collection.

One consideration is that having lots of arrays, and possibly indexing them for searches, might introduce overhead in maintaining them, or might require a lot of RAM to keep the hot working set in memory (a best practice).

I would like to emphasize that this project and design were initially based on very old MongoDB versions, where the storage engines and compression, as well as index optimization, were different. I am not saying most of the considerations don’t apply, but I would rely on more up-to-date content, like our pattern blogs and the new performance blog series. I will link them here for you to read!

( Linked last article as it has all others )

Let me know if that makes it clear

Best
Pavel

Jo_Huang · October 13, 2020, 5:35pm

Thanks for the references. I’ll check it!