Storing user likes in a many-to-many relationship

fried_empanada · July 6, 2023, 10:20pm

I have the following context in my application:

There are many posts in my app.
There are many users in my app.
Each user can like many posts, and each post can be liked by many users, but since it’s a social media app, there’s a high probability that the number of likes a post can get is larger than the number of posts a user likes.

Currently I have a collection for storing all the posts and a collection for storing all the users, and I have been tempted to store the ObjectIds of the posts that a user has liked as a field of type [ObjectId] in each user’s document. However, based on my understanding, this is an anti-pattern (Massive Arrays | MongoDB) because the array is unbounded. At the same time, each ObjectId is only 12 bytes, so the 16 MB document limit leaves room for a list of hundreds of thousands of ObjectIds, and I think it would be very rare for a user to like hundreds of thousands of posts.
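As a rough check on the 16 MB estimate: BSON encodes an array as a document keyed by the decimal index, so each ObjectId element costs a few bytes more than its 12 raw bytes. The sketch below is a back-of-the-envelope calculation, not a MongoDB API. The function names are made up for illustration, and it ignores every other field in the user document.

```python
OBJECT_ID_BYTES = 12
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document limit


def bson_objectid_array_size(n: int) -> int:
    """Encoded size in bytes of a BSON array holding n ObjectIds."""
    size = 4 + 1  # int32 length prefix + trailing null byte of the array
    for index in range(n):
        # type byte + index as a null-terminated key string + ObjectId value
        size += 1 + len(str(index)) + 1 + OBJECT_ID_BYTES
    return size


def max_objectids_in_one_document() -> int:
    """Largest n whose array alone still fits under the document limit."""
    size = 4 + 1
    n = 0
    while True:
        element = 1 + len(str(n)) + 1 + OBJECT_ID_BYTES
        if size + element > MAX_BSON_DOCUMENT_BYTES:
            return n
        size += element
        n += 1
```

Under these assumptions the ceiling comes out somewhat below one million ids, which is consistent with the “hundreds of thousands” estimate, though a document anywhere near that size would be slow to read and update long before it hits the limit.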

The alternative is obviously storing all the likes as documents in a separate collection where each “like” document will reference the post and user document by ObjectId. This way is surely more scalable, but in order to perform queries efficiently on the “likes” collection I would have to index both the field that references the post’s ObjectId and also the field that references the user’s ObjectId, and I am not sure if the indexing would take a large amount of space when there are posts that have many likes. It seems to me that there will be a lot of “like” documents when there are many (hundreds or thousands of) posts where each post can possibly have hundreds of thousands or millions of likes.
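The two access paths do not necessarily need two independent single-field indexes. Here is a minimal sketch, assuming a PyMongo-style collection object and the hypothetical field names `user_id` and `post_id`: a compound unique index on (user_id, post_id) can serve “posts liked by this user” through its prefix and also reject duplicate likes, while a second index on post_id serves “users who liked this post”.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def like_document(user_id, post_id):
    """Shape of one document in the hypothetical `likes` collection."""
    return {"user_id": user_id, "post_id": post_id}


def ensure_like_indexes(likes):
    """Create the two indexes on a PyMongo-style collection object."""
    # The user_id prefix answers "what did this user like?";
    # the unique constraint rejects a second like of the same post.
    likes.create_index(
        [("user_id", ASCENDING), ("post_id", ASCENDING)], unique=True
    )
    # Answers "who liked this post?".
    likes.create_index([("post_id", ASCENDING)])
```

Both indexes grow linearly with the number of like documents, so the actual footprint is best measured on realistic data (the collection statistics report per-index sizes) rather than guessed.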

I have been looking around for existing posts and threads related to this issue and I found this: “How to store User's liked items?”, which prefers the “storing likes in a separate collection” approach over the “array of post ObjectIds in each user document” approach if I understand it correctly.

I would also like to hear some recommendations or advice from anyone who has experience with this issue.

Thanks a lot!

Kobe_W · July 7, 2023, 4:55am

16 MB is not big enough for users of a very large-scale social app. A user can easily like many thousands of posts, and a very popular post can be liked by, say, millions of people.

In the long term, you should definitely use a separate likes collection for it.

That being said, premature optimization is the root of all evil. There is no need to overcomplicate it unless it’s necessary.

If a short-term solution is good enough for 5 years, then go for it.

fried_empanada · July 8, 2023, 9:43pm

Hi @Kobe_W, thank you very much for your insight and recommendations! As you suggested, I think I am going to create a collection just for the likes, but I will also keep track of the number of likes each post gets in a field like num_likes in each post document. I have a follow-up question about this: if I keep a num_likes field in each post document, I will need a way to make inserting a like document and incrementing the post’s num_likes field atomic so that the data stays in sync. What would be a good way to achieve this? I read that there’s the “multi-document transaction” option, but alternatively I could just count the likes while querying the posts, because I mainly want to use the number of likes in a custom algorithm that calculates a score for ranking the posts.

Kobe_W · July 9, 2023, 4:11am

There are only two possible outcomes:

With a transaction, the numbers are always consistent.
Without a transaction, the numbers are sometimes inconsistent.

Generally, a small inconsistency in the number of likes is OK (e.g. 1000000 is no different from 1000001). However, there’s a way to mitigate it.

You can check this video, in which the presenter mentions an async way to fix it: basically, use a background job to count and correct from time to time.
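The count-and-correct idea could look roughly like this. It is a sketch under assumptions, not the presenter's code: `posts` and `likes` are PyMongo-style collection objects, `num_likes` is the counter field discussed above, and `post_id` is a hypothetical name for the reference field in each like document.

```python
def reconcile_like_count(posts, likes, post_id):
    """Recount the likes for one post and overwrite the cached counter."""
    actual = likes.count_documents({"post_id": post_id})
    posts.update_one({"_id": post_id}, {"$set": {"num_likes": actual}})
    return actual


def reconcile_many(posts, likes, post_ids):
    """Run the correction for a batch of posts, e.g. from a scheduled job."""
    return {
        post_id: reconcile_like_count(posts, likes, post_id)
        for post_id in post_ids
    }
```

A like that arrives between the count and the overwrite can still be missed until the next run, so the counter stays approximate between runs, which matches the "sometimes not consistent" trade-off.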

fried_empanada · July 9, 2023, 4:54am

I see, thanks for the suggestions! By “transactions”, you mean something like multi-document transactions (https://www.mongodb.com/docs/manual/core/transactions/), right?

Kobe_W · July 10, 2023, 4:35am

Correct. Anything beyond a single-document operation needs to be wrapped in an explicit transaction to get ACID guarantees.
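A minimal sketch of what that could look like with PyMongo-style objects, assuming the deployment supports transactions (a replica set or sharded cluster) and using the hypothetical field names `user_id` and `post_id` alongside the `num_likes` counter from the question. The function name `like_post` is made up for illustration.

```python
def like_post(client, likes, posts, user_id, post_id):
    """Insert the like and bump the counter so both commit or neither does."""
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            likes.insert_one(
                {"user_id": user_id, "post_id": post_id}, session=session
            )
            posts.update_one(
                {"_id": post_id}, {"$inc": {"num_likes": 1}}, session=session
            )
```

Every operation has to receive the same `session`, otherwise it runs outside the transaction. PyMongo also offers `session.with_transaction`, which takes a callback and retries on transient errors, and may be a better fit for production code.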

fried_empanada · July 21, 2023, 9:29pm

Got it, thank you for the explanation!