Schema Design for a Social Media App

Anudeep_Ananth · April 9, 2024, 4:39am

I am creating a social media application. Let's assume the following:

The App has 5 million users
Each user follows 1 million other users
Each user has blocked 100 other users
Each user has published 150 posts
On the client side, each user has a home feed that shows posts published by the people the user follows
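To get a feel for the scale these assumptions imply, here is a back-of-the-envelope calculation. It uses only the numbers listed above (they are assumptions from the question, not measurements) and counts one document per follow or block relationship, as in the design described below.

```python
# Numbers assumed in the question (not measurements).
USERS = 5_000_000
FOLLOWS_PER_USER = 1_000_000
BLOCKS_PER_USER = 100
POSTS_PER_USER = 150

follow_edges = USERS * FOLLOWS_PER_USER   # one document per follow relationship
block_edges = USERS * BLOCKS_PER_USER     # one document per block relationship
total_posts = USERS * POSTS_PER_USER

# Every follow edge has exactly one followee, so the average number of
# followers equals the average number of users followed.
avg_followers = follow_edges // USERS

# Feed copies if every post were duplicated into every follower's feed:
# each author's posts are copied once per follower, summed over all edges.
full_fanout_copies = POSTS_PER_USER * follow_edges

for name, value in [("follow edges", follow_edges), ("block edges", block_edges),
                    ("posts", total_posts), ("avg followers", avg_followers),
                    ("fan-out copies", full_fanout_copies)]:
    print(f"{name:>15}: {value:,}")
```

The follow graph alone is 5 trillion edges, and fully fanning every post out to every follower would mean 7.5 × 10^14 feed documents. Capping each feed at roughly 200 posts, as in the approach below, brings that down to about 5,000,000 × 200 = 1,000,000,000 feed documents.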

What would an ideal schema design be for this?

Here is my take:

Users collection → all user documents
followed collection → When a user follows another user, the follower's user document is duplicated and an additional field containing the id of the followed user is added (one document per follow relationship, since there is no limit to the number of people one can follow).
blocked collection → Same as the followed collection, except that the blocked user's document is duplicated along with a blockerUserId field.
Posts collection → Contains a document for every post, along with the publisher's userId field.
UserFeed collection → This is the tricky part. What I am currently doing is the following:

When the client opens the app, I query the top 20 followed users, fetch their posts, and duplicate them into a userFeedCollection. Each post document there carries an additional feedUserId field (to ascertain whose feed the document belongs to). I then query the next 20, and so on, until at least 200 or so posts have been added to the userFeedCollection for that user.
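The feed-building loop described above can be sketched roughly like this. It is an in-memory approximation, not database code: `follows` stands in for the followed collection, `posts_by_author` stands in for a query on the Posts collection, and the field names `followerUserId` / `followedUserId` are hypothetical (the question does not name the field). The question also does not say how the "top" followed users are ranked, so this sketch simply takes them in stored order.

```python
BATCH_SIZE = 20         # followed users processed per round, as in the question
TARGET_FEED_SIZE = 200  # stop once at least this many posts have been copied


def build_user_feed(user_id, follows, posts_by_author):
    """Copy posts from followed users into per-user feed documents.

    follows: list of edge dicts with hypothetical keys followerUserId/followedUserId.
    posts_by_author: dict mapping an author id to that author's post dicts.
    """
    followed_ids = [e["followedUserId"] for e in follows
                    if e["followerUserId"] == user_id]
    feed = []
    for start in range(0, len(followed_ids), BATCH_SIZE):
        for author_id in followed_ids[start:start + BATCH_SIZE]:
            for post in posts_by_author.get(author_id, []):
                # Each feed document is a full copy of the post plus feedUserId.
                feed.append({**post, "feedUserId": user_id})
        if len(feed) >= TARGET_FEED_SIZE:
            break
    return feed
```

Because every feed entry is a full copy of the post, any later change to the post (likes, comments, edits) has to be applied to each copy as well.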

Problem with this approach: when a post is liked or commented on, every duplicated copy of that post in the userFeedCollection needs to be updated, which is expensive.
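A small in-memory sketch of the write amplification: with duplicated copies, one like becomes one write per feed that holds the post (in MongoDB terms, something like `updateMany({postId: P}, {$inc: {likes: 1}})` over the feed collection), whereas if feed entries kept only the post id, the counter would live in a single posts document. The `likes` field and function names are hypothetical, chosen for illustration.

```python
def like_duplicated(feed_docs, post_id):
    """Duplicated feed copies: every copy of the post must be updated.
    Returns the number of documents written."""
    writes = 0
    for doc in feed_docs:
        if doc["postId"] == post_id:
            doc["likes"] = doc.get("likes", 0) + 1
            writes += 1
    return writes


def like_referenced(posts, post_id):
    """Feed stores only postId: the counter lives in one posts document."""
    posts[post_id]["likes"] = posts[post_id].get("likes", 0) + 1
    return 1
```

Under the question's numbers the average user has about 1 million followers, so a single like could in the worst case touch on the order of a million feed copies. The referenced variant moves the cost to read time instead: rendering a feed page then needs a lookup of current counts from the posts collection by id.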

What would be the ideal approach to this?

Aasawari · April 9, 2024, 11:02am

Hi @Anudeep_Ananth and welcome to the community forum!!

The best approach for the schema typically depends on the specific use case and on how the application will grow in the future.

For a large-scale application where the number, using the …

Assuming all the above numbers to be strict, here is how I would design the schema for the application.

User collection: contains all the information related to the user.
For example:

User 1:
{
  user_id: 1234,
  userName: "ABC",
  No_of_Followers: 123,
  No_of_Following: 45677,
  BlockedList: [ <userId>, ... ],     // ids of the users this user has blocked
  PublishedPosts: [ <postId>, ... ],  // references to documents in the posts collection
  ...
}
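The user document above embeds the blocked list but keeps only counts for followers and following. A rough size check shows why a following list of 1 million ids is risky to embed: MongoDB documents are limited to 16 MB of BSON. This sketch assumes the ids are ObjectIds (12 bytes each; the example above uses integers, which would be smaller) and counts only the encoded array value. In BSON an array is a document whose keys are "0", "1", "2", … stored as C strings.

```python
OBJECT_ID_BYTES = 12
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document limit


def objectid_array_bson_size(n):
    """Bytes used by the BSON encoding of an array of n ObjectIds (value only)."""
    size = 4 + 1  # int32 length prefix + trailing 0x00 terminator
    for i in range(n):
        key_bytes = len(str(i)) + 1  # array key "i" as a NUL-terminated C string
        size += 1 + key_bytes + OBJECT_ID_BYTES  # type byte + key + ObjectId
    return size


print(objectid_array_bson_size(100))        # a 100-entry blocked list
print(objectid_array_bson_size(1_000_000))  # a 1-million-entry following list
```

Under that assumption, 100 blocked ids take about 1.6 KB, while 1 million ObjectIds take about 19.9 MB, which is already over the limit before any other field is counted.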

Follower collection: You can use database references to point to the user documents in the Users collection, instead of duplicating them.
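A minimal sketch of a follower collection that stores references rather than copies, assuming manual references (just the user ids). MongoDB also supports the DBRef convention (`{"$ref": "users", "$id": ...}`), though its documentation generally points to manual references for most cases. The field names `followerId` / `followedId` and the in-memory dicts are hypothetical stand-ins for the real collections.

```python
# Hypothetical in-memory stand-in for the users collection.
users_by_id = {
    1234: {"user_id": 1234, "userName": "ABC"},
    5678: {"user_id": 5678, "userName": "XYZ"},
}


def make_follow_edge(follower_id, followed_id):
    # Manual reference: store only the ids, never a copy of the user document.
    return {"followerId": follower_id, "followedId": followed_id}


def followed_profiles(user_id, follows, users):
    """Resolve references at read time, roughly what a $lookup on users would do."""
    return [users[e["followedId"]] for e in follows if e["followerId"] == user_id]


follows = [make_follow_edge(1234, 5678)]
# A profile change touches exactly one users document; the edge stays valid.
users_by_id[5678]["userName"] = "XYZ_renamed"
print(followed_profiles(1234, follows, users_by_id))
```

Compared with duplicating the whole user document per follow, each edge stays small and profile updates never need to be fanned out across the follow graph.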

Post collection: should contain all the information related to each post:

{
  postId: 5637,
  post: <post content>,
  createdAt: <date>,
  UpdatedAt: <date>,
  LikedBy: [ <userId>, ... ],
  Comments: [
    { <details about the user who commented and what was commented> }
  ]
}
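With a `LikedBy` array, recording a like is naturally an `$addToSet` update, so the same user liking twice does not add a duplicate entry. The sketch below mimics that behaviour in memory; the `add_to_set` helper is hypothetical, and the mongosh line in the comment shows the corresponding real update.

```python
def add_to_set(doc, field, value):
    """In-memory mimic of {$addToSet: {field: value}}: append only if absent.
    Returns True if the array changed."""
    values = doc.setdefault(field, [])
    if value in values:
        return False
    values.append(value)
    return True


post = {"postId": 5637, "LikedBy": []}
# Roughly equivalent MongoDB update (mongosh):
#   db.posts.updateOne({ postId: 5637 }, { $addToSet: { LikedBy: 42 } })
add_to_set(post, "LikedBy", 42)
add_to_set(post, "LikedBy", 42)  # a second like by the same user is a no-op
print(post["LikedBy"])  # [42]
```

Keep in mind that `LikedBy` grows with every distinct like, so for very popular posts it is subject to the same 16 MB document limit as any other embedded array.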

UserFeed collection: This collection can be created as the output of an aggregation pipeline evaluated over the posts and users collections. In my opinion, this would help avoid duplicating the user ids, and the feed would always hold up-to-date data based on the timestamp.
You can read about the $out stage in the MongoDB documentation.
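A rough sketch of such a pipeline, built as pymongo-style Python data (it would be passed to something like `db.follows.aggregate(pipeline)`). One caveat: `$out` replaces the entire target collection on every run, so writing many users' feeds into one shared collection with `$out` would wipe the other users' entries. This sketch therefore swaps in `$merge` (MongoDB 4.2+), which upserts instead. The collection names `follows` / `posts` / `userFeed` and the fields `followerId`, `followedId`, `authorId` are hypothetical.

```python
def build_feed_pipeline(user_id, feed_size=200):
    """Aggregation pipeline (run on a hypothetical `follows` collection) that
    writes one user's newest posts into a shared `userFeed` collection."""
    return [
        # 1. Follow edges for this user.
        {"$match": {"followerId": user_id}},
        # 2. Join each followed user's posts (assumes posts carry `authorId`).
        {"$lookup": {"from": "posts", "localField": "followedId",
                     "foreignField": "authorId", "as": "post"}},
        {"$unwind": "$post"},
        {"$replaceRoot": {"newRoot": "$post"}},
        # 3. Newest first, keep only the window the client needs.
        {"$sort": {"createdAt": -1}},
        {"$limit": feed_size},
        # 4. Tag each entry with its owner and give it a per-user _id, so two
        #    users' feeds containing the same post do not overwrite each other.
        {"$set": {"feedUserId": user_id,
                  "_id": {"feedUserId": user_id, "postId": "$postId"}}},
        # 5. Upsert into userFeed without touching other users' documents.
        {"$merge": {"into": "userFeed", "on": "_id",
                    "whenMatched": "replace", "whenNotMatched": "insert"}},
    ]


for stage in build_feed_pipeline(1234):
    print(stage)
```

This sketch does not remove older entries that fall out of the 200-post window; a periodic cleanup or a TTL index on the feed collection would be needed for that.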

**Please note that all the above suggestions are subject to change based on the application's infrastructure and scalability.
They would also depend on the specific use cases and the queries that you would like to perform on the collections.**

Please feel free to reach out in case of any further questions.

Best Regards
Aasawari

Andrew_Davidson · April 9, 2024, 12:53pm

The Socialite reference architecture may also help you
https://www.askasya.com/post/socialstatusfeed/