I am creating a social media application. Let's assume the following:
- The App has 5 million users
- Each user follows 1 million other users
- Each user has blocked 100 other users
- Each user has published 150 posts
- On the client side, each user has a home feed that shows posts published by the people that user follows
What would an ideal schema design be for this?
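For a sense of scale, here is a rough back-of-the-envelope calculation of the totals these assumptions imply. The variable names are only illustrative, and the numbers are straight multiplications of the figures above, not measurements.

```python
# Totals implied by the assumptions listed above (illustrative names).
USERS = 5_000_000
FOLLOWS_PER_USER = 1_000_000
BLOCKS_PER_USER = 100
POSTS_PER_USER = 150

total_follow_relationships = USERS * FOLLOWS_PER_USER
total_block_relationships = USERS * BLOCKS_PER_USER
total_posts = USERS * POSTS_PER_USER
# Upper bound on posts that a single user's home feed could draw from.
candidate_posts_per_feed = FOLLOWS_PER_USER * POSTS_PER_USER

print(f"follow relationships: {total_follow_relationships:,}")
print(f"block relationships:  {total_block_relationships:,}")
print(f"posts:                {total_posts:,}")
print(f"candidate posts/feed: {candidate_posts_per_feed:,}")
```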
Here is my take:
- Users collection → all user documents
- followed collection → When a user follows another user, the follower's user document is duplicated and an additional field is added that contains the ID of the followed user (one document for every follow relationship, since there is no limit to the number of people one can follow)
- blocked collection → Same as the followed collection, except that the blocked person's user document is duplicated along with a blockerUserId field
- Posts Collection → Contains a post document for every post along with the post publisher’s userId field
- UserFeed Collection → This is the tricky part. What I am currently doing is the following:
- When the client opens the app, query the top 20 followed people, get their posts, and duplicate them into userFeedCollection. Each post document here contains an additional field, feedUserId (to ascertain whose feed the document belongs to). Then query the next 20, and so on, until at least 200 or so posts have been added to userFeedCollection for that user
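To make the feed-building steps above concrete, here is a minimal in-memory sketch that uses plain Python lists in place of real collections. The function name `build_user_feed` and the fields `followerUserId`, `followedUserId`, and `postId` are assumptions made for illustration; only `userId` and `feedUserId` come from the description above. "Top 20" is taken to mean list order, which is also an assumption.

```python
def build_user_feed(feed_user_id, followed, posts, user_feed,
                    batch_size=20, target=200):
    """Copy posts of followed users into user_feed, one batch of followed users at a time."""
    # Hypothetical field names: followerUserId / followedUserId.
    followed_ids = [f["followedUserId"] for f in followed
                    if f["followerUserId"] == feed_user_id]
    added = 0
    for start in range(0, len(followed_ids), batch_size):
        batch = set(followed_ids[start:start + batch_size])
        for post in posts:
            if post["userId"] in batch:
                # Duplicate the post and tag it with the feed owner's ID.
                user_feed.append({**post, "feedUserId": feed_user_id})
                added += 1
        # Stop once roughly `target` posts have been copied for this user.
        if added >= target:
            break
    return added


# Toy data: user "u0" follows 50 authors, each with 5 posts.
followed = [{"followerUserId": "u0", "followedUserId": f"a{i}"} for i in range(50)]
posts = [{"postId": f"a{i}-p{j}", "userId": f"a{i}"}
         for i in range(50) for j in range(5)]
user_feed = []
added = build_user_feed("u0", followed, posts, user_feed)
print(added, len(user_feed))
```

With this toy data the loop stops after the second batch of 20 followed users, because the 200-post target has been reached by then.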
Problem with this approach: when a post is liked or commented on, all of its duplicated copies in userFeedCollection need to be updated, which is expensive.
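The cost can be illustrated with a small in-memory sketch that counts how many documents a single like has to touch. The function name `like_post` and the fields `postId` and `likeCount` are hypothetical and used only for illustration; lists stand in for the real collections.

```python
def like_post(post_id, posts, user_feed):
    """Increment likeCount on the source post and on every feed copy.

    Returns the number of documents written.
    """
    writes = 0
    for post in posts:
        if post["postId"] == post_id:
            post["likeCount"] += 1
            writes += 1
    # Every duplicated copy must be updated as well.
    for copy in user_feed:
        if copy["postId"] == post_id:
            copy["likeCount"] += 1
            writes += 1
    return writes


# One post that has been duplicated into three users' feeds.
posts = [{"postId": "p1", "userId": "author", "likeCount": 0}]
user_feed = [{"postId": "p1", "userId": "author", "likeCount": 0,
              "feedUserId": uid} for uid in ("u1", "u2", "u3")]

writes = like_post("p1", posts, user_feed)
print(writes)
```

A single like here results in one write for the source post plus one per feed copy, so the write count grows with the number of feeds that hold a copy of the post.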
What would be the ideal approach to this?