@Mike_Scornavacca First, thanks for sharing your proposed solution. Although this looks like a great way to build a social media style feed with a non-relational database, there does not seem to be much information available about the approach. I would be curious to hear what someone at MongoDB thinks, but here are my thoughts.
Answering your Two Specific Questions
- If a user scrolls through their entire timeline, am I supposed to run my inefficient pipeline to populate it with new posts? Should the size of the timeline expand to support users who scroll very far down on their feed?
Can you elaborate on your use case here? Depending on your needs, the easiest solution is to just have the feed end. I am not sure if they still do this, but I know Facebook did exactly that for a number of years: at some point a message at the bottom of the feed simply said “No more content available.” If you are trying to populate the feed with new posts, do new posts exist? Why weren’t they in the feed in the first place? Long story short, I think this depends on your application’s needs. If each document in the Timeline collection is unique to a user, I would imagine that each post inserted there would only take up a few KB of space, so you can easily store hundreds or thousands of posts in a user’s timeline before getting close to the 16MB document size cap (not suggesting you need to fill each timeline to 16MB).
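To put rough numbers on that, here is a back-of-the-envelope sketch. The per-entry sizes and the overhead allowance are assumptions for illustration, not measurements; only the 16MB limit comes from MongoDB itself.

```python
# Rough capacity estimate for a single timeline document.
# ASSUMPTION: entry sizes below are illustrative guesses, not measured values.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's document size limit


def max_timeline_entries(entry_bytes, overhead_bytes=1024):
    """Return how many entries of entry_bytes fit in one document.

    overhead_bytes is a hypothetical allowance for the _id and any other
    top-level fields stored on the timeline document.
    """
    if entry_bytes <= 0:
        raise ValueError("entry_bytes must be positive")
    return (MAX_BSON_DOCUMENT_BYTES - overhead_bytes) // entry_bytes


for size in (512, 2048, 4096):
    print(f"{size} bytes per entry: about {max_timeline_entries(size)} entries")
```

Even at a few KB per entry the ceiling is in the thousands, so a cap chosen for product reasons (say, a few hundred entries) would likely be hit long before the storage limit.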
- If a user decides to follow someone new, and they have recent posts, should I be taking their posts and carefully inserting them into the user’s timeline such that it remains in chronological order?
I think there are a number of solutions here, depending on your goals. The option I would prefer is to just sort the posts whenever you fetch the user’s timeline, rather than inserting posts in the correct position every time the user follows someone new, several posts are made at the same time, etc.
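As a minimal sketch of sorting on read: the timeline shape and the field names (posts, post_id, created_at) are my assumptions, not something from your schema.

```python
from datetime import datetime, timezone


def newest_first(entries):
    """Return a new list of timeline entries ordered newest first."""
    return sorted(entries, key=lambda entry: entry["created_at"], reverse=True)


# Hypothetical timeline document; field names are assumptions.
timeline = {
    "_id": "user_1",
    "posts": [
        {"post_id": "p1", "created_at": datetime(2023, 5, 1, 9, 0, tzinfo=timezone.utc)},
        {"post_id": "p3", "created_at": datetime(2023, 5, 3, 8, 30, tzinfo=timezone.utc)},
        # Backfilled after following someone new, so it was appended out of order.
        {"post_id": "p2", "created_at": datetime(2023, 5, 2, 17, 45, tzinfo=timezone.utc)},
    ],
}

ordered_ids = [entry["post_id"] for entry in newest_first(timeline["posts"])]
print(ordered_ids)
```

With this approach the write path can simply append, and the order in which entries were stored stops mattering.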
Separate Collections (Joining with Lookup)
I am confident you are spot on here: using $lookup to join the Users and Posts collections will not work effectively as the number of documents in each collection grows. There are plenty of reports of this kind of problem across the Internet. Although it looks easy to implement, and is, it is not the right solution here because $lookup would have to run on every feed load. I can imagine scenarios where the aggregation pipeline takes several seconds (or even minutes) while the user just watches a loader spin. Obviously not ideal for a social media application.
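For reference, this is roughly what the read-time join could look like. It is only a sketch: the collection name (posts) and the field names (following, author_id, created_at) are assumptions, and the function just builds the pipeline as plain Python data.

```python
def build_feed_pipeline(user_id, limit=20):
    """Build a read-time feed pipeline to run against the users collection.

    ASSUMPTION: each user document has a `following` array of user ids and
    each post document has `author_id` and `created_at` fields.
    """
    return [
        {"$match": {"_id": user_id}},
        {
            "$lookup": {
                "from": "posts",
                "localField": "following",
                "foreignField": "author_id",
                "as": "feed",
            }
        },
        {"$unwind": "$feed"},
        {"$sort": {"feed.created_at": -1}},
        {"$limit": limit},
        {"$replaceRoot": {"newRoot": "$feed"}},
    ]


pipeline = build_feed_pipeline("user_1", limit=10)
print([next(iter(stage)) for stage in pipeline])
```

With PyMongo this list would be passed to aggregate() on the users collection. Note that the join has to consider every post by every followed account before the sort and limit can apply, which is where the cost comes from as the data grows.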
Fan-Out on Write
This approach should work great and, after some research, appears to be what Twitter does. I really like this approach because loading the feed only requires a simple get request: you can set each Timeline _id to match the user’s id (_id is indexed by default) and fetch the user’s timeline document very efficiently. Loading the feed would be very quick.
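A sketch of that read path, under assumptions: the timeline stores its entries in a posts array, newest first. FakeTimelines is a hypothetical in-memory stand-in so the example runs without a server; with PyMongo, a real collection's find_one(filter, projection) would take the same two arguments.

```python
class FakeTimelines:
    """In-memory stand-in for the Timeline collection (illustration only)."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}

    def find_one(self, filter, projection=None):
        # Supports only an _id filter and a positive $slice on `posts`.
        doc = self._docs.get(filter["_id"])
        if doc is None:
            return None
        result = dict(doc)
        slice_size = (projection or {}).get("posts", {}).get("$slice")
        if slice_size is not None:
            result["posts"] = doc["posts"][:slice_size]
        return result


def load_feed(timelines, user_id, page_size=20):
    """Fetch the first page of a user's precomputed timeline by _id."""
    doc = timelines.find_one({"_id": user_id}, {"posts": {"$slice": page_size}})
    return doc["posts"] if doc else []


timelines = FakeTimelines(
    [{"_id": "user_1", "posts": [{"post_id": "p3"}, {"post_id": "p2"}, {"post_id": "p1"}]}]
)
first_page = load_feed(timelines, "user_1", page_size=2)
print(first_page)
```

The point is that the whole feed load is one lookup on _id, with the $slice projection keeping the response to a single page.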
Things to consider with the fan-out on write approach:
- Which data is duplicated in the Posts and Timeline collections. Even if a post is recorded to just a few user feeds, any change to the original Post document (in the Posts collection) would require updating every copy of that post across the Timeline documents. I would be thoughtful about which fields go in the Timeline documents, and which of those fields can change, if any, to avoid headaches here.
- A post could be added to one user’s timeline at a slightly different time than to others. For almost all social media applications this is okay, since a post reaching someone else’s feed a few seconds earlier does not cause problems and normally goes unnoticed.
- Using a trigger helps. If a user creates a post and it needs to be inserted into a number of timelines, I would push that work away from the client so they can continue to use the application.
- Be mindful of deleting posts. Similar to my first bullet, consider whether or not a user can delete a post. If yes, you will need to remove the post from all timelines.
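To make the write and delete paths concrete, here is a sketch that only builds the filter and update documents; it does not talk to a database. The field names (posts, post_id, author_id, created_at) and the 500-entry cap are assumptions, and it presumes each follower already has a timeline document. The $push modifiers ($each, $sort, $slice) and $pull are standard MongoDB update operators.

```python
def timeline_entry(post):
    """Copy only fields assumed never to change after the post is created."""
    return {
        "post_id": post["_id"],
        "author_id": post["author_id"],
        "created_at": post["created_at"],
    }


def fan_out_insert(post, follower_ids, max_entries=500):
    """Return (filter, update) adding the post to every follower's timeline."""
    filter_doc = {"_id": {"$in": list(follower_ids)}}
    update_doc = {
        "$push": {
            "posts": {
                "$each": [timeline_entry(post)],
                "$sort": {"created_at": -1},  # keep the array newest first
                "$slice": max_entries,        # keep only the newest entries
            }
        }
    }
    return filter_doc, update_doc


def fan_out_delete(post_id):
    """Return (filter, update) removing a post from every timeline holding it."""
    filter_doc = {"posts.post_id": post_id}
    update_doc = {"$pull": {"posts": {"post_id": post_id}}}
    return filter_doc, update_doc


# Hypothetical post; created_at is epoch seconds for brevity.
post = {"_id": "p42", "author_id": "user_9", "created_at": 1700000000, "body": "hello"}
insert_filter, insert_update = fan_out_insert(post, ["user_1", "user_2"])
delete_filter, delete_update = fan_out_delete("p42")
print(insert_filter, insert_update)
print(delete_filter, delete_update)
```

Each pair could be handed to update_many() from a trigger or background worker. Keeping the body out of the timeline entry means an edit to the post touches only the Posts collection, at the cost of one extra read to hydrate the feed.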
Out of curiosity, have you started using the fan-out on write approach? How are you handling inserting posts into the timelines of all the user’s followers?