Hi everyone!
I’m new to MongoDB and would like some help modeling my forum data. The main things I’m currently trying to model are the questions, answers, and votes (I also have a users collection, but I’ve already dealt with it; it was pretty easy). My website is very similar to Reddit or StackOverflow. Each question has a title, a description, a creator, a flag for whether the creator is anonymous, a creation time, and tags. Each answer has content, a creator, a flag for whether the creator is anonymous, and a creation time. Every answer is also linked to a question and possibly to another answer (if it’s a reply to that answer). I also want both the questions and the answers to have voting (upvotes and downvotes). If I just embedded everything, it would look something like this:
questions collection:
- creator: userId
- createdAt: Date
- isAnon: boolean
- title: string
- description: string
- tags: string[]
- answers: AnswerSchema[]
- upvotes: userId[]
- downvotes: userId[]
AnswerSchema:
- creator: userId
- createdAt: Date
- isAnon: boolean
- content: string
- replies: AnswerSchema[]
- upvotes: userId[]
- downvotes: userId[]
That doesn’t seem like a good idea: even though embedding is usually considered the better approach, it limits how much data I can store, since a single MongoDB document is capped at 16 MB. Even if I won’t have many answers or votes in the beginning, what happens if my website grows and has a lot of data?
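To put a rough number on that limit, here is a back-of-the-envelope sketch in Python. It assumes user ids are stored as 12-byte ObjectId values and only counts the bytes of a single votes array, ignoring every other field in the document, so the real ceiling would be lower. The function names are my own and only for illustration.

```python
MAX_DOC_BYTES = 16 * 1024 * 1024  # MongoDB's per-document BSON size limit
OBJECT_ID_BYTES = 12  # assumption: user ids are ObjectId values


def bson_objectid_array_bytes(n: int) -> int:
    """Approximate BSON size of an array holding n ObjectIds."""
    # Array wrapper: 4-byte length prefix + 1 trailing null byte.
    total = 4 + 1
    digits, start = 1, 0
    while start < n:
        # Indexes in [start, end) are stored as keys with `digits` characters.
        end = min(n, 10 ** digits)
        # Per element: 1 type byte + key characters + key terminator + value.
        total += (end - start) * (1 + digits + 1 + OBJECT_ID_BYTES)
        start, digits = end, digits + 1
    return total


def max_votes_in_one_document(budget: int = MAX_DOC_BYTES) -> int:
    """Largest n whose votes array alone still fits in the budget."""
    lo, hi = 0, budget  # n can never exceed the budget in bytes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if bson_objectid_array_bytes(mid) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return lo


if __name__ == "__main__":
    print(max_votes_in_one_document())
```

Under these assumptions a single upvotes array would run out of room somewhere in the high hundreds of thousands of votes, and much sooner once nested answers with their own vote arrays share the same document.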
I thought of doing everything with referencing instead, so I’d have four collections: one for questions, one for answers (with a question id field referencing the question, and an answer id field for when it’s a reply), and two for votes (each document connecting an answer/question to a user, with another field for whether it’s a downvote or an upvote). I would also add upvotesCount and downvotesCount fields to the questions and answers collections.
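As a rough sketch of what I mean by the referencing approach, here is how the documents and the counter update might be built in Python. Apart from upvotesCount and downvotesCount, the field names (questionId, parentAnswerId, targetId, isUpvote) are placeholders I made up for illustration, and the ids are whatever type the collections end up using.

```python
from datetime import datetime, timezone


def new_answer(answer_id, question_id, creator, content,
               is_anon=False, parent_answer_id=None):
    """Build an answer document that references its question (and parent answer)."""
    return {
        "_id": answer_id,
        "questionId": question_id,           # hypothetical field name
        "parentAnswerId": parent_answer_id,  # None for a top-level answer
        "creator": creator,
        "isAnon": is_anon,
        "content": content,
        "createdAt": datetime.now(timezone.utc),
        "upvotesCount": 0,
        "downvotesCount": 0,
    }


def new_vote(target_id, user_id, is_upvote):
    """Build a vote document linking one user to one question or answer."""
    return {"targetId": target_id, "userId": user_id, "isUpvote": is_upvote}


def counter_update(is_upvote):
    """Build the update document that bumps the matching cached counter."""
    field = "upvotesCount" if is_upvote else "downvotesCount"
    return {"$inc": {field: 1}}
```

With a driver such as PyMongo, the vote document would go to `insert_one` on a votes collection and `counter_update(...)` to `update_one` on the question or answer, which is exactly the two-collection write I’m unsure about.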
This still doesn’t seem perfect, because each time I want to update the votes I’ll need to update two different collections. Also, each time I want to fetch questions/answers and find out whether a user has already voted on them, I’ll need to run two different queries and then somehow combine the results.
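For the "combine them" part, this is roughly what I imagine doing in application code. It is only a sketch: it assumes vote documents shaped like `{"targetId": ..., "userId": ..., "isUpvote": ...}` (field names I invented), and `myVote` is a made-up field that exists only on the merged result, not in the database.

```python
def votes_filter(user_id, posts):
    """Build the filter for the second query: this user's votes on these posts."""
    return {"userId": user_id, "targetId": {"$in": [p["_id"] for p in posts]}}


def attach_user_votes(posts, user_votes):
    """Return copies of the posts annotated with the current user's vote."""
    # Index the user's votes by the post they point at for O(1) lookups.
    vote_by_target = {v["targetId"]: v["isUpvote"] for v in user_votes}
    merged = []
    for post in posts:
        is_upvote = vote_by_target.get(post["_id"])
        if is_upvote is None:
            my_vote = None  # the user has not voted on this post
        else:
            my_vote = "up" if is_upvote else "down"
        merged.append({**post, "myVote": my_vote})
    return merged
```

The first query would fetch the page of questions or answers, the second would use `votes_filter(...)` on the votes collection, and `attach_user_votes(...)` would stitch the two results together.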
What would you recommend? Should I use embedding or referencing, and where?