Feedback on schema structure

Sun_23 · August 21, 2023, 6:26pm

Hello guys and girls,

I am a bit of a beginner in MongoDB and very unsure how to design the schemas for my web app (MERN stack, hosted on AWS EC2), so I would appreciate some feedback on whether I am making any very stupid beginner mistakes.

I want to build a social media platform where everyone can post texts and others can rate them and write comments.

Users should also be able to look up all their own ratings and comments in their profile (so putting the ratings and comments only in the document of the related text seems wrong to me; otherwise I would have to “scan” every text document to find where the user posted a comment or a rating).

But duplicating all the content, to put the comments and ratings into the user document AND the text document, also seems wrong (you should avoid duplicate data for storage and maintenance reasons).

So my design would be to have 4 different collections (Users, Texts, Ratings, Comments).

If someone now opens a text, I would grab every comment and rating that has a ref to the requested text.

If a user then looks up their own comments or ratings, I would grab every comment or rating that has a ref to that user.

Is this fine? Or would it need too much processing power? To me it seems a bit “strange” to look through every comment or rating every time! But I heard that MongoDB is very fast at looking through complete collections (even big ones) and needs very little processing power. I am a bit of a noob.

I am also thinking about using a hybrid approach: also adding the first 10 comments to the related text document, plus the average rating (and only updating the latter every 3 hours). Is this necessary for 1,000 or even 10k users?

The next, smaller question: I want to show the name of the text's author, but with a ref in MongoDB I only have the user_id of the author (so I would have to search all users and map that user_id to the author name or username every time I display a text). Would it make sense to also add the author name to the text document and only update it manually in the backend every 12 hours? Or am I overestimating the necessary processing power again?

Thanks for your feedback and advice.

slava · August 23, 2023, 5:55pm

Hello, @Sun_23 ! Welcome to the community!

The data model can be greatly affected by the frequency of writes, the way your application reads the data it needs, and many other factors, so it is not easy to provide strict data modelling rules for your specific case.

However, I can give you some advice:

Embedding is good, but make sure it does not complicate your queries much. Also, make sure your documents do not have arrays that can grow indefinitely, as you may hit the 16MB BSON document limit.
If you often need to get a set of documents from one collection and join data from other collections, consider embedding that data within the document, like so:

    // Comment with embedded user data
    {
      text: String,
      user: {
        _id: ObjectId,
        name: String,
      }
    };
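
A minimal sketch of how the embedded shape above might be written and read back, assuming the official `mongodb` Node.js driver and a connected `db` handle. The collection name `comments`, the fields `textId` and `createdAt`, and all function names are assumptions made for illustration, not something from the original question. The two compound indexes are there so that "all comments under a text" and "all comments by a user" can be answered from an index rather than by scanning the whole collection.

```javascript
// Sketch only: names such as `comments`, `textId`, and `createdAt` are assumptions.
// `db` is assumed to be a connected Db handle from the official `mongodb` driver.

// Build a comment document that carries a small snapshot of its author.
// Only _id and name are copied, so the embedded part stays small.
function buildComment(textId, user, body, now = new Date()) {
  return {
    textId,
    text: body,
    user: { _id: user._id, name: user.name },
    createdAt: now,
  };
}

// Create indexes on the reference fields used by the two lookups below.
async function ensureCommentIndexes(db) {
  const comments = db.collection("comments");
  await comments.createIndex({ textId: 1, createdAt: -1 });
  await comments.createIndex({ "user._id": 1, createdAt: -1 });
}

// Comments under one text, newest first.
function findCommentsForText(db, textId, limit = 20) {
  return db.collection("comments")
    .find({ textId })
    .sort({ createdAt: -1 })
    .limit(limit)
    .toArray();
}

// Comments written by one user, e.g. for the profile page.
function findCommentsByUser(db, userId, limit = 20) {
  return db.collection("comments")
    .find({ "user._id": userId })
    .sort({ createdAt: -1 })
    .limit(limit)
    .toArray();
}
```

Because the author name travels with each comment, displaying a comment list needs no extra lookup in the users collection. A ratings collection could follow the same pattern.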

Make sure you maintain a decent level of “freshness” of the embedded data with some background processes. Think about how important it is for users to see up-to-date info, like the latest user avatar or rating. If it is not really important, do not update it very often.
The hybrid approach that you’ve described sounds good; you can add it later to speed up your queries when your datasets grow. At the beginning you probably won’t need it.
Remember that MongoDB uses a flexible schema, so you can easily adapt it later if needed.
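
If the hybrid approach and the freshness updates are added later, they could look roughly like the sketch below. It assumes the official `mongodb` Node.js driver; the field names `ratingSum`, `ratingCount`, `commentPreview`, and `author`, as well as the function names, are hypothetical. Note that it keeps running totals with `$inc` instead of recomputing the average every few hours, which is a different technique from the periodic update described in the question.

```javascript
// Sketch only: field names (ratingSum, ratingCount, commentPreview, author)
// are assumptions for illustration.

// Update document that folds one new rating into running totals on the text.
function ratingUpdate(stars) {
  return { $inc: { ratingSum: stars, ratingCount: 1 } };
}

// Average computed at read time from the running totals (null if unrated).
function averageRating(textDoc) {
  return textDoc.ratingCount > 0 ? textDoc.ratingSum / textDoc.ratingCount : null;
}

// Update document that appends a comment to a bounded preview array.
// With a positive $slice, only the first `max` elements are kept,
// so the array cannot grow without limit.
function previewUpdate(comment, max = 10) {
  return { $push: { commentPreview: { $each: [comment], $slice: max } } };
}

// Background job step: copy a changed user name into the embedded snapshots.
// `db` is assumed to be a connected Db handle from the `mongodb` driver.
async function propagateUserName(db, userId, newName) {
  await db.collection("texts").updateMany(
    { "author._id": userId },
    { $set: { "author.name": newName } }
  );
  await db.collection("comments").updateMany(
    { "user._id": userId },
    { $set: { "user.name": newName } }
  );
}
```

The update documents would be passed to something like `db.collection("texts").updateOne({ _id: textId }, ratingUpdate(4))` when a rating is saved.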