Newbie would like advice on MongoDB schema optimization

Berkeli_H · January 23, 2022, 10:39am

Hello,

I’m currently working on my first full-stack application and I’m building the backend with Node.js + typegoose.

I have a good understanding of the concepts and limits, but I’m really not confident in my abilities and end up questioning every step I take. The MongoDB schema has been one of those things that kept changing and delaying the release of the project, mostly because it’s dynamic and quite easy to remodel.

So the project:

I will communicate with an open API (the Wargaming API for World of Tanks) and store player statistics for data aggregation and visualisation.
The API doesn’t provide player session data (daily games played), so I will calculate and store that myself.
There are millions of players in each of the 4 clusters (EU, NA, RU, ASIA).

Initially, I thought I would create 1 big document per player and store everything there:

Player details (~30 fields)
Player’s tank details (array of max 700 objects that have 15 fields and one nested object)
Player’s session stats (array of sessions, one object added on each update, e.g. 1 object per day if updated daily)
Historical data (array of objects with 5-6 player fields each, to keep historical info)
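
As a rough sketch of that single-document layout, here are plain TypeScript interfaces (not typegoose classes). All field names below are hypothetical placeholders, not the real API fields, and the account id is assumed to be numeric. The point is only to show which arrays are bounded and which keep growing.

```typescript
// Hypothetical, trimmed-down shapes; the real player has ~30 fields
// and each tank has 15 fields plus one nested object.
interface TankStats {
  tankId: number;
  battles: number;
  wins: number;
  extra: Record<string, number>; // stands in for the one nested object
}

interface Session {
  date: string; // date of the update, e.g. "2022-01-23"
  battles: number;
  wins: number;
}

interface HistoryEntry {
  date: string;
  battles: number;
  wins: number;
}

interface PlayerDocument {
  _id: number; // account id from the API (assumed numeric)
  cluster: "EU" | "NA" | "RU" | "ASIA";
  nickname: string;
  tanks: TankStats[]; // bounded: at most ~700 entries
  sessions: Session[]; // unbounded: one more entry per update
  history: HistoryEntry[]; // unbounded as well
}

// Builds an empty player document in the single-document model.
function newPlayer(
  id: number,
  cluster: PlayerDocument["cluster"],
  nickname: string
): PlayerDocument {
  return { _id: id, cluster, nickname, tanks: [], sessions: [], history: [] };
}

const example = newPlayer(1, "EU", "sample_player");
example.sessions.push({ date: "2022-01-23", battles: 12, wins: 7 });
```

The `tanks` array has a natural ceiling, while `sessions` and `history` grow for as long as the player keeps being updated.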

Now I already understand that player session needs its own collection, but I can’t decide between the 2 models I have in mind:

The document _id will be the same as the player _id, and the document will have 1 field, sessions, containing an array of sessions.
The document _id will be unique, with a new document created for each update, so a bunch of separate session documents instead of an embedded array.
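
To make the two models concrete, the sketch below builds the write each one would need as plain objects, without calling any driver. The helper names and session fields are hypothetical; `$push` and `upsert` are standard MongoDB update features. This is one possible shape under those assumptions, not a recommendation.

```typescript
// Hypothetical session fields, for illustration only.
interface SessionStats {
  date: string; // date of the update, e.g. "2022-01-23"
  battles: number;
  wins: number;
}

// Model 1: one document per player (_id = player id), sessions embedded.
// Each update appends to the array, so the document keeps growing.
function embeddedSessionUpdate(playerId: number, session: SessionStats) {
  return {
    filter: { _id: playerId },
    update: { $push: { sessions: session } },
    options: { upsert: true }, // create the document on the first update
  };
}

// Model 2: one document per session. MongoDB generates a unique _id,
// and playerId is an ordinary field that would need its own index.
function sessionDocument(playerId: number, session: SessionStats) {
  return { playerId, ...session };
}

const today: SessionStats = { date: "2022-01-23", battles: 5, wins: 3 };
const model1 = embeddedSessionUpdate(7, today);
const model2 = sessionDocument(7, today);
```

With model 1, reading all sessions of a player is a single lookup by _id, but the array is unbounded. With model 2, every document stays small, and reading a player's sessions becomes a query on playerId (and usually date).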

The other question is about the player’s tank statistics: should I keep them on the player document, or also move them to their own collection keyed by player_id+tank_id? The reason I ask is that a lot of aggregation will be based on this data, and I think a nested array within the player collection might not be the best thing to do?
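
For comparison, here is a minimal sketch of the separate-collection variant, assuming a compound `_id` of `{ playerId, tankId }` (a compound unique index on two ordinary fields would be an alternative). Field names and the `splitTanks` helper are hypothetical. The two pipelines show the same "total battles per tank" aggregation against each layout, written as plain stage objects.

```typescript
// Hypothetical fields, trimmed down to what the example needs.
interface EmbeddedTank {
  tankId: number;
  battles: number;
  wins: number;
}

interface PlayerWithTanks {
  _id: number;
  tanks: EmbeddedTank[];
}

// One document per (player, tank) pair in its own collection.
interface TankStatsDocument {
  _id: { playerId: number; tankId: number }; // compound key, unique per pair
  battles: number;
  wins: number;
}

// Converts the embedded array into standalone documents.
function splitTanks(player: PlayerWithTanks): TankStatsDocument[] {
  return player.tanks.map((t) => ({
    _id: { playerId: player._id, tankId: t.tankId },
    battles: t.battles,
    wins: t.wins,
  }));
}

// Embedded layout: the array must be unwound before grouping by tank.
const embeddedPipeline = [
  { $unwind: "$tanks" },
  { $group: { _id: "$tanks.tankId", battles: { $sum: "$tanks.battles" } } },
];

// Separate collection: group directly on the tank id inside the key.
const separatePipeline = [
  { $group: { _id: "$_id.tankId", battles: { $sum: "$battles" } } },
];

const docs = splitTanks({
  _id: 42,
  tanks: [
    { tankId: 1, battles: 10, wins: 6 },
    { tankId: 2, battles: 3, wins: 1 },
  ],
});
```

The embedded version has to `$unwind` up to ~700 tanks per player before it can group, while the separate collection can be grouped (and indexed) on the tank id directly.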

Sorry for the long post and thanks for any help!

I have attached a sample “unoptimized” document.