Feedback on structuring data

My small project has been picking up traction, and I’m taking this as an opportunity to build a new, more robust database. I’m relatively new to this, so I’m looking for feedback on the different schemas/structures I’m considering. I’ve spent time looking at links from MongoDB University, other forum posts, and Stack Overflow to understand the fundamentals better; however, a lot of my questions and concerns are particular to the data I’m working with. Because of this, I’ll explain my data through the analogy of baseball.

Currently my data is represented such that every baseball team is a collection in the baseball database. The first document of each collection stores meta information such as the team name, colors, ballparks, manager, etc… Every subsequent document represents a player and holds that player’s statistics, such as batting average, home runs, position, and accolades. Each player’s document also contains an embedded array recording their performance in every game they’ve played. This is important (at least I think it is) because the document can get quite large: some players play hundreds of games, and many interesting statistics are recorded for each game. Using the Yankees as an example…

// baseball database
{
   // NY Yankees collection
   {
      // meta information document
      {
         "teamName": "Yankees",
         "colors": ["midnight navy blue", "gray", "white"],
         ...
      },
      // Player data document
      {
         "name": "Yogi Berra",
         "battingAverage": 0.285,
         "homeRuns": 358,
         "matches": [
            // Yogi played from 1946-1965, so this would be quite long.
            { },
            { },
            ...
            { }
         ]
      },
      // Player data document
      ... // For every player that has played for The Yankees
   }
   // Boston Red Sox collection
   { ... }
}

For my new “robust” database I’d like to nest everything one level deeper, so it looks like:

// baseball database
{
   // baseballTeams collection
   {
      // NY Yankees document
      { ... },
      // Boston Red Sox document
      { ... }
   }
}

The reason is that it makes querying easier, since I’m operating in one collection: db.baseballTeams.find( {} ). A glaring issue that’s stopping me is that each document could become prohibitively large, since (one) the number of players per team is unbounded and (two) the number of matches per player is unbounded.
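To put a rough number on that worry, here is a back-of-the-envelope sketch in plain Node.js (no database needed). It builds a fake team document with embedded players and matches and compares its JSON byte length against MongoDB's 16 MB per-document limit. The per-game field names, the player counts, and the use of JSON length as a stand-in for BSON size are all assumptions for illustration, not measurements of my real data.

```javascript
// MongoDB's maximum BSON document size is 16 MB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical per-game stats; these field names are made up for illustration.
function makeMatch(i) {
  return { gameId: i, atBats: 4, hits: 1, homeRuns: 0, rbis: 1, strikeouts: 1 };
}

// A player with every game embedded directly in the player object.
function makePlayer(name, matchCount) {
  return {
    name,
    matches: Array.from({ length: matchCount }, (_, i) => makeMatch(i)),
  };
}

// A team document with every player (and all of their games) embedded.
function makeTeam(playerCount, matchesPerPlayer) {
  return {
    teamName: "Yankees",
    players: Array.from({ length: playerCount }, (_, i) =>
      makePlayer(`player-${i}`, matchesPerPlayer)
    ),
  };
}

// JSON byte length is only a rough proxy for the real BSON size.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const currentRoster = approxBytes(makeTeam(25, 100));   // one small roster
const allTimeRoster = approxBytes(makeTeam(500, 1000)); // decades of players

console.log({
  currentRoster,
  allTimeRoster,
  limit: MAX_BSON_BYTES,
  allTimeFits: allTimeRoster < MAX_BSON_BYTES,
});
```

With only six small numeric fields per game, the all-time roster already lands well past the limit, and my real per-game records have more fields than this.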

A compromise is to have two collections, baseballTeams and matches. matches would simply be an unorganized bag of every single match, each represented as a document, and each player on every team in baseballTeams would then just point to their corresponding matches in matches. Besides keeping document sizes in check, this would also let me easily query and calculate interesting stats like the database-wide batting average of every single player. My concern, however, is that the “unorganized bag of every single match represented as documents” does not seem very appealing, and I’m not sure if this is a healthy way of storing data.
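Here is a minimal sketch of what I mean by the two-collection layout, again in plain Node.js with arrays standing in for collections. All ids, field names, and numbers are made up for illustration. One assumption that differs slightly from my description: each match document carries a playerId pointing back at the player, so the player entry doesn't need an ever-growing list of match ids.

```javascript
// Stand-in for the baseballTeams collection (hypothetical fields).
const baseballTeams = [
  {
    _id: "yankees",
    teamName: "Yankees",
    players: [{ playerId: "yogi-berra", name: "Yogi Berra" }],
  },
  {
    _id: "example-team",
    teamName: "Example Team",
    players: [{ playerId: "example-player", name: "Example Player" }],
  },
];

// Stand-in for the matches collection: one document per player per game.
// The stats below are invented numbers, not real game records.
const matches = [
  { playerId: "yogi-berra", gameId: 1, atBats: 4, hits: 2 },
  { playerId: "yogi-berra", gameId: 2, atBats: 4, hits: 1 },
  { playerId: "example-player", gameId: 1, atBats: 5, hits: 3 },
];

// Batting average per player = total hits / total at-bats across all matches.
function battingAverages(matchDocs) {
  const totals = new Map();
  for (const m of matchDocs) {
    const t = totals.get(m.playerId) ?? { atBats: 0, hits: 0 };
    t.atBats += m.atBats;
    t.hits += m.hits;
    totals.set(m.playerId, t);
  }
  const result = {};
  for (const [playerId, t] of totals) {
    // Avoid dividing by zero for players with no recorded at-bats.
    result[playerId] = t.atBats === 0 ? null : t.hits / t.atBats;
  }
  return result;
}

console.log(battingAverages(matches));
```

If I understand the docs correctly, the same grouping could be pushed into the database with something like db.matches.aggregate([{ $group: { _id: "$playerId", atBats: { $sum: "$atBats" }, hits: { $sum: "$hits" } } }]), and an index such as db.matches.createIndex({ playerId: 1 }) might make the “bag” less unorganized in practice, but I haven’t tested either against my data.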


I’m looking for any feedback & help on:

  • Anything I’m misunderstanding about storing data in the above ways.
  • Whether there is a better way to store this type of data (where better means there’s no concern about prohibitively large document sizes and CRUD operations are efficient).

Thanks for sticking through the read.