We have a mongodb database with 50 collections of sports statistics, covering data for multiple seasons. We have a sports website connected to our database. Let’s assume we have data for 2018 - 2021. Each day, team-average statistics update for the latest season (2021), and what we need to do for each of the 50 collections is:
- drop the previous rows in the collection where
- insert new, updated statistics for
Currently we have two approaches, and both seem flawed. We are trying to figure out what approach is best. We are considering:
- drop the old rows from the collection, and insert new rows with updated data
the problem with (1) is that inserting new data can take 30-60 minutes daily, which means there will be 30-60 minutes where all data for season = 2021 is missing. this is a problem for our website because visitors during that 30-60 minute window will see no stats, not even the 1-day-outdated stats.
- duplicate the collection, drop old rows from duplicated collection, insert new rows into duplicated collection, then delete the original collection and finally rename the duplicated collection to the original collection’s name.
the improvement with (2) is that the 30-60 minute window with no stats is entirely gone, since all of drop old rows + insert new rows is happening in a 2nd, duplicated collection, and only when everything is done, does the original collection get dropped and the new collection get renamed. the problem with (2) is that duplicating all 50 collections takes SO LONG.
- create a brand new collection, insert only the new data for 2021 into the new collection. Then drop the old rows from the original collection. then merge (literally just combine all of the documents together) the original collection with the new collection, thus completing the update
for (3), I’m not sure if this is feasible at all, and this leads to my question. Is (3) feasible? Is it possible to merge 2 collections into 1. By merge, I mean straight-up stack the 2 collections right on top of each other. If not, which of the options between 1 and 2 is better? What are the general best practices for these types of daily upserts into many collections in a mongo database? Any help is appreciated!!!