10k document / collection antipattern

Ryan_Langton · January 13, 2022, 2:22pm

TL;DR: What is the best solution for handling very large data sets?

We’re using MongoDB for a data set of processing results (results are written and queried, never updated). Each execution writes to its own unique collection; the collection stats are below. This processing can run 60-100 times daily, and there is some worry that we will hit the 10k+ collection performance problem. It was suggested that we look into sharding the data, but as far as I can tell sharding and replication only improve performance and would not address hitting the 10k.

Should I instead be looking at storing more data in a single collection (one collection already holds 400k+ documents, so that seems like a lot)? Or a cold storage solution (moving old results out of Mongo to an external file)? Or something else?
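To put a rough number on the worry, here is a quick back-of-the-envelope sketch of how long it would take to reach 10k collections at 60-100 executions per day. It assumes one new collection per execution and that no old collections are ever dropped; the function name and the `existing` parameter are made up for illustration.

```python
import math

# The "10k collections" threshold mentioned above (treated as a fixed number here).
COLLECTION_LIMIT = 10_000


def days_until_limit(collections_per_day: int, existing: int = 0) -> int:
    """Whole days until the collection count reaches COLLECTION_LIMIT.

    Assumes one new collection per execution and no collections dropped.
    `existing` is a hypothetical count of collections already present.
    """
    remaining = max(COLLECTION_LIMIT - existing, 0)
    return math.ceil(remaining / collections_per_day)


for rate in (60, 100):
    print(f"{rate} executions/day -> {days_until_limit(rate)} days")
# 60 executions/day -> 167 days
# 100 executions/day -> 100 days
```

So starting from zero, that is somewhere between roughly three and six months before the count crosses 10k, unless old results are consolidated or moved out.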

Ryan_Langton · January 13, 2022, 2:23pm

Here are the statistics for a single collection. This is a typical collection of results for us, but the size can vary widely, from 1/100 of this to 100x.