Bucket design pattern resulting in large working set

fergus · November 17, 2020, 6:51pm

We recently refactored our data model to use the Bucket design pattern, since we receive data frequently throughout the day. The refactoring resulted in a huge reduction in index size, which initially looked good. However, during testing we encountered some performance issues.
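To illustrate where the index savings come from, here is a minimal Python sketch of the bucket pattern. It assumes hypothetical `(sensor_id, ts, value)` readings bucketed per sensor per calendar day (UTC in this example), with plain dicts standing in for MongoDB documents. Each bucket becomes one document, so an index holds one entry per bucket rather than one per reading.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucketize(readings):
    """Group (sensor_id, ts, value) readings into one bucket per sensor per day."""
    buckets = defaultdict(lambda: {"count": 0, "measurements": []})
    for sensor_id, ts, value in readings:
        # Floor the timestamp to midnight to get the bucket's day key.
        day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        bucket = buckets[(sensor_id, day)]
        bucket["count"] += 1
        bucket["measurements"].append({"ts": ts, "value": value})
    # One output dict per (sensor, day): these stand in for bucket documents.
    return [{"sensor_id": s, "day": d, **b} for (s, d), b in buckets.items()]


start = datetime(2020, 11, 17, tzinfo=timezone.utc)
# Hypothetical load: 10 sensors, one reading every 15 minutes for one day.
readings = [
    (sensor, start + timedelta(minutes=15 * i), float(i))
    for sensor in range(10)
    for i in range(96)
]
docs = bucketize(readings)
print(len(readings), "raw documents ->", len(docs), "bucket documents")
# prints: 960 raw documents -> 10 bucket documents
```

The trade-off is that each bucket document keeps growing and is rewritten as new readings arrive, which is what pushes the write-heavy part of the workload onto a smaller number of large documents.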

Our application receives millions of records every 15 minutes, which we aggregate and write to the bucket documents. From our analysis, it appears that the application frequently updates around 120 GB of data (across a 2-node sharded cluster).
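A back-of-the-envelope check of the write volume. This is an upper bound that assumes every one of the ~120 GB of hot documents is rewritten once per 15-minute cycle and that the data is split evenly across the two shards. It ignores compression, journaling, and index writes.

```python
def sustained_rate_mb_per_s(data_gb: float, interval_minutes: float) -> float:
    """Average rate (decimal MB/s) needed to rewrite data_gb once per interval."""
    return data_gb * 1000 / (interval_minutes * 60)


# Upper bound: assumes all ~120 GB of hot data is rewritten every 15 minutes.
cluster_rate = sustained_rate_mb_per_s(120, 15)
# Assumes the hot data is split evenly across the 2 shards.
per_shard_rate = cluster_rate / 2
print(f"cluster: {cluster_rate:.1f} MB/s, per shard: {per_shard_rate:.1f} MB/s")
# prints: cluster: 133.3 MB/s, per shard: 66.7 MB/s
```

Real figures depend on compression and on how much of each document actually changes per cycle, but this gives a rough order of magnitude for the sustained disk I/O each shard has to absorb.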

My questions are:

1. Should the 120 GB of frequently updated documents fit into RAM? If so, should it fit into the WiredTiger cache (by default 50% of (RAM - 1 GB))? Or does it not really matter?
2. Since the documents updated every 15 minutes total around 120 GB, I assume this requires fairly heavy reads and writes to persist the data to disk. Is that a fair assumption?
3. Is there a good way to estimate the size of the working set? Index size is easy to retrieve, but the size of the frequently accessed documents requires more analysis.
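One rough way to frame the cache and working-set questions above: MongoDB documents the default WiredTiger cache size as the larger of 50% of (RAM - 1 GB) or 256 MB. The sketch below compares that against a crude working-set estimate, namely the number of frequently touched documents times the average document size (the `avgObjSize` field reported by `collStats`, which is uncompressed, like data in the cache) plus index size. The hot-document count, RAM size, and other inputs are hypothetical, sizes are in binary GB, and per-page overhead is ignored.

```python
def default_wiredtiger_cache_gb(ram_gb: float) -> float:
    """Default WiredTiger cache: the larger of 50% of (RAM - 1 GB) or 256 MB."""
    return max(0.5 * (ram_gb - 1), 0.25)


def estimate_working_set_gb(hot_docs: int, avg_obj_size_bytes: float, index_gb: float) -> float:
    """Crude estimate: hot documents (uncompressed, as held in cache) plus index size."""
    return hot_docs * avg_obj_size_bytes / 1024**3 + index_gb


# All inputs below are hypothetical placeholders for a single shard.
cache_gb = default_wiredtiger_cache_gb(128)
working_set_gb = estimate_working_set_gb(
    hot_docs=2_000_000,  # hypothetical count of documents updated each cycle
    avg_obj_size_bytes=30_000,  # hypothetical avgObjSize from collStats
    index_gb=2.0,  # hypothetical total index size
)
print(f"cache {cache_gb:.1f} GB, working set ~{working_set_gb:.1f} GB, fits: {working_set_gb <= cache_gb}")
# prints: cache 63.5 GB, working set ~57.9 GB, fits: True
```

Even when the estimate "fits", WiredTiger starts evicting pages before the cache is completely full, so a working set at around 90% of the cache, as in this example, would likely still see eviction pressure.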

All in all, I suspect that a smaller granularity might make more sense, i.e. documents that contain 1 hour of data rather than 1 day of data.
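A sketch of the granularity trade-off. It assumes one write per series per 15-minute interval, a constant (hypothetical) number of bytes added per write, and that an update needs the whole open bucket document in cache. For a bucket that receives N writes, the k-th write leaves k writes' worth of data in it, so the open bucket holds (N + 1) / 2 writes' worth on average. Hourly buckets (N = 4) therefore keep far less data hot than daily buckets (N = 96).

```python
from datetime import datetime, timezone


def bucket_start(ts: datetime, granularity_hours: int) -> datetime:
    """Floor ts to the start of its bucket; granularity_hours must divide 24."""
    floored_hour = ts.hour - ts.hour % granularity_hours
    return ts.replace(hour=floored_hour, minute=0, second=0, microsecond=0)


def avg_open_bucket_bytes(bytes_per_write: int, writes_per_bucket: int) -> float:
    """Average size of the open bucket right after each write: (N + 1) / 2 writes' worth."""
    return bytes_per_write * (writes_per_bucket + 1) / 2


ts = datetime(2020, 11, 17, 18, 51, tzinfo=timezone.utc)
print(bucket_start(ts, 1), "|", bucket_start(ts, 24))
# prints: 2020-11-17 18:00:00+00:00 | 2020-11-17 00:00:00+00:00

# 15-minute writes: 4 per hourly bucket, 96 per daily bucket; 300 bytes/write is hypothetical.
hourly = avg_open_bucket_bytes(300, 4)
daily = avg_open_bucket_bytes(300, 96)
print(hourly, daily, round(daily / hourly, 1))
# prints: 750.0 14550.0 19.4
```

The flip side is 24 times as many documents and index entries per series, which gives back part of the index-size reduction the Bucket pattern provided in the first place.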