Hi guys,
We are fairly experienced with data modeling in MongoDB. However, we will have new access patterns in future that are more BI-like/OLAP-like for some of our collections.
Scenario:
- we have collection where we store Risks (we are a risk management software). In this collection we store the most current approved version of a risk (hotter).
- In another collection we store all the versions of all risks including the most current one (colder).
- Currently we have typical OLTP queries like give me all risks given some conditions and sorting or show me some of the earlier versions of one particular risk - all good.
- Additional access patterns will be more of a time series/OLAP-style like “how have the risks developed over time?”.
- Example: How many risks have we had for each day for the last 3 months? Imagine a line chart with date as x axis and total number of risks as y axis.
- Risks and their versions are only created by human beings and don’t follow any update patterns: some risks are modified daily or more often, others are untouched for years.
Naive approach:
- We could copy all current versions of the risks into a new collection every day and by this create daily time slices of the full population. We don’t differentiate between touched and untouched risks.
- We can aggregate quite well.
- Would kind of work, but as a SaaS we are getting more and more customers…
- I don’t like these kind of batch and potentially spike behavior.
Questions:
- do you know a better, smarter, more efficient modeling idea around this challenge?
- Maybe some smart additional properties on the versions collection somehow adding validity time spans?
Cheers,
Manuel