How to structure table to allow quick calculations based on cumulative numbers under constraints?

Don_Boscow · August 6, 2023, 3:12pm

Our DB has hundreds of thousands of rows of data where a particular entity amasses points in a certain entry based on certain constraints.

Let us call the entities A1, A2…A1000. Let the constraints be X1, X2, X3…

Every document, chronicles, serially, the points which say A1 accrued in that scenario under a given set of constraints - say A1 amassed 71 points under conditions of X1, X87 and X90.

Similarly supposed second row says A2 amassed 92 points under X21, X45 and X67.

Now next row has A1 again, this time amassing 29 points, but under criteria X1 and X90 only.

Next row - A1 again, under X1 and X90 again, say 45 points.

Next row has A4, under X1, X90, X67, amassing 345 points.

Next row has A4, under X1, X56, X67, amassing 12 points.

We have millions of rows like this - where the entity can amass any points under any combination of criteria, serially, each document representing one scenario.

Question - the queries run like “Who amassed 90 points the fastest under criteria X1 and X90”. In this case, from our data above, A2 does not even fit the equation, A1 does - all three documents. A4 does in only 1 of the 2 documents, its second document has no X90, only X1, and hence it does not meet the criteria.

However, the cumulative numbers show that it took A1 two documents (meaning two entries) to amass 90 - the first two (totalling 100, exceeding 90). While, A4, in the very first document, amasses 345, thus automatically crossing 90, and since it took 1 document, lesser than the 2 documents A1 took, we can safely say that A4 amassed 90 points the fastest.

Let us assume all documents have an insertion time entry which we can use to sort them during query, since this class of queries require them to be sorted on basis of insertion into the DB.

Now what kind of data modelling or indexing can enable us to calculate these kind of calculations super fast? We cannot precompute aggregates at every document level, since the number of combinations X1, X2…X100 can have are virtually infinite and it is not scaleable to compute totals under each combination and store them in the document, while the query can come with any combination of X1…X100 conditions.