Hi, I’ve read the article on the bucket pattern; however, it doesn’t go into detail about consistency because it uses a simple example where data arrives once a minute.
The example suggests placing the data into an array bucket and adding a couple more fields:
{
  measurements: [...],
  transaction_count: 42,
  sum_temperature: 2413
}
I have a similar use case, but what if my data arrives at random and at scale? For example, say the stored sum is 1000 and two new data points arrive at the same time, one with 50 and one with 30. If they are processed in parallel, the sum increases only by whichever update wins, not by 80, because each writer has to read the current sum first and then write back its own computed value.
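To make the race concrete, this is roughly what that application-level read-modify-write looks like in the shell (the collection name "buckets" and the variables bucketId / newValue are just placeholders):

// Two clients running this concurrently can both read
// sum_temperature = 1000; one writes back 1050, the other 1030,
// and one of the increments is silently lost.
const bucket = db.buckets.findOne({ _id: bucketId });
db.buckets.updateOne(
  { _id: bucketId },
  { $set: { sum_temperature: bucket.sum_temperature + newValue } }
);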
Sure, there’s $inc, which I suppose would give me consistency for sums, but my use case is actually more complex: it involves computing max values over an aggregation. Since computing the max across all data points on every query seems expensive, I’d like to store max values in the buckets and update them at the application level when new data arrives. However, I can’t find any information about this most basic distributed-systems problem. So I need to come up with a solution myself, like somehow locking the collection while updating the max value, but that sounds insane when there could be hundreds of requests per second.
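For comparison, here is the atomic version of the sum update I mean, using the same placeholder names as above; the increment is applied server-side in a single operation, so concurrent writers can’t overwrite each other. I’m also aware of the atomic $max update operator, which would presumably handle a single per-bucket max field (max_temperature below is a hypothetical field), but I don’t think a single field comparison covers my more complex aggregated max:

// Push the new measurement and bump the counters in one atomic update.
db.buckets.updateOne(
  { _id: bucketId },
  {
    $push: { measurements: newMeasurement },
    $inc: { transaction_count: 1, sum_temperature: newValue }
  }
);

// $max sets the field only if newValue is greater than the stored value.
db.buckets.updateOne(
  { _id: bucketId },
  { $max: { max_temperature: newValue } }
);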
How can I achieve what I want, and what literature should I consult? This kind of task seems very common, so it’s really frustrating that there is no professional, expert material on how to achieve this kind of consistency.