I’m searching for an algorithm in MongoDB concerning the following problem. Assume the following data (simplified):
{ ts: 1, x: 1 } // ts: timestamp, x: integer value
{ ts: 2, x: 2 }
{ ts: 3, x: 2 }
{ ts: 4, x: 2 }
{ ts: 5, x: 1 }
{ ts: 6, x: 1 }
{ ts: 7, x: 4 }
{ ts: 8, x: 4 }
{ ts: 9, x: 4 }
{ ts: 10, x: 5 }
{ ts: 12, x: 5 }
{ ts: 13, x: 1 }
{ ts: 14, x: 1 }
{ ts: 15, x: 1 }
{ ts: 16, x: 1 }
{ ts: 17, x: 2 }
I’m looking for an algorithm that compresses the data points based on x
in the timeseries, essentially converting the timeseries into a series of change points rather than returning all individual data points.
The output would be the following (assuming that I prefiltered the data for 2 < ts < 16
):
{ ts: 1, x: 1 } // filtered out
{ ts: 2, x: 2 } // filtered out
{ ts: 3, x: 2 } *
{ ts: 4, x: 2 } // removed, because no change in `x`
{ ts: 5, x: 1 } *
{ ts: 6, x: 1 } // removed, because no change in `x`
{ ts: 7, x: 4 } *
{ ts: 8, x: 4 } // removed, because no change in `x`
{ ts: 9, x: 4 } // removed, because no change in `x`
{ ts: 10, x: 5 } *
{ ts: 12, x: 5 } // removed, because no change in `x`
{ ts: 13, x: 1 } *
{ ts: 14, x: 1 } // removed, because no change in `x`
{ ts: 15, x: 1 } * (kept to mark the end of the timeseries)
{ ts: 16, x: 1 } // filtered out
{ ts: 17, x: 2 } // filtered out
Final output (each data marks the first point of value change):
{ ts: 3, x: 2 }
{ ts: 5, x: 1 }
{ ts: 7, x: 4 }
{ ts: 10, x: 5 }
{ ts: 13, x: 1 }
{ ts: 15, x: 1 }
My implementation ideas and attempts include the following:
-
Using an
aggregation
for each data point, look up the previous data point in the collection and merge theprevious
ts
with the current one. | 1M data points, 5 minutes of runtime, frequent OOMs -
Using an
aggregation
group all data together, run a pairwise reduce, detect the changes, and then filter out elements without a change. | 1M data points, 5 minutes of runtime, frequent OOMs -
(in progress) Using an
aggregation
group all data together and run a custom$function
that scans the data and detects the change points. | (in progress)
Should MongoDB be used for these algorithms (it seems not), or am I missing something?