Getting last entry before a certain timestamp in time series

calligrafo_K · February 1, 2022, 5:43pm

Hi

Consider a time series with two sources defined as:

timeseries={ "timeField": "ts",
                     "metaField": "source"}

and assume each document, in addition to “ts” and “source”, has a “temperature” field. I am interested in historical values of “temperature”. To be exact, for a given timestamp t0, I want to get, for each source, the last “temperature” recorded at or before t0. Note that the “ts” values of different sources can differ; they are not necessarily synced.
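To make the requirement concrete, here is a minimal plain-JavaScript sketch of the intended semantics, independent of MongoDB. The sample readings and the helper name `lastAtOrBefore` are hypothetical and only illustrate the expected result, not an efficient query.

```javascript
// Hypothetical sample readings; the values are made up for illustration.
const docs = [
  { source: "A", ts: new Date("2022-01-01T00:00:00Z"), temperature: 20.1 },
  { source: "A", ts: new Date("2022-01-01T00:10:00Z"), temperature: 20.4 },
  { source: "B", ts: new Date("2022-01-01T00:03:00Z"), temperature: 18.7 },
  { source: "B", ts: new Date("2022-01-01T00:12:00Z"), temperature: 18.9 },
];

// For each source, keep the reading with the greatest ts that is <= t0.
function lastAtOrBefore(readings, t0) {
  const latest = new Map();
  for (const doc of readings) {
    if (doc.ts > t0) continue; // reading is after t0, ignore it
    const seen = latest.get(doc.source);
    if (seen === undefined || doc.ts > seen.ts) {
      latest.set(doc.source, doc);
    }
  }
  return latest;
}

const result = lastAtOrBefore(docs, new Date("2022-01-01T00:11:00Z"));
// A -> 20.4 (reading at 00:10), B -> 18.7 (reading at 00:03)
```

Note that the two sources report at different times, so the answer for each source comes from a different timestamp.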

To me, that seems like a very basic requirement for a time series database, and it is probably already optimized in MongoDB’s time series framework, but I am having difficulty finding an efficient solution for it. This is what I have implemented myself, using the aggregation framework:

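db.col.aggregate([
  // keep only readings at or before t0
  {"$match": {"ts": {"$lte": t0}}},
  // $last depends on input order, so sort by time first
  {"$sort": {"ts": 1}},
  {"$group": {"_id": "$source", "hist_temperature": {"$last": "$temperature"}}}
])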

Essentially, this matches the records that occurred at or before t0 and picks the last item for each source from the grouped items. But this obviously is not efficient, as the time series, behind the scenes, is partitioned by source and sorted by time for each source. The initial stage that matches by time violates that indexing order. Does anyone have an efficient alternative solution for this? I appreciate your help/comments.

Nuri_Halperin · February 1, 2022, 5:55pm

Would $topN or $bottomN help achieve that? (V5.2)
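For reference, a minimal sketch of how this might look with `$bottom`, the single-result sibling of `$bottomN` that was also added in 5.2. The helper name `bottomPipeline` is hypothetical, and the collection and field names are taken from the question. Whether this is faster than the `$last` version on a time series collection is not established here.

```javascript
// Builds a pipeline that, per source, returns the temperature of the
// document with the greatest ts among those at or before t0.
function bottomPipeline(t0) {
  return [
    { $match: { ts: { $lte: t0 } } },
    {
      $group: {
        _id: "$source",
        // $bottom sorts the group by ts ascending and outputs the last element,
        // so no separate $sort stage is needed.
        hist_temperature: {
          $bottom: { sortBy: { ts: 1 }, output: "$temperature" },
        },
      },
    },
  ];
}

// Example timestamp, chosen arbitrarily for illustration.
const pipeline = bottomPipeline(new Date("2022-02-01T00:00:00Z"));

// In mongosh (requires MongoDB 5.2+): db.col.aggregate(pipeline)
```

Because t0 is just a parameter of the `$match` stage, the pipeline stays dynamic with respect to the input timestamp.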

calligrafo_K · February 1, 2022, 6:02pm

I assume those return the top and bottom n records according to some criterion, sort of similar to $limit? I have to clarify that here t0 is an input, i.e. I am asking the time series “give me the most recent temperature for all sources before t0”. Thanks

Nuri_Halperin · February 1, 2022, 8:07pm

Are you more concerned here about the efficiency of letting agg do the group by with the compound (source,ts)? Or is this about what syntax can produce a dynamic “sort but don’t give me the last one”?

Maybe an example like this for the data set and output can help clarify the intent / challenge?

The docs describe secondary index scenarios so maybe not a worry?

calligrafo_K · February 2, 2022, 2:26am

Having multiple secondary indices and using hint(), as mentioned in the link you shared, is probably the right solution. Thanks for the help, appreciate it.
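A rough sketch of that approach, assuming a compound secondary index on the metaField and timeField and a pipeline whose sort matches the index. The index shape and the helper name `sortedFirstPipeline` are assumptions for illustration, and whether the planner actually uses the index efficiently should be checked with explain().

```javascript
// Assumed index: source ascending, ts descending (newest first per source).
const indexSpec = { source: 1, ts: -1 };

// With documents sorted newest-first within each source, $first picks the
// most recent temperature at or before t0.
function sortedFirstPipeline(t0) {
  return [
    { $match: { ts: { $lte: t0 } } },
    { $sort: { source: 1, ts: -1 } },
    { $group: { _id: "$source", hist_temperature: { $first: "$temperature" } } },
  ];
}

// Example timestamp, chosen arbitrarily for illustration.
const pipeline = sortedFirstPipeline(new Date("2022-02-01T00:00:00Z"));

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  db.col.createIndex(indexSpec);
  printjson(db.col.aggregate(pipeline, { hint: indexSpec }).toArray());
}
```

Passing the same key pattern to `hint` asks the server to use that index rather than whichever one the planner would choose on its own.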