Timeseries data and late arrivals

Hi there,

While working with time series collections I have repeatedly run into use cases where late-arriving data is part of the picture. For example, log statistics from devices that are not constantly online but that deliver their data in strict chronological order, as a “time series”, when they connect.
Since inserting late arrivals into a time series would seem to break the principles of time series data, these late party guests are a problem. I am interested to learn how you solved this.
One option I have used is to buffer all incoming data and write it to a time series collection in batches. However, this is not the best solution for near-real-time analytics. If batches are written at a low frequency you have a certain delay; if you write at a higher frequency you may need to drop some late guests or build logic on top of your buffer…
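To make the buffering trade-off concrete, here is a minimal sketch in Python. It is not tied to any driver: `LateArrivalBuffer`, `allowed_lateness`, the `ts` field name and the `sink` callable are all hypothetical names chosen for illustration. In a real setup the sink could be something like pymongo's `collection.insert_many`. Documents are held back until they are older than a lateness window, so a larger window catches more late guests at the price of more delay.

```python
from datetime import datetime, timedelta, timezone


class LateArrivalBuffer:
    """Holds documents back until they are older than a lateness window.

    All names here are illustrative assumptions, not part of any library.
    """

    def __init__(self, sink, allowed_lateness, time_field="ts"):
        self.sink = sink                      # callable that receives a list of docs
        self.allowed_lateness = allowed_lateness
        self.time_field = time_field
        self.pending = []

    def add(self, doc):
        self.pending.append(doc)

    def flush(self, now):
        """Write every buffered doc whose timestamp is at or before the watermark."""
        watermark = now - self.allowed_lateness
        ready = [d for d in self.pending if d[self.time_field] <= watermark]
        self.pending = [d for d in self.pending if d[self.time_field] > watermark]
        if ready:
            # Sort so that each batch is written in time order.
            ready.sort(key=lambda d: d[self.time_field])
            self.sink(ready)
        return len(ready)


# Usage: a plain list stands in for the time series collection.
written = []
buf = LateArrivalBuffer(sink=written.extend, allowed_lateness=timedelta(minutes=5))
now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)

buf.add({"ts": now - timedelta(minutes=1), "device": "a", "value": 1})   # too fresh, stays buffered
buf.add({"ts": now - timedelta(minutes=30), "device": "b", "value": 2})  # late arrival
buf.add({"ts": now - timedelta(minutes=10), "device": "a", "value": 3})

flushed = buf.flush(now)
```

A late document that shows up after its window has already been flushed is simply written with the next batch rather than dropped. Whether that is acceptable depends on how the downstream analytics treat out-of-order writes.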

I’d like to start a conversation here to find out whether there are other solutions, and hopefully we can compile something like a best practice.

Regards,
Michael

Hi Michael,
Would you be able to share what kind of issues you faced when you attempted to insert late-arriving data? Errors? Performance issues? In theory, time series collections should be able to handle those just fine, but of course we’re constantly looking for ways to improve how they work.

Thank you,

Bora


Hello @Bora_Beran

It took me a while to separate the issues here. Long story short: inserting late arrivals works just fine.
I have not seen major differences between many individual inserts and batch inserts. (Batch is faster and recommended when there are many late arrivals…)

The issues that came up were data related. E.g. in some of my 30 million docs a date was converted to an invalid UTC date, which led to errors along the processing chain. Also, not accounting for the fact that a date conversion without an explicit timezone uses the local timezone led to a mix-up in the time series. There was also a process issue, since unvalidated client time was stored in the raw documents. I mention this because I came across it and it may prompt others to review their own code and processes; the time series collections work fine.
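For anyone who wants to check their own pipeline for the same pitfalls, here is a rough sketch in Python of the two guards described above: refusing timestamps without an explicit timezone, and sanity-checking client time against server time. The function names `to_utc` and `is_plausible` and the thresholds are assumptions for illustration, not what my actual code looked like.

```python
from datetime import datetime, timedelta, timezone


def to_utc(raw):
    """Parse an ISO 8601 string and return a timezone-aware UTC datetime.

    Timestamps without an offset are rejected instead of being silently
    interpreted in the local timezone of the machine running the code.
    Impossible dates (e.g. February 30) raise ValueError during parsing.
    """
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp without timezone: {raw!r}")
    return parsed.astimezone(timezone.utc)


def is_plausible(ts, server_now,
                 max_future=timedelta(minutes=5),   # assumed tolerance for clock skew
                 max_age=timedelta(days=365)):      # assumed oldest acceptable reading
    """Check client-supplied time against a window around the server time."""
    return server_now - max_age <= ts <= server_now + max_future


server_now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)

ok = to_utc("2023-01-01T13:30:00+02:00")        # explicit offset, converted to UTC
future = to_utc("2023-01-02T12:00:00+00:00")    # a full day ahead of the server clock

try:
    to_utc("2023-01-01T12:00:00")               # no offset: rejected
    naive_rejected = False
except ValueError:
    naive_rejected = True
```

Documents that fail either check could be routed to a separate quarantine collection for inspection instead of being written into the time series.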

I have moved a second issue to a new post to keep the issues separate.

Regards,
Michael

