Time series: How to get the most recent document?

Martin_Prodanov · August 16, 2022, 7:42am

Hi,

thanks for the explanation. My personal speculation: it seems that the optimizer does not really know how many actual “documents” are stored in a bucket, so it cannot push the limit step up in the query plan. In a normal collection, the index size corresponds to the number of “documents”, and scanning the index alone is enough to get the number of documents required by the limit op in the query. In a time-series collection, the number of documents in a bucket varies, and this metadata is probably not available to the optimizer. Therefore it cannot push the limit step up in the plan; instead it has to unpack all the buckets that match the query criteria first and only then perform the limit step.
This is a significant difference in behavior, and IMHO it should be listed in the limitations of time series collections. In a normal collection the query predicate can match a billion documents, but if you only want 10, it will return pretty fast. Doing the same query on a time series collection, by contrast, may leave the database unresponsive for some time.
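
To make the speculation above concrete, here is a toy Python model of the two plan shapes. It is only a sketch and not how MongoDB is actually implemented: the `buckets` list and the two function names are made up for illustration, and each bucket is assumed to be a plain list of measurements.

```python
# Toy model only: each "bucket" is a list of measurements, loosely mimicking
# how a time-series collection groups several documents into one stored unit.
# 500 hypothetical buckets with 1000 measurements each.
buckets = [[(b, i) for i in range(1000)] for b in range(500)]


def unpack_then_limit(buckets, n):
    """Unpack every matching bucket first, then apply the limit."""
    unpacked = 0
    docs = []
    for bucket in buckets:
        for doc in bucket:
            docs.append(doc)
            unpacked += 1
    return docs[:n], unpacked


def limit_while_unpacking(buckets, n):
    """Stop unpacking as soon as n documents have been produced."""
    unpacked = 0
    docs = []
    for bucket in buckets:
        for doc in bucket:
            if len(docs) == n:
                return docs, unpacked
            docs.append(doc)
            unpacked += 1
    return docs, unpacked


slow_docs, slow_count = unpack_then_limit(buckets, 10)
fast_docs, fast_count = limit_while_unpacking(buckets, 10)
print(slow_docs == fast_docs, slow_count, fast_count)
```

Both functions return the same 10 documents, but in this toy setup the first one touches all 500,000 measurements while the second touches only 10. That gap is what I suspect is happening when the limit cannot be pushed below the unpack step.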

Regards,
Martin