Time series: unique _id and delete_many by ts

Jona_Wossner · November 9, 2022, 4:32pm

Hi,
I have some questions about the time series collections that were introduced with MongoDB 5.0. I only found documentation about the limitations, but no suggestion for a solution or workaround.

Is there any way to enforce uniqueness of _id fields inside a time series collection? On a normal collection we could use a unique index, I guess, but that's not supported for time series. Our current workaround is to run a find on _id and filter out existing documents before inserting. Is there a smarter / more efficient alternative?
Is there any way to delete time series data based on ts or _id, other than through the TTL of the time series collection?
Delete many complains that it can only be used on “metadata” fields, but in my understanding ts and _id should not be in the metadata, because that would make the clustering of the underlying data useless.

Currently we have one collection per “data measuring device” (1,500+), with the time series data from different sub-devices inside each collection. Is the idea behind time series collections to have all locations in one time series collection, with what is currently the collection name stored in the meta field? We also store other data in that collection that should not be deleted based on time, but using 1,500 extra collections just to make use of the TTL does not sound efficient. So, any idea how to delete only some data inside a time series collection based on time, or should we just use normal collections / remodel our data?

Hope someone has a better understanding of this and can give me some input.

Jason_Tran · November 15, 2022, 2:22am

Hi @Jona_Wossner,

Is there any way to enforce uniqueness of _id fields inside a time series collection? On a normal collection we could use a unique index, I guess, but that's not supported for time series. Our current workaround is to run a find on _id and filter out existing documents before inserting. Is there a smarter / more efficient alternative?

It’s important to note that a time series collection is not exactly the same as a normal collection in MongoDB. MongoDB treats time series collections as writable non-materialized views backed by an internal collection. When you insert data, the internal collection automatically organizes time series data into an optimized storage format.

That said, I would recommend creating a post about unique indexes on time series collections via the MongoDB feedback engine, where others can vote for it and the MongoDB product team can monitor it.

If there is a hard requirement for uniqueness among the inserted data, it may be worth benchmarking a normal collection (with the associated unique indexes created on it) against a time series collection, using your particular workload and data, to determine whether giving up the benefits of a time series collection is worth it.
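
If you stay with the time series collection for now, the find-then-insert workaround from the question can at least be batched so that each batch costs one lookup instead of one per document. The following is only a minimal sketch: `insert_new_only` is a hypothetical helper name, and it assumes `collection` behaves like a PyMongo `Collection` (it only calls `find(filter, projection)` and `insert_many(documents)`). It is not atomic, so two writers inserting the same _id at the same moment can still produce duplicates; it reduces them but does not replace a unique index.

```python
def insert_new_only(collection, docs):
    """Insert only the documents whose _id is not stored yet.

    Hypothetical helper, not part of PyMongo. Returns the list of
    documents that were actually passed to insert_many.
    """
    # Drop duplicates inside the batch itself, keeping the first occurrence.
    unique = {}
    for doc in docs:
        unique.setdefault(doc["_id"], doc)
    if not unique:
        return []

    # One round trip: fetch only the _id values that already exist.
    cursor = collection.find({"_id": {"$in": list(unique)}}, {"_id": 1})
    existing = {found["_id"] for found in cursor}

    new_docs = [doc for _id, doc in unique.items() if _id not in existing]
    # insert_many rejects an empty list, so only call it when needed.
    if new_docs:
        collection.insert_many(new_docs)
    return new_docs
```

With PyMongo this would be called as `insert_new_only(db["measurements"], batch)`, where `measurements` is a placeholder collection name. It is worth measuring the lookup on your own data before relying on it, since the cost of the `$in` query depends on how the collection is indexed.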

Could you also share further details of the use case behind the uniqueness requirement? For example, is the particular analysis or result set from the time series collection supposed to be free of duplicates?

The following pages may be good to go over as well:

Regards,
Jason