How can I delete the documents between two dates from a Time Series collection?

I have a simple Time Series collection created with the following parameters:

timeseries: {
   timeField: "timestamp",
   granularity: "seconds"
}

The documents contain data about the market price of some asset. Due to a change in how the price is calculated, I need to delete the documents between two dates before re-inserting the corrected documents.

But according to the doc, delete commands on Time Series collections have the following limitation:

The query may only match on metaField field values.

I am new to MongoDB so I’m probably missing something here, but I do not understand this limitation:

  1. Since the timeField is the main index for a time series, why can’t it be used to match documents?
  2. Do I misunderstand what Time Series collections are meant for?
  3. How can I delete the documents between two dates? Right now, I’m thinking of adding the document timestamp to the metaField, but this duplication feels completely unnecessary and I wonder how it would affect performance.

In case it helps to answer my question, this is how I wanted to do this operation, using the following Go code:

	filter := bson.D{{
		Key: "timestamp",
		Value: bson.D{
			{
				Key:   "$gte",
				Value: timestamp_start,
			},
			{
				Key:   "$lt",
				Value: timestamp_end,
			},
		},
	}}
	result, err := coll.DeleteMany(context.TODO(), filter)

This fails with the following error: Cannot perform an update or delete on a time-series collection when querying on a field that is not the metaField.

@x8X5HXVGK thanks for the question!

I’m a bit fuzzy on the best practices of time series collections in MongoDB, but I think your intuition about adding the document timestamp to the metaField affecting performance is correct. My understanding is that MongoDB uses the values in the metaField to create data “buckets” that are optimized for querying across large and variable time ranges. If you have too many distinct values in the metaField, your buckets will become tiny and performance will probably suffer.

It might be possible to put a low-granularity time value in the metaField, maybe a “day stamp”, that would let you delete and replace buckets of data without reloading the entire collection. I highly recommend testing the performance impact of adding any timestamp value to the metaField before relying on it, though.

Hi @Matt_Dale, thank you for your response.

With your response, I think I understand better how Time Series collections work. My use case is very similar to the example in the doc about the stock data. Using this example, let’s say I have a Time Series collection containing the prices of various assets, where each asset is identified by its ticker symbol. The metaField for this collection would contain that ticker symbol.

Now, if it is discovered that the prices for one asset were incorrect over a period of time, say last week, there is no way to delete or update just the data from that period? I would need to delete and re-insert the entire data set for this asset?

Your idea of adding a less granular time value to the metaField would help, but the metaField is defined when the collection is created and cannot be modified later, so I can’t use that now.

This limitation of not being able to use the timeField for a delete operation on a Time Series is very surprising.

I’ve run into the same problem. I’m deleting all the data for the symbol that contains duplicates or corrupted data, and then re-inserting the corrected version. :man_shrugging:t3: