Timeseries: Modifying underlying bucket collection for easier queries

prunevac · April 20, 2022, 10:22am

I have a time series collection and sometimes I’m getting correcting values, e.g. ‘at this timestamp the value is actually 0.2 instead of 0.3’. Since updates can only be made on metafield, I’m stuck with a duplicate values and only one of them is correct. I’m getting around this by aggregation (I’m ‘throwing away’ the old values from result set base on its _id).

Simple example:
I have this collection

db.createCollection(
    'data',
    {
        timeseries: {
            timeField: 'timestamp',
            metaField: 'metadata',
            granularity: 'hours',
        },
    },
);

Then I insert a document

db.getCollection('data').insert({
    timestamp: ISODate('2022-04-20T12:00:00Z'),
    metadata: '123456',
    value: 0.2,
})

And some time in the future the correcting document

db.getCollection('data').insert({
    timestamp: ISODate('2022-04-20T12:00:00Z'),
    metadata: '123456',
    value: 0.3,
})

To get that data I use following aggregation

db.getCollection('data').aggregate([
    {
        $match: {
            metadata: '123456',
            timestamp: ISODate('2022-04-20T12:00:00Z'),
        },
    },
    {
        $group: {
            _id: '$timestamp',
            docs: {$push: '$$ROOT'},
        },
    },
    {
        $replaceRoot: {
            newRoot: {
                $reduce: {
                    input: '$docs',
                    initialValue: [
                        {
                            _id: {'$toObjectId': '000000000000000000000000'},
                        },
                    ],
                    in: {
                        $cond: {
                            if: {'$gt': ['$$this._id', '$$value._id']},
                            then: '$$this',
                            else: '$$value',
                        },
                    },
                },
            },
        },
    },
])

There are a lot more documents than these two and the match stage usually uses a date range, but I hope the example is enough for an illustration.

Now finally onto my question. Would it be possible / safe / advisable to update the value directly in the underlying bucket collection (‘system.buckets.data’ in this case)?
My thoughts on this are:

It wouldn’t change the shape of documents in bucket collection, so it should not matter
I don’t like the idea of messing with internal collections though
I think it would have just moved the problem elsewhere, i.e. it would simplify data querying at a cost of complicating data insertion as I would have to check for duplicates before inserting.

Stefan_de_Jong · May 3, 2022, 10:48am

Unfortunately I am in the same situation where I am on the brink of simply modifying the underlying bucket as well to get around mongodb’s limitations.
MongoDB 6 seems to still not support these manipulations: https://www.mongodb.com/docs/v6.0/core/timeseries/timeseries-limitations/

Benjamin_Behringer · June 17, 2022, 8:49pm

How did you proceed? We are in the same boat…

prunevac · July 14, 2022, 10:50am

I just run the aggregation I posted above for every data retrieval.