Aggregation to get events in timespan, plus the previous event

Heath_Volmer · February 16, 2023, 4:31pm

Hello. I have timeseries data as events coming in at random times. They are not ongoing metrics, but rather events. “This device went online.” “This device went offline.”

I need to report on the number of actual transitions within a time range. Because there are occasionally same-state events, for example two “went online” events in a row, I need to “seed” the data with the state from just before the time range. If I have only one event in my time range, I need to compare it to the previous one to determine whether something actually changed.

I already have aggregation stages that remove same-state events.
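
To make the seeding idea concrete, here is a minimal sketch in plain JavaScript rather than aggregation stages. The `state` field name and the `countTransitions` helper are assumptions for illustration and are not part of the actual schema. It only shows why the state from before the range matters when counting real changes.

```javascript
// Count real state transitions in a time-ordered list of events.
// `seedState` is the state of the latest event before the range,
// or null when no earlier event is known.
// NOTE: the `state` field name is hypothetical.
function countTransitions(events, seedState = null) {
  let previous = seedState;
  let transitions = 0;
  for (const event of events) {
    // Only count a change when there is something to compare against.
    if (previous !== null && event.state !== previous) {
      transitions += 1;
    }
    previous = event.state;
  }
  return transitions;
}

const inRange = [
  { time: new Date('2023-01-18T08:00:00Z'), state: 'online' },
  { time: new Date('2023-01-18T09:00:00Z'), state: 'online' }, // same-state repeat
  { time: new Date('2023-01-18T10:00:00Z'), state: 'offline' },
];

console.log(countTransitions(inRange, 'offline')); // 2: the first "online" is a real change
console.log(countTransitions(inRange, 'online'));  // 1: the first "online" is a repeat
```

Without the seed, the first event in the range cannot be classified as a change or a repeat, which is exactly the gap the previous event fills.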

Is there a way to add “the latest, previous event” to the data in the pipeline without writing two queries? A $facet stage totally ruins performance.

For “previous”, I’m currently trying something like this in a separate query, but it’s very slow across millions of records:

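// Get the latest event before a given date
db.devicemetrics.aggregate([
  {
    // Only events for this device metadata, before the range start, with a value
    $match: {
      'device.someMetadata': '70b28808-da2b-4623-ad83-6cba3b20b774',
      time: {
        $lt: ISODate('2023-01-18T07:00:00.000Z'),
      },
      someValue: { $ne: null },
    },
  },
  // Newest first, so $first below picks the latest event
  { $sort: { time: -1 } },
  {
    // One result per device: its most recent event before the date
    $group: {
      _id: '$device._id',
      lastEvent: { $first: '$$ROOT' },
    },
  },
  {
    // Return the original event document instead of the group wrapper
    $replaceRoot: { newRoot: '$lastEvent' },
  },
]);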

steevej · February 17, 2023, 1:26pm

You could try a $lookup after the $group to locate the previous event. The $lookup sub-pipeline would end with a $limit of 1.

It is technically a separate query, but it runs within the same database access and uses the same index.
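
A rough sketch of what that could look like, assuming the collection and field names from the query above. The `buildPipelineWithPrevious` helper, the `events` and `previous` output fields, and the exact stage order are assumptions for illustration, not a tested or tuned pipeline.

```javascript
// Hypothetical helper: builds a pipeline that returns, per device, the events
// inside [start, end) plus the latest event before `start` (the "seed").
function buildPipelineWithPrevious(metadataId, start, end) {
  return [
    // Events inside the requested time range
    {
      $match: {
        'device.someMetadata': metadataId,
        time: { $gte: start, $lt: end },
      },
    },
    { $sort: { time: 1 } },
    // One document per device. Pushing whole documents can get large,
    // so a real pipeline might push only the fields it needs.
    { $group: { _id: '$device._id', events: { $push: '$$ROOT' } } },
    // For each device, fetch only the single latest event before the range
    {
      $lookup: {
        from: 'devicemetrics',
        let: { deviceId: '$_id' },
        pipeline: [
          {
            $match: {
              time: { $lt: start },
              someValue: { $ne: null }, // mirrors the filter in the original query
              $expr: { $eq: ['$device._id', '$$deviceId'] },
            },
          },
          { $sort: { time: -1 } },
          { $limit: 1 },
        ],
        as: 'previous', // empty array when the device has no earlier event
      },
    },
  ];
}

const pipeline = buildPipelineWithPrevious(
  '70b28808-da2b-4623-ad83-6cba3b20b774',
  new Date('2023-01-18T07:00:00.000Z'),
  new Date('2023-01-19T07:00:00.000Z') // end of range is an assumed value
);
// In mongosh: db.devicemetrics.aggregate(pipeline);
```

Whether the sub-pipeline is fast depends on an index that covers the device ID and `time`; checking the plan with `explain` would show whether the inner $match and $sort are served by it.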

Heath_Volmer · February 17, 2023, 3:04pm

Thanks! That is actually what I wound up doing yesterday! It is very slow on a large set of events.

What I’m going to do is modify how the incoming events are stored: when a new event comes in, I’ll look up the previous one and ignore it if the state hasn’t changed. Then I no longer need to fetch the previous event to determine whether something actually changed at the start of my range, and things are substantially faster.
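
The write-time filtering described above could look roughly like this. It is a sketch only: the in-memory `Map` stands in for the database lookup of the previous event, and the `deviceId` and `state` field names and the `ingestEvent` helper are assumptions, not the actual implementation. It also assumes events arrive in time order.

```javascript
// In-memory stand-in for "look up the previous event for this device".
// In a real system this would be a database read (hypothetical here).
const lastStateByDevice = new Map();
const storedEvents = [];

// Store the event only if its state differs from the device's last stored state.
// Returns true when the event was stored, false when it was ignored.
function ingestEvent(event) {
  const previousState = lastStateByDevice.get(event.deviceId);
  if (previousState === event.state) {
    return false; // repeated state, nothing actually changed
  }
  lastStateByDevice.set(event.deviceId, event.state);
  storedEvents.push(event);
  return true;
}

ingestEvent({ deviceId: 'a', state: 'online', time: new Date('2023-01-18T08:00:00Z') });
ingestEvent({ deviceId: 'a', state: 'online', time: new Date('2023-01-18T09:00:00Z') }); // ignored
ingestEvent({ deviceId: 'a', state: 'offline', time: new Date('2023-01-18T10:00:00Z') });

console.log(storedEvents.length); // 2
```

With only real transitions stored, counting changes in a range becomes a plain count of the events in that range.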