Get periodic nearest documents from a start date for a specific time interval

Hi,

I am using MongoDB 3.6 and I have a collection of documents that are created every second or so.

Within a specific time range (startDate to endDate), I would like to get the document nearest to each 15-minute mark after the start date, based on its creationDate.

start date T
document 1: nearest to (T + 15min)
document 2: nearest to (T + 30min)
etc…

The start date can be any date and time, with one-second precision.
I am fine with filtering on startDate and endDate using the $gt and $lt operators.
I am not sure how to design the query for the “nearest” document, because I cannot do an exact match on (T + 15min).

Any help would be appreciated!
Thanks

Structure of a document:

{
   "_id": ObjectId(""),
   "creationDate": ""
}

Hello @gil_gougnot, Welcome to the MongoDB Community forum!

You can use this example query to get the desired result. The query picks one document per 15-minute mark; if no document falls exactly on a mark, it takes the next document, i.e., the one with the next highest date (I am assuming that this is the “nearest” one). Note that the marks are spaced exactly 15 minutes apart, counted from the date of the first (earliest) document.

The query tests fine in the mongo shell, and all the operators it uses are compatible with MongoDB v3.6.

Sample document:

{ 
    "_id" : ObjectId("622ae04d13cdd737c8a86f7f"), 
    "dt" : ISODate("2020-07-05T00:14:02.051Z") 
}

The query:

var MINS_15 = 900000    // in milliseconds (15 * 60 * 1000)

var pipeline = [

    // Sort all the documents by the date in ascending order
    { 
        $sort: { dt: 1 } 
    },

    // Collect all the sorted documents into an array 'docs'.
    // Collect all the dates, with the fractional seconds dropped, into an array 'dts' -
    // so the dates are compared at whole-second precision.
    { 
        $group: { 
             _id: null, 
             dts: { 
                 $push: {
                     $dateFromParts : {
                         year: { $year: "$dt" }, month: { $month: "$dt" }, day: { $dayOfMonth: "$dt" },
                         hour: { $hour: "$dt" }, minute: { $minute: "$dt" }, second: { $second: "$dt" }
                     }
             } },
             docs: { $push: "$$ROOT" },
    }},

    // Accumulate the documents for every 15 mins (or the next document, if there is no exact match).
    // Iterate over the 'dts' array, and each time a date reaches the next 15-minute mark (or later),
    // collect the matching document in the array 'reduced_value.result'. This is achieved using the
    // $reduce array operator.
    { 
        $addFields: { 
            reduced_value: { 
                $reduce: { 
                    input: "$dts", 
                    initialValue: { 
                        result: [ { $arrayElemAt: [ "$docs", 0 ] } ], 
                        nxt_dt: { $add: [ { $arrayElemAt: [ "$dts", 0 ] }, MINS_15 ] },
                        idx: 0
                    },
                    in: {
                        $cond: [ 
                            { $gte: [ "$$this", "$$value.nxt_dt" ] }, 
                            { 
                                result: { $concatArrays: [ "$$value.result", [ { $arrayElemAt: [ "$docs", "$$value.idx" ] } ] ] },
                                nxt_dt: { $add: [ "$$value.nxt_dt", MINS_15 ] },
                                idx: { $add: [ "$$value.idx", 1 ] }
                            },
                            { $mergeObjects: [ "$$value", { idx: { $add: [ "$$value.idx", 1 ] } } ] }
                        ]
                    }
                }
            }
    }},

    // Project the result documents
    { 
        $project: { _id: 0, result: "$reduced_value.result" }
    }

]

// Execute the query
db.collection.aggregate(pipeline)
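
To make the $reduce step easier to follow, here is a rough sketch of the same selection logic in plain JavaScript, run on an in-memory array instead of a collection. The names pickEvery and truncateToSecond and the sample data are hypothetical and only for illustration; they are not part of MongoDB or of the pipeline above.

```javascript
var MINS_15 = 15 * 60 * 1000;   // 15 minutes in milliseconds

// Drop the fractional seconds, like the $dateFromParts expression in the $group stage.
function truncateToSecond(date) {
    return Math.floor(date.getTime() / 1000) * 1000;
}

// Hypothetical helper: 'docs' must already be sorted by 'dt' in ascending order.
function pickEvery(docs, intervalMs) {
    if (docs.length === 0) return [];
    var result = [docs[0]];                                   // the first document is always kept
    var nextMs = truncateToSecond(docs[0].dt) + intervalMs;   // the next 15-minute mark
    docs.forEach(function (doc) {
        if (truncateToSecond(doc.dt) >= nextMs) {
            result.push(doc);        // first document at or after the mark
            nextMs += intervalMs;    // move on to the following mark
        }
    });
    return result;
}

// Made-up sample data
var sample = [
    { _id: 1, dt: new Date("2020-07-05T00:00:00.000Z") },
    { _id: 2, dt: new Date("2020-07-05T00:14:02.051Z") },
    { _id: 3, dt: new Date("2020-07-05T00:15:01.000Z") },
    { _id: 4, dt: new Date("2020-07-05T00:29:59.000Z") },
    { _id: 5, dt: new Date("2020-07-05T00:30:00.400Z") }
];
var pickedIds = pickEvery(sample, MINS_15).map(function (d) { return d._id; });
// pickedIds is [1, 3, 5]
```

In this sample, document 4 (00:29:59) is closer in time to the 00:30:00 mark than document 5, but document 5 is the one picked. That is the "next document" assumption at work; if you need the document closest on either side of each mark, the comparison would have to be adapted.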

You can add a $match stage at the beginning of the pipeline, and specify the start and end dates:

var START_DATE = ISODate("2020-07-05T00:00:00Z")
var END_DATE = ISODate("2020-07-05T00:31:00Z")

{ 
    $match: { 
        dt: { $gte: START_DATE, $lt: END_DATE }
}}
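
One possible way to put the two pieces together is to prepend the $match stage to the existing pipeline array before running it. This is only a sketch: withDateRange is a hypothetical helper name, basePipeline is a one-stage stand-in for the full pipeline shown earlier, and plain Date objects are used instead of ISODate so that the snippet also runs outside the mongo shell.

```javascript
// Hypothetical helper: returns a new pipeline with a date-range $match stage in front.
// The original pipeline array is left unchanged.
function withDateRange(pipeline, startDate, endDate) {
    var matchStage = { $match: { dt: { $gte: startDate, $lt: endDate } } };
    return [matchStage].concat(pipeline);
}

var START_DATE = new Date("2020-07-05T00:00:00Z");
var END_DATE = new Date("2020-07-05T00:31:00Z");

// Stand-in for the full pipeline (sort, group, reduce, project)
var basePipeline = [ { $sort: { dt: 1 } } ];

var fullPipeline = withDateRange(basePipeline, START_DATE, END_DATE);

// In the mongo shell you would then run:
// db.collection.aggregate(fullPipeline)
```

Putting $match first means that only the documents inside the date range reach the $sort and $group stages, which keeps the 'docs' and 'dts' arrays smaller.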