How to aggregate by time interval with a secondary dimension

I have a database of NFT mint events, where each event contains an array of transactions with a ‘count’ and a ‘function’ among other fields. For example, the data looks as follows (simplified):

MintEvent = {
  address: "0xnnnnn",
  minter: "0xnnnnn",
  date: "2022-08-20T16:51:48.554+00:00"
  transactions: [
    {
      hash: "0xnnn",
      count: 10,
      function: "mint"
    },
    {
      hash: "0xnnn",
      count: 5,
      function: "claim"
    },
    {
      hash: "0xnnn",
      count: 7,
      function: "claim"
    }
  ],
  mintTotal: 25
}

I want to graph the mint data over time using a stacked bar chart, so for every 15-minute interval I want the total number of mints for each ‘function’ name within that period.

I can’t figure out how to do this using an aggregation pipeline. I’ve seen examples of how to use $dateToParts with a $group to count the number of mints within each time period, but I don’t want to return a single value for each period. I want to return the total number of mints for each function name (there can be any number of functions, some with different names and many with the same name). To add a further level of complexity, the function names are held in a different collection and the mints array only stores a function signature. I know how to use a $project to resolve that.
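To illustrate the shape I’m after, the result might look something like this for each 15-minute bucket (field names here are just placeholders, and the numbers come from the single example document above, where the 16:45-17:00 bucket would contain 10 "mint" and 5 + 7 = 12 "claim" mints):

{ interval: ISODate("2022-08-20T16:45:00.000Z"), function: "mint",  totalMints: 10 }
{ interval: ISODate("2022-08-20T16:45:00.000Z"), function: "claim", totalMints: 12 }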

Where do I need to be looking to answer this, please?

Thanks


Hi @Stephen_Eddy,

Is the date field a string in your database?

I believe you will need to unwind the transactions array and use the new $densify stage (available in 6.0) to fill in the gaps for any 15-minute ranges that have no events.

If you believe you have at least one value in every 15-minute range, you might not need it.
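As a minimal sketch (assuming the date field is a real BSON date rather than a string, which $densify needs for a date range), the stage could look something like this:

{
  $densify: {
    field: "date",
    range: {
      step: 15,
      unit: "minute",
      bounds: "full"   // or an explicit [startDate, endDate] window
    }
  }
}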

Then you will have to group on each transactions.function value, based on a $dateTrunc with a unit of "minute" and a binSize of 15.

In the accumulator, use a $sum of the count field.
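Something like this for the $group stage (a sketch based on the field names in the example document):

{
  $group: {
    _id: {
      // truncate each date down to the start of its 15-minute bucket
      date: { $dateTrunc: { date: "$date", unit: "minute", binSize: 15 } },
      function: "$transactions.function"
    },
    // total mints for that bucket/function pair
    totalMints: { $sum: "$transactions.count" }
  }
}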

Ty
Pavel


Thanks for your answer Pavel.

However, I don’t think I can unwind the transactions array, can I? There can be multiple entries in this array, so how can they unwind to a single object? When I have tried it, the unwind operation only keeps the first transaction array entry and the rest are lost.

There are situations where there will not be any entries in the time period, so it sounds like $densify will be required. Is it possible for you to give a code example of your explanation?

Despite the example showing the same ‘hash’ for subdocuments, in reality every subdocument will have a different hash value.

I’ve also tried using a $group stage with $addToSet and $dateTrunc, but $addToSet is adding duplicate ‘functionSig’ values.

Just to clarify the mention of functionSig. In the example in the first question I used the field name ‘function’ to simplify. In the real dataset it is actually called functionSig, but they are one and the same.

Hi @Stephen_Eddy ,

I based it on your initial document example and used MintEvent as the collection name:

db.MintEvent.aggregate([
  {
    // Produce one document per element of the transactions array,
    // copying all of the parent fields into each one.
    $unwind: "$transactions"
  },
  {
    // Add placeholder documents (containing only "date") for any
    // 15-minute step in the bounded window that has no data.
    $densify: {
      field: "date",
      range: {
        step: 15,
        unit: "minute",
        bounds: [ISODate("2022-08-20T16:00:00.000Z"), ISODate("2022-08-20T17:00:00.000Z")]
      }
    }
  },
  {
    // Group by the 15-minute bucket and the function name, summing the
    // counts; the $cond guards against the densified placeholders,
    // which have no transactions field.
    $group: {
      _id: {
        date: {
          $dateTrunc: {
            date: "$date",
            unit: "minute",
            binSize: 15
          }
        },
        "function": "$transactions.function"
      },
      functionCount: {
        $sum: {
          $cond: [{ $eq: ["$transactions.function", null] }, 0, "$transactions.count"]
        }
      }
    }
  }
]).toArray()

I unwind and then densify every 15 minutes across the hour (16:00-17:00).

Then I sum based on the 15-minute truncation of the date and the function name from the unwound transactions.

Thanks
Pavel

Fantastic. Thank you for your help Pavel.

What was confusing me was the unwind. I didn’t fully understand that it duplicates the parent document for each element of the child array. It makes perfect sense now.
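For anyone else who was puzzled by the same thing, this is roughly what { $unwind: "$transactions" } produces from the example document at the top of the thread (one copy of the parent per array element, other fields omitted for brevity):

{ address: "0xnnnnn", date: "2022-08-20T16:51:48.554+00:00", mintTotal: 25, transactions: { hash: "0xnnn", count: 10, function: "mint" } }
{ address: "0xnnnnn", date: "2022-08-20T16:51:48.554+00:00", mintTotal: 25, transactions: { hash: "0xnnn", count: 5, function: "claim" } }
{ address: "0xnnnnn", date: "2022-08-20T16:51:48.554+00:00", mintTotal: 25, transactions: { hash: "0xnnn", count: 7, function: "claim" } }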