How to match by a condition and group by common field?

I wonder if it it possible to group documents based on a common field and a condition.
In the following collection:

[
  {
    "pid": "102",
    "e_date": ISODate("2017-04-01T00:00:00.000+00:00"),
    "h_val": 4,
    
  },
  {
    "pid": "102",
    "e_date": ISODate("2005-04-01T00:00:00.000+00:00"),
    "h_val": 5,
    
  },
  {
    "pid": "102",
    "e_date_1": ISODate("2017-05-01T00:00:00.000+00:00"),
    "s_val": 87,
    "d_val": 58
  },
  {
    "pid": "102",
    "e_date_1": ISODate("2016-09-01T00:00:00.000+00:00"),
    "s_val": 81,
    "d_val": 62
  },
  {
    "pid": "102",
    "e_date_1": ISODate("2010-09-01T00:00:00.000+00:00"),
    "s_val": 81,
    "d_val": 62
  },
  {
    "pid": "101",
    "e_date": ISODate("2016-04-01T00:00:00.000+00:00"),
    "h_val": 5,
    
  },
  {
    "pid": "101",
    "e_date_1": ISODate("2011-05-01T00:00:00.000+00:00"),
    "s_val": 87,
    "d_val": 58
  },  
]

I am trying to group documents based on pid and check if e_date_1 is within 1 year of e_date. I have tried match and group aggregates:

db.collection.aggregate([
  {
    "$match": {
      $expr: {
        $lte: [
          {
            $abs: {
              $dateDiff: {
                startDate: "$e_date",
                endDate: "$e_date_1",
                unit: "year"
              }
            }
          },
          1
        ]
      }
    }
  },
  {
    "$group": {
      "_id": "$pid",
      "actions": {
        "$push": "$$ROOT"
      },
      "total": {
        "$sum": 1
      }
    }
  }
])

But I get all documents grouped by pid not the ones that are within 1 year.

Hello @WilJoe!

Based on your example data, none of the documents have both an e_date and a e_date_1 field.

So the result of the $dateDiff will be null … which happens to be less than or equal to 1 … so it is matching all your documents.

Is that example data complete?

For debugging (or even perhaps as part of your final pipeline) you could add these as fields, to make it easier to see what they are … using $addFields something like:

{ 
  "$addFields": {
    "yearDiff": {
       "$dateDiff": {
         "startDate": "$e_date",
         "endDate": "$e_date_1",
         "unit": "year"
        }
    },
    "isWithinYear": { "$lte": ["$yearDiff", 1]}
  }
}
1 Like

Hi Justin,
Thanks for checking this out. Yes, this is a sample of my collection. The point is comparing e_date and e_date_1 which are coming from different sources and group them by pid. I am trying to find the temporal relationship between the documents, that is why I have chosen different names for the date fields.

I’m not really understanding your exact problem @WilJoe? Your query looks like it is probably okay.

If the documents had both fields … so $dateDiff had something to compare, you’d get different results.

Given your example data you’d get back 2 documents:

  • _id: "102" with a total of 5
  • _id: "101" with a total of 2

Which are your 7 documents grouped by the pid since the $match isn’t doing anything.

If you make sure at least one of the documents has both e_date and e_date_1 and those dates are more than a year apart you’d get a different total and different documents inside actions.

I am trying to first find the ones with same pid then create (possibly multiple) groups for the ones that are within 1 year of each other:
The output should be like:

[
  {
    "pid": "102",
    "e_date": ISODate("2017-04-01T00:00:00.000+00:00"),
    "h_val": 4,    
  },
{
    "pid": "102",
    "e_date_1": ISODate("2017-05-01T00:00:00.000+00:00"),
    "s_val": 87,
    "d_val": 58
  },
{
    "pid": "102",
    "e_date_1": ISODate("2016-09-01T00:00:00.000+00:00"),
    "s_val": 81,
    "d_val": 62
  }
]