Best way to do $group aggregate summing array property

Hello everyone! I’m new to using MongoDB, and I’m trying to create an aggregation query where I need to sum some values inside an array.

Here’s what my document looks like:

documentExample = {
id: ObjectId,
name: string,
myArrayElement: {
isSuccess: boolean,
severity: string,
date: Date
}
}

Hello everyone! I’m new to using MongoDB, and I’m trying to create an aggregation query where I need to sum some values inside an array.

Here’s what my document looks like:

documentExample = {
  id: ObjectId,
  name: string,
  myArrayElement: {
    isSuccess: boolean,
    severity: string,
    date: Date
  }[]
}

What I want to achieve is a query that filters all documents containing an element inside ‘myArrayElement’ that matches a certain date or any other conditions. I also want to get the following counts:

  1. The total number of items where the property ‘isSuccess’ is true.
  2. The total number of items where ‘isSuccess’ is false.
  3. The count of items where ‘severity’ is equal to ‘success’.
  4. The count of items where ‘severity’ is equal to ‘error’.

What is the best way to construct this query, considering that the database might contain a large amount of data? Does anyone have any insights? Thanks.

It would be nice if you could share some sample documents and the expected results. Having to create our own documents in order to experiment is way more tedious that simply using the one you share.

Hello. Sure, this is what my document looks like:

{
  "_id": {
    "$oid": "6534743d813c22c7e43282dd"
  },
  "name": "Jhon Doe",
  "inclusionDate": {
    "$date": "2023-06-22T03:57:48.576Z"
  },
  "isActive": true,
  "integrationHistory": [
    {
      "isSuccess": false,
      "integrationDate": {
        "$date": "2020-11-01T00:00:00.000Z"
      },
      "statusSentToClient": true,
      "severity": "warning"
    },
    {
      "isSuccess": false,
      "integrationDate": {
        "$date": "2020-12-01T00:00:00.000Z"
      },
      "statusSentToClient": true,
      "severity": "error"
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-01-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-02-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-03-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-04-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-05-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-06-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    },
    {
      "isSuccess": true,
      "integrationDate": {
        "$date": "2021-07-01T00:00:00.000Z"
      },
      "statusSentToClient": true
    }
  ]
}

My collection contains a lot of documents, and the process runs once every month, saving the status of the integration in integrationHistory . What I need to do is filter by date and obtain some indicators of the integration process by month.

What I want is a query that filters all documents matching the "integrationHistory.integrationDate" with a specific date. As a result, I want to achieve something like this:

{
  "countSuccessItems":  number, // Number of items where "integrationHistory.isSuccess" is true
  "countItemsWithError": number, // Number of items where "integrationHistory.isSuccess" is false
  "countStatusSentToClient": number, // Number of items where "integrationHistory.statusSentToClient" is true
  "countSeverityWarning": number,  // Number of items where "integrationHistory.severity" is 'warning'
  "countSeverityError": number  // Number of items where "integrationHistory.severity" is 'error'
}

There is many ways to achieve that. The following seems to the simplest and easiest to develop, test and understand.

1 - It first uses $project to weed out the fields that are not needed for the use-case.
2 - Then we simply $unwind to make things easier to work with.
3 - Next is a $facet with 3 paths.
3.a - One $facet path for the isSuccess field
3.b - One $facet path for the severity field
3.c - One last $facet path for the statusSentToClient
4 - 2 $project stages to produce the results in the desired format. It could probably be done in one.

The nice thing about $facet is you can develop and test each path individually before putting them together in the $facet stage.

pipeline = [ ] ;

project_history = { "$project" : {
	"integrationHistory" : 1 ,
	"_id" : 0
} } ;

pipeline.push( project_history ) ;

unwind_history = { "$unwind" : "$integrationHistory" } ;

pipeline.push( unwind_history ) ;

match_date = { "$match" : {
	"integrationDate" : date_to_match 
} } ;

pipeline.push( match_date ) ;

success_facet = { "$group" : {
	"_id" : "$integrationHistory.isSuccess" ,
	"count" : { "$sum" : 1 }
} } ;

severity_facet = { "$group" : {
	"_id" : "$integrationHistory.severity" ,
	"count" : { "$sum" : 1 }
} } ;

sent_facet = { "$group" : {
	"_id" : "$integrationHistory.statusSentToClient" ,
	"count" : { "$sum" : 1 }
} } ;

facet = { "$facet" : {
	"success" : [ success_facet ],
	"severity" : [ severity_facet ],
	"sent" : [ sent_facet ]
} } ;

pipeline.push( facet ) ;

/* For filter_results and project_results, I only did it for isSuccess field
   as the other fields follow the same pattern. */
filter_results = { "$project" : {
	"countSuccessItems" : { "$arrayElemAt" : [ { "$filter" : {
		"input" : "$success" ,
		"cond" : { "$eq" : [ true , "$$this._id" ] }
	} } , 0 ] } ,
	"countItemsWithError" : { "$arrayElemAt" : [ { "$filter" : {
		"input" : "$success" ,
		"cond" : { "$eq" : [ false , "$$this._id" ] }
	} } , 0 ] } ,
} }

pipeline.push( filter_results ) ;

project_results = { "$project" : {
	"countSuccessItems" : "$countSuccessItems.count"  ,
	"countItemsWithError" : "$countItemsWithError.count" 
} }

pipeline.push( project_results ) ;
1 Like

Thank you, it worked.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.