Group multi-dimensional array after unwinding elements

My Json:

[
   {
      "_id":{
         "$oid":"60f5c60fbc43cb00965ac641"
      },
      "groupBy":{
         "$oid":"60f5c60fbc43cb00965ac641"
      },
      "employer":{
         "$oid":"60d0e4001a6ccd764cb26df4"
      },
      "parameters":{
         "begin":"2020-01-01",
         "end":"2020-12-31"
      },
      "recordCount":74,
      "errorFields":[
         {
            "field":"employeeStatus",
            "recordCount":62,
            "errorDetails":[
               {
                  "recordIds":[
                     "10000986",
                     "10000990",
                     "10001020"
                  ],
                  "detail":"Active employees should not have term date. Termed employees should have term date.",
                  "recordCount":3
               },
               {
                  "recordIds":[
                     "10001032"
                  ],
                  "detail":"Stale pay period data (no new check for over 30 days from queried end date)",
                  "recordCount":1
               }
            ]
         },
         {
            "field":"ytdGrossWages.ytdTotalGrossWages",
            "recordCount":8,
            "errorDetails":[
               {
                  "recordIds":[
                     "10001211",
                     "10001269",
                     "10001328",
                     "10001395"
                  ],
                  "detail":"YTD total does not equal sum of YTD wage buckets",
                  "recordCount":4
               }
            ]
         }
      ],
      "timestamp":{
         "$date":"2021-07-19T18:35:59.031Z"
      },
      "__v":0
   },
   {
      "_id":{
         "$oid":"60f5c615bc43cb00965ac647"
      },
      "groupBy":{
         "$oid":"60f5c60fbc43cb00965ac641"
      },
      "employer":{
         "$oid":"60d0e4001a6ccd764cb26df4"
      },
      "parameters":{
         "begin":"2020-01-01",
         "end":"2020-12-31"
      },
      "recordCount":11,
      "errorFields":[
         {
            "field":"employeeStatus",
            "recordCount":11,
            "errorDetails":[
               {
                  "recordIds":[
                     "10003644",
                     "10003680"
                  ],
                  "detail":"Active employees should not have term date. Termed employees should have term date.",
                  "recordCount":2
               },
               {
                  "recordIds":[
                     "10003667",
                     "10003694",
                     "10003807",
                     "10003789"
                  ],
                  "detail":"Stale pay period data (no new check for over 30 days from queried end date)",
                  "recordCount":4
               }
            ]
         },
         {
            "field":"ssn",
            "recordCount":2,
            "errorDetails":[
               {
                  "recordIds":[
                     "10003667"
                  ],
                  "detail":"The ssn field is required.",
                  "recordCount":1
               },
               {
                  "recordIds":[
                     "10003694"
                  ],
                  "detail":"The ssn must be 9 digits.",
                  "recordCount":1
               }
            ]
         },
         {
            "field":"employeeHomeAddressCountry",
            "recordCount":1,
            "errorDetails":[
               {
                  "recordIds":[
                     "10003694"
                  ],
                  "detail":"The employeeHomeAddressCountry field is required.",
                  "recordCount":1
               }
            ]
         }
      ],
      "timestamp":{
         "$date":"2021-07-19T18:36:05.135Z"
      },
      "__v":0
   }
]

I want output like below

{
    "_id" : ObjectId("60f5c60fbc43cb00965ac641"),
    "errorFields" : [
            {
                    "field" : "employeeStatus",
                    "recordCount" : 73,
                    "errorDetails" : [
                            {
                                    "recordIds" : [
                                            "10001032",
                                            "10003667",
                                            "10003694",
                                            "10003807",
                                            "10003789"
                                    ],
                                    "detail" : "Stale pay period data (no new check for over 30 days from queried end date)",
                                    "recordCount" : 5
                            },
                            {
                                    "recordIds" : [
                                            "10000986",
                                            "10000990",
                                            "10001020",
                                            "10001031",
                                            "10001035"
                                    ],
                                    "detail" : "Active employees should not have term date. Termed employees should have term date.",
                                    "recordCount" : 5
                            }
                    ]
            },
            {
                    "field" : "ytdGrossWages.ytdTotalGrossWages",
                    "recordCount" : 8,
                    "errorDetails" : [
                            {
                                    "recordIds" : [
                                            "10001211",
                                            "10001269",
                                            "10001328",
                                            "10001395"
                                    ],
                                    "detail" : "YTD total does not equal sum of YTD wage buckets",
                                    "recordCount" : 8
                            }
                    ]
            },
            {
                    "field" : "ssn",
                    "recordCount" : 2,
                    "errorDetails" : [
                            {
                                    "recordIds" : [
                                            "10003667"
                                    ],
                                    "detail" : "The ssn field is required.",
                                    "recordCount" : 1
                            },
                            {
                                    "recordIds" : [
                                            "10003694"
                                    ],
                                    "detail" : "The ssn must be 9 digits.",
                                    "recordCount" : 1
                            }
                    ]
            },
            {
            "field":"employeeHomeAddressCountry",
            "recordCount":1,
            "errorDetails":[
               {
                  "recordIds":[
                     "10003694"
                  ],
                  "detail":"The employeeHomeAddressCountry field is required.",
                  "recordCount":1
               }
            ]
         }
    ]

Here is the my code

db.collection.aggregate([
  { $match: { groupBy: ObjectId("60f5c60fbc43cb00965ac641") } },
  { "$project": { 
    "_id": "$groupBy",  
    "errorFields": { "$map": {
      "input": "$errorFields",
      "as": "ef",
      "in": {  
        "field": "$ef.field",
        "recordCount": {
            $sum:"$ef.recordCount"
        },
        "errorDetails": { "$map": {
          "input": "$ef.errorDetails",
          "as": "ed",
          "in": {
            "detail": "$ed.detail",
            "recordIds": { "$map": {
              "input": "$ed.recordIds",
              "as": "ri",
              "in": {
                  $concat: [ "$ri"]
              }
            }},
            "recordCount": {
                $size:"$ed.recordIds"
            }
          }
        }}
      }    
    }}
  }}
]).pretty()

I am unable to do the group by Using groupBy key, if the filed value matches push the data inside the filed if details matches another groupBy document push the same into one and sum the recordCount.

You say you want to group things by groupBy field but you aren’t actually doing any $group stages. You also want to group things that are inside an array, that usually means you need to $unwind the array before $group. You’ll need to group first by groupBy and errorField and then just by groupBy.

Something like this seems to be close to what you want:

db.collection.aggregate([
{ $match: { groupBy: ObjectId("60f5c60fbc43cb00965ac641") } },
{$unwind:"$errorFields"},
{$group:{   
    _id:{g:"$groupBy", errorField:"$errorFields.field"}, 
    count:{$sum:"$errorFields.recordCount"}, 
    errorDetails:{$push:"$errorFields.errorDetails"}
}}, 
{$group:{
    _id:"$_id.g", 
    errorFields:{$push:{
        field:"$_id.errorField", 
        recordCount:"$count", 
        errorDetails:{$reduce:{
            input:"$errorDetails",
            initialValue:[],
            in:{$concatArrays:["$$value",  "$$this"]}}}
        }}
    }}
])

That last $reduce is because pushing an array of errorDetails into an array in previous $group stage created an array of arrays, where you just want a simple array. Unfortunately we don’t have a “push all array values” accumulator so we flatten the array of arrays in the last stage.

Asya

1 Like

Thank you for the response @Asya_Kamsky, I think one more $group is required, Inside the errorDetails, if errorFields.errorDetails.field matches then needs to $group and push the recordIds for matching fields.

I think one more $group is required

I don’t think so. I ran this pipeline on your sample data and I got exactly your sample desired output.

Because I unwind errorFields and group by errorFields.field only matching field names will be together.

if errorFields.errorDetails.field

I didn’t see field under errorDetails so perhaps your sample data doesn’t quite match your real data?

Asya

1 Like

Sorry @Asya_Kamsky, errorFields.errorDetails.detail

I’m not sure I understand how things should be grouped further, but I think you should be able to play with the pipeline I provided on your actual data and make appropriate modifications to get to the final result you need. :slight_smile:

Asya

Thank you @Asya_Kamsky