Create a field in an array of objects, in an aggregation

Hello,

I have a collection with the following schema:

{
  "myArray": [
    {
      "id": "1",
      "data_a": 50,
      "data_b": 50
    },
    {
      "id": "2",
      "data_a": 100,
      "data_b": 200
    }
  ],
}

Using an aggregation, my goal is to obtain the following result:

{
  "myArray": [
    {
      "id": "1",
      "data_a": 50,
	  "data_b": 50,
	  "max": 50
    },
    {
      "id": "2",
      "data_a": 100,
	  "data_b": 200,
	  "max": 200
    }
  ],
}

When trying the following aggregation:

[
  {
    $addFields: {
      "myArray.max": {
        $max: [
          "$myArray.data_a",
          "$myArray.data_b",
        ],
      },
    },
  },
]

I get the following result:

{
  "interestedBusinessUnits": [
    {
      "id": "1",
      "data_a": 50,
      "data_b": 50,
      "max": [50, 200]
    },
    {
      "id": "2",
      "data_a": 100,
      "data_b": 200,
      "max": [50, 200]
    }
  ],
}

I tried different techniques but can’t get to the desired result. Please assist!

$max is an accumulator operator, so must be used within a $group stage (there is a non-aggregation $max for running updates as well)

So an easy way is to unwind the arrays, then work out the max for each item and then re-combine.

As an example of an approach:

db.getCollection("Test").aggregate([
{
    $unwind:'$myArray'
},
{
    $addFields:{
        'myArray.maxval':{
            $cond:{
                if:{$lt:['$myArray.data_b', '$myArray.data_a']},
                then:'$myArray.data_a',
                else:'$myArray.data_b'
            }
        }
    }
},
{
    $group:{
        _id:'$_id',
        myArray:{$push:'$myArray'}
    }
}

])

I’m sure there are other ways of doing this, but this is fairly simple, watch out for performance with a lot of data and $unwinding.

Hello @John_Sewell,

Thank you very much for your prompt answer. The actual collection is very large, so I wish to avoid that method: using $unwind and then $group is very costly performance-wise.

By the way, using $max or $cond actually gave me the same output within $addFields stage, even though $max is an accumulator. Interesting!

My approach would be to use $map with $mergeObjects.

Something along the untested code:

map = { "$map" : {
    "input" : "$interestedBusinessUnits" ,
    "as" : "unit" ,
    "in" : { "$mergeObjects" : [
        "$$unit" ,
        { "max" : { "$max" : [
            "$$unit.data_a" ,
            "$$unit.data_b"
        ] } }
    ] }
} }
addFields = { "$addFields" : {
    "_result" : map
} }
pipeline = [ addFields ]
2 Likes

Hello @steevej,

Thank you very much! Once more, your brains save me! In case it would help someone else, I rewrite your solution as a pipeline, as this was my initial request:

  {
    $addFields: {
      myArray: {
        $map: {
          input: "$myArray",
          in: {
            $mergeObjects: [
              "$$this",
              {
                max: {
                  $max: [
                    "$$this.data_a",
                    "$$this.data_b",
                  ],
                },
              },
            ],
          },
        },
      },
    },
  },
1 Like

Please note that my solution is also pipeline. You simply run it with:

db.your_collection.aggregate( pipeline )

It is written using variables because it is easier to develop, to read, to edit and to understand. Just like when you write a program using multiple small functions rather than having a monolithic main function that is indented past the middle of screen and where you have to scroll to see the end of a block.

I even find the map object/variable quite big. I should have wrote instead:

mergeObject = { "$mergeObjects" : [
    "$$unit" 
     { "max" : { "$max" : [
        "$$unit.data_a"
        "$$unit.data_b"
    ] } }
] }
map = { "$map" : {
    "input" : "$interestedBusinessUnits" ,
    "as" : "unit" ,
    "in" : mergeObjects
} }

This way syntax errors are easier to correct because you edit a much smaller chunk of code.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.