Complex Embedded Arrays of Objects $min/$max

Hello,

I have a complex data structure and I'm having an extremely difficult time writing an aggregation query to get the result I'm hoping for.

Here is a sample of my data structure; I had to obscure it heavily for confidentiality reasons. I will have hundreds of thousands of documents in this collection.

What I am looking to do is retrieve the lowest value_b and the highest value_b across all elementData entries in every element.

Working with MongoDB Community 6.0

{
  "data": {
    "array_of_data": [
      {
        "element": {
          "elementData": [
            { "value_a": 1, "value_b": 200 },
            { "value_a": 2, "value_b": 2500 }
          ]
        }
      },
      {
        "element": {
          "elementData": [
            { "value_a": 1, "value_b": 150 },
            { "value_a": 2, "value_b": 5600 }
          ]
        }
      }
    ]
  }
}

I am looking for a result something like the following:

{
    "min": 150,
    "max": 5600
}

I have tried quite a few options so far; here is my latest attempt, which is the closest I have gotten:

[
  {
    $project: {
      'data.array_of_data.element.elementData.value_b': 1
    }
  },
  {
    $group: {
      _id: '$data.array_of_data.element.elementData.value_b'
    }
  },
  {
    $addFields: {
      minA: { $min: '$_id' },
      maxA: { $max: '$_id' }
    }
  },
  {
    $addFields: {
      low: { $min: '$minA' },
      high: { $max: '$maxA' }
    }
  },
  {
    $project: {
      low: 1
    }
  }
]
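
From what I can tell, the dotted path '$data.array_of_data.element.elementData.value_b' resolves to a nested array rather than a flat list of numbers, so the $group _id above ends up shaped like this (values taken from the sample document):

{ "_id": [ [ 200, 2500 ], [ 150, 5600 ] ] }

$min and $max then compare the inner arrays as whole values under BSON ordering instead of comparing the individual numbers, which I suspect is why I get close but not quite the right answer.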

I appreciate any suggestions that someone may have.

Hi @Ryan_Youngsma - Welcome to the community.

I haven’t tested this out on a larger set of data, but I think the following may get you to your desired output:

var pipeline =
[
  // Flatten the outer array so each array_of_data entry becomes its own document
  { '$unwind': '$data.array_of_data' },
  // Flatten the inner elementData array the same way
  { '$unwind': '$data.array_of_data.element.elementData' },
  {
    // Regroup by the original document _id, taking the min/max of value_b
    '$group': {
      _id: '$_id',
      max: { '$max': '$data.array_of_data.element.elementData.value_b' },
      min: { '$min': '$data.array_of_data.element.elementData.value_b' }
    }
  }
]
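
For example, you could run it like this (the collection name below is just a placeholder for yours):

// 'myCollection' is a placeholder; substitute your actual collection name
db.myCollection.aggregate(pipeline)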

However, there are 2 $unwinds and a $group here, so you may encounter performance issues. Could I understand the use case here? Is this a workload that you would be running frequently?
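
If the $unwinds do become a bottleneck, here is an untested sketch of an alternative (assuming the same document shape as your sample) that flattens the nested arrays with $reduce and $concatArrays instead, so no documents need to be multiplied out:

var pipeline2 =
[
  {
    '$project': {
      // Collect every value_b from every elementData array into one flat array
      allValues: {
        '$reduce': {
          input: '$data.array_of_data',
          initialValue: [],
          in: { '$concatArrays': [ '$$value', '$$this.element.elementData.value_b' ] }
        }
      }
    }
  },
  {
    // $min/$max applied to a single array argument return the smallest/largest element
    '$project': {
      min: { '$min': '$allValues' },
      max: { '$max': '$allValues' }
    }
  }
]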

You might also be able to include a $match stage at the start, ideally one that can use an index, to try to reduce the number of documents being passed through the pipeline.
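
For instance, something like this at the front of the pipeline (the field name here is hypothetical, just to illustrate the shape):

// 'data.customer_id' is a made-up field name; use whatever indexed field your users filter on
{ '$match': { 'data.customer_id': 'some-id' } }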

Regards,
Jason

Hi @Jason_Tran

Thank you for the reply. From what I have been able to test so far, I think this will work for me.

We will have a match stage at the beginning, which should bring the result set down from hundreds of thousands of documents to thousands to aggregate over. How many documents are returned will also depend on how many different match criteria a user enters.

This is a query that would not be run very often. If I were to guess, maybe 5 - 10 times an hour.

I will try to run this on the original data source to evaluate performance as my testing environment has a pretty limited dataset.

I appreciate your assistance.