Group aggregate two times on different field

Hazim_Ali · November 17, 2022, 6:40pm

Hye,
I want to ask if is this possible to run group two times and make the structure like this.

here is my sample of my documents

documents
[
  {
    users: ['A','B'],
    speaker: 'C'
  },
  {
    users: ['A','B'],
    speaker: 'C'
  },
  {
    users: ['D'],
    speaker: 'C'
  },
  {
    users: ['E','F'],
    speaker: 'G'
  },
  {
    users: ['H'],
    speaker: 'C'
  },
  {
    users: ['A','B'],
    speaker: 'C'
  },
  {
    users: ['E','F'],
    speaker: 'G'
  }
]

and the end goals to get structure like this

[
  {
    _id: 'C',
    class: [
      {
        _id: ['A','B'],
        sessions: [
          {
            {
              users: ['A','B'],
              speaker: 'C'
            },
            {
              users: ['A','B'],
              speaker: 'C'
            },
            {
              users: ['A','B'],
              speaker: 'C'
            }
          }
        ]
      },
      {
        _id: ['D'],
        sessions: [
          {
            users: ['D'],
            speaker: 'C'
          }
        ]
      },
      {
        _id: ['C'],
        sessions: [
          {
            users: ['H'],
            speaker: 'C'
          }
        ]
      }
    ]
  },
  {
    _id: 'G',
    class: [
      {
        _id: ['E','F'],
        sessions: [
          {
            users: ['E','F'],
            speaker: 'G'
          },
          {
            users: ['E','F'],
            speaker: 'G'
          }
        ]
      }
    ]
  }
]

for first layers group by speaker I am able to do it. but the second layer to push and group by users I am stuck there. hope someone able to help. thanks!

Jason_Tran · November 25, 2022, 12:51am

Hi @Hazim_Ali - Welcome to the community.

I’ve written a test aggregation pipeline that i’ve only tested on the sample documents provided. There is an assumption that the "speaker" field’s values are single characters only ranging from A - Z so this may or may not work for you depending on your data / use case.

Pipeline:

[
  {
    '$group': {
      _id: { speaker: '$speaker', _id: '$users' },
      sessions: { '$push': { users: '$users', speaker: '$speaker' } }
    }
  },
  {
    '$bucket': {
      groupBy: '$_id.speaker',
      boundaries: [
        'A', 'B', 'C',
        'D', 'E', 'F',
        'G', 'H'
      ], // <-- Depending on how 'speaker' is initially valued. This assumes it's only a single character and not an array of characters.
      default: 'other',
      output: { classInitial: { '$push': '$sessions' } }
    }
  },
  {
    '$addFields': {
      class: {
        '$map': {
          input: '$classInitial',
          in: {
            _id: { '$arrayElemAt': [ '$$this', 0 ] },
            sessions: '$$this'
          }
        }
      }
    }
  },
  { '$project': { _id: 1, class: 1 } }
]

Note: you can increase the boundaries array depending on your use case. I stopped at 'H' just as an example

I’ve used a $project stage at the end to get as close as possible to your desired output but I would recommend running each stage 1 by 1 to see what the output field is at each stage.

Output:

[
  {
    _id: 'C',
    class: [
      {
        _id: { users: [ 'A', 'B' ], speaker: 'C' },
        sessions: [
          { users: [ 'A', 'B' ], speaker: 'C' },
          { users: [ 'A', 'B' ], speaker: 'C' },
          { users: [ 'A', 'B' ], speaker: 'C' }
        ]
      },
      {
        _id: { users: [ 'D' ], speaker: 'C' },
        sessions: [ { users: [ 'D' ], speaker: 'C' } ]
      },
      {
        _id: { users: [ 'H' ], speaker: 'C' },
        sessions: [ { users: [ 'H' ], speaker: 'C' } ]
      }
    ]
  },
  {
    _id: 'G',
    class: [
      {
        _id: { users: [ 'E', 'F' ], speaker: 'G' },
        sessions: [
          { users: [ 'E', 'F' ], speaker: 'G' },
          { users: [ 'E', 'F' ], speaker: 'G' }
        ]
      }
    ]
  }
]

The main difference that I have noticed between the above output and your desired output is that the "class._id" value contains the "speaker" value as well all contained inside an object.

Although this may work or there may be improved suggestions in terms of an aggregation to get your desired output, I am curious to understand the use case for this kind of output or if you have considered perhaps doing some post-processing of the data after it is retrieved from the database.

Please lastly take into consideration that I have only briefly tested this against the sample data provided and if you believe it could possibly work for you then please test thoroughly to verify it suits all your use case(s) and requirement(s).

Hope this helps.

Regards,
Jason

system · January 24, 2023, 12:51am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.