Group documents and sum values of deeply nested arrays

Hello there!

There is a daily cronjob which creates a daily report (model name: dailyReport) per customer and its campaigns based on request and booking numbers per country & region. These values are nested in the docs and I need to summarize these values. Up front: I have no influence to the structure and model of dailyReport so I can’t change how the daily cronjob creates those.

Example doc for dailyReport:

{
  date: '2021-06-06 22:00:00.000Z',
  customerId: '559687785574',
  customerName: 'Test A Customer',
  campaigns: [ # contains an indefinite number of campaigns
    {
      _id: '62c5810cbccef7125b81870b',
      name: 'newsletter-2021',
      requestCount: 4,  # sum of countries.requestCount
      bookingCount: 2,   # sum of countries.bookingCount
      revenue: 1500,     # sum of countries.revenue
      countries: [       # contains an indefinite number of countries
        {
         countryCode: 'DE',
         requestCount: 4,  # sum of regions.requestCount
         bookingCount: 2, # sum of regions.bookingCount
         revenue: 1500,   # sum of regions.revenue
         regions: [       # contains an indefinite number of regions
          {
           countryRegion: 'Bayern',
           requestCount: 1,
           bookingCount: 1,
           revenue: 1000
          },
          {
           countryRegion: 'Sachsen',
           requestCount: 3,
           bookingCount: 1,
           revenue: 500
          }
         ]
        }
      ]
    }
  ]
}

So these docs are created for every customer for every day. Now I need to aggregate/sum the data for a single customer within a date range.

How can i aggregate the dailyReports for a single customer grouped by campaigns and sum the nested values for requestCount, bookingCount & revenue for every region, country and campaign?

I tried the following to aggregate the data for may 2021:

[
  # limit docs by customer id and requested date range
  { $match: {
    customerId: '559687785574',
    date: {
      $gt: new Date('2021-05-31 22:00:00.000Z'),
      $lt: new Date('2021-07-01 22:00:00.000Z'),
    }
  }},
  # group by customer id and push all campaign data in one array
  { $group: {
    _id: '$customerId',
    customerName: { $first: '$customerName' },
    campaigns: { $push: '$campaigns' }
  }},
# 'transform' $campaigns to single level array
  { $project: {
      _id: 1,
      customerName: 1,
      campaigns: { $reduce: {
          input: '$campaigns',
          initialValue: [],
          in: { $concatArrays: ['$value', '$this'] },
      }}
  }}
]

But now I’m stuck because every doc in $campaigns holds its own data for countries and regions, but I need to somehow sum them based on same campaign, country, region data but keep the nesting. So the final array from aggregation should look like this:

[
  # ... shorten for readability
  {
    _id: '62c5810cbccef7125b81870b',
    name: 'newsletter-2021',
    requestCount: 340,
    bookingCount: 157,
    revenue: 20358,
    countries: [
      # ... shorten for readability
      {
        countryCode: 'DE',
        requestCount: 123,
        bookingCount: 67,
        revenue: 10564,
        regions: [
          {
            countryRegion: 'Bayern',
            requestCount: 74,
            bookingCount: 37,
            revenue: 7854,
          },
          {
            countryRegion: 'Sachsen',
            requestCount: 74,
            bookingCount: 37,
            revenue: 7854,
          },
          # ... shorten for readability
        ],
        # ... shorten for readability
      },
      # ... shorten for readability
    ],
  },
  # ... shorten for readability
]

I hope I explained myself clear enough and someone can help me.

cheers
Georg

1 Like

Hi @Georg_Bote,

Thanks for providing a really detailed explanation including what you’ve attempted so far :slight_smile:

campaigns: [ # contains an indefinite number of campaigns
countries: [ # contains an indefinite number of countries
regions: [ # contains an indefinite number of regions

Regarding the "campaigns" , "countries" and "regions" array fields, you have advised these can contain an indefinite number of values / documents within. If you’re planning to add to these arrays forever, please take note of the BSON Document Size limit as it may cause issues in future if you hit this limit.

But now I’m stuck because every doc in $campaigns holds its own data for countries and regions, but I need to somehow sum them based on same campaign, country, region data but keep the nesting. So the final array from aggregation should look like this:

Based off your current pipeline and expected output, I presume you wish to add some further stages to do the summing to achieve the top level "requestCount", "bookingCount" and "revenue" values (in your expected output document). Is this correct?

I tried to use the sample document and your current pipeline but received a "campaigns" value of null.

In my test environment, I have created 2 documents for the same "customerId". The below is the pipeline used (I have removed the date range filter just for demonstration purposes for the output).

2 sample documents inside the test environment (same customerId, values within the "campaign" array differing. E.g. altered country codes):

db.campaigns.find()
[
/// Document 1
  {
    _id: ObjectId("6327c2b3b56990d3451c53cb"),
    date: '2021-06-06 22:00:00.000Z',
    customerId: '559687785574',
    customerName: 'Test A Customer',
    campaigns: [
      {
        _id: '62c5810cbccef7125b81870b',
        name: 'newsletter-2021',
        requestCount: 4,
        bookingCount: 2,
        revenue: 1500,
        countries: [
          {
            countryCode: 'DE',
            requestCount: 4,
            bookingCount: 2,
            revenue: 1500,
            regions: [
              {
                countryRegion: 'Bayern',
                requestCount: 1,
                bookingCount: 1,
                revenue: 1000
              },
              {
                countryRegion: 'Sachsen',
                requestCount: 3,
                bookingCount: 1,
                revenue: 500
              }
            ]
          }
        ]
      }
    ]
  },
/// Document 2
  {
    _id: ObjectId("6327c2b3b56990d3451c53cc"),
    date: '2021-06-06 22:00:00.000Z',
    customerId: '559687785574',
    customerName: 'Test A Customer',
    campaigns: [
      {
        _id: '62c5810cbccef7125b81870b v2',
        name: 'newsletter-2021 v2',
        requestCount: 5,
        bookingCount: 2,
        revenue: 1500,
        countries: [
          {
            countryCode: 'AU',
            requestCount: 4,
            bookingCount: 2,
            revenue: 1500,
            regions: [
              {
                countryRegion: 'Bayern',
                requestCount: 2,
                bookingCount: 1,
                revenue: 1000
              },
              {
                countryRegion: 'Sachsen',
                requestCount: 3,
                bookingCount: 1,
                revenue: 500
              }
            ]
          }
        ]
      },
      {
        _id: '62c5810cbccef7125b81870b v3',
        name: 'newsletter-2021 v3',
        requestCount: 6,
        bookingCount: 2,
        revenue: 1500,
        countries: [
          {
            countryCode: 'US',
            requestCount: 4,
            bookingCount: 2,
            revenue: 1500,
            regions: [
              {
                countryRegion: 'Bayern',
                requestCount: 3,
                bookingCount: 1,
                revenue: 1000
              },
              {
                countryRegion: 'Sachsen',
                requestCount: 3,
                bookingCount: 1,
                revenue: 500
              }
            ]
          }
        ]
      }
    ]
  }
]

PIpeline stages used:

db.campaigns.aggregate(
{ $match: {customerId: '559687785574'}},
{ $unwind:"$campaigns"},
{ $group: { "_id" : '$customerId', "customerName":{"$first":'$customerName'},campaigns: { $push: '$campaigns' }}}
)

Output:

[
  {
    _id: '559687785574',
    customerName: 'Test A Customer',
    campaigns: [
      {
        _id: '62c5810cbccef7125b81870b',
        name: 'newsletter-2021',
        requestCount: 4,
        bookingCount: 2,
        revenue: 1500,
        countries: [
          {
            countryCode: 'DE',
            requestCount: 4,
            bookingCount: 2,
            revenue: 1500,
            regions: [
              {
                countryRegion: 'Bayern',
                requestCount: 1,
                bookingCount: 1,
                revenue: 1000
              },
              {
                countryRegion: 'Sachsen',
                requestCount: 3,
                bookingCount: 1,
                revenue: 500
              }
            ]
          }
        ]
      },
      {
        _id: '62c5810cbccef7125b81870b v2',
        name: 'newsletter-2021 v2',
        requestCount: 5,
        bookingCount: 2,
        revenue: 1500,
        countries: [
          {
            countryCode: 'AU',
            requestCount: 4,
            bookingCount: 2,
            revenue: 1500,
            regions: [
              {
                countryRegion: 'Bayern',
                requestCount: 2,
                bookingCount: 1,
                revenue: 1000
              },
              {
                countryRegion: 'Sachsen',
                requestCount: 3,
                bookingCount: 1,
                revenue: 500
              }
            ]
          }
        ]
      },
      {
        _id: '62c5810cbccef7125b81870b v3',
        name: 'newsletter-2021 v3',
        requestCount: 6,
        bookingCount: 2,
        revenue: 1500,
        countries: [
          {
            countryCode: 'US',
            requestCount: 4,
            bookingCount: 2,
            revenue: 1500,
            regions: [
              {
                countryRegion: 'Bayern',
                requestCount: 3,
                bookingCount: 1,
                revenue: 1000
              },
              {
                countryRegion: 'Sachsen',
                requestCount: 3,
                bookingCount: 1,
                revenue: 500
              }
            ]
          }
        ]
      }
    ]
  }
]

Note: The main difference between the pipeline you have provided and above is that the above uses $unwind to achieve a single level array.

I understand you have redacted some details for readability however if you require further assistance could send 3-4 sample documents for the same customerId and advise of the expected output?

Regards,
Jason

3 Likes