How to calculate the average of fields whose keys start with a string

emman · February 15, 2022, 11:01am

I’m very new to MongoDB and am trying to figure out how to calculate the average of certain fields through the the node-js driver.

Here is a minimal toy example to demonstrate my data. This is one record out of many:

[
    {
        "_id": 123,
        "createdAt": "2021-12-25",
        "data": {
            "dem_num_age": {"value": 25},
            "dem_num_height": {"value": 100},
            "dem_num_weight": {"value": 160},
            "dem_num_n_of_kids": {"value": 0},
            "dem_cat_fam_status": {"value": "married"},
            "preferred_pet": {"value": "dog"},
            "preferred_color": {"value": "purple"},
            "preferred_movie": {"value": "titanic"}
        }
    },
    {...} // another record
]

And I want to query the DB to get the average (across records) of each field inside data{} that starts with dem_num. Thus, the desired output of the query should be a javascript object with those averages:

{
    "dem_num_age": 35.2,  
    "dem_num_height": 123.7,
    "dem_num_weight": 188.5,
    "dem_num_n_of_kids": 1.5
}

Any idea how I could do that in node-js?

Many thanks!

Pavel_Duchovny · February 15, 2022, 6:17pm

Hi @emman ,

Not sure why you used an embedded document just to point “value” to an actual value instead of just having the values:

"data": {
            "dem_num_age":  25,
            "dem_num_height": 100,
            ...
            "preferred_movie": "titanic"
        }

Your life with aggregation will be so much easier.

Ok now the way to do the avg calculation might be a bit comlex since as far as I understand you don’t know all the field names up front. If you do now the field names then use a $project + ,$filter to remove any fields without the needed string . Then do an $unwind + $group + $avg on each of the fields on $data path.

If you don’t know the names of the fields you will need to play with $objectToArray operators to locate the field names as key and calculated the correct “v” avg.

If this is not clear I can try to provide an example later on

Thanks
Pavel

emman · February 16, 2022, 8:01am

Hi @Pavel_Duchovny , thank you for responding.

I absolutely agree that the structure is not optimal. Unfortunately, I wasn’t the one who set up this DB, and therefore it is a given situation.

Yes, that is correct. I only know the prefix of interest of those field names.

I will be so grateful if you could provide an example of such syntax. I’m very clueless and this seems like a nasty challenge for a newcomer.

Thanks a lot!

Pavel_Duchovny · February 16, 2022, 10:46am

Hi @emman ,

So asI mentioned the aggregation is not super easy and you should try and see if you can filter or stop at an early stage to optimise it for you.

Considering the given structure, here is an example of my aggregation:

db.collection.aggregate([{$project: {
 // Make the data as a k, v of field names and values and regex on the needed starting pattern
 data: {
  $filter: {
   input: {
    $objectToArray: '$data'
   },
   as: 'dataItem',
   cond: {
    $regexMatch: {
     input: '$$dataItem.k',
     regex: RegExp('^dem_num')
    }
   }
  }
 }
}},
// Unwind  so every item is an object
{$unwind: {
 path: '$data'
}}, 
// group based on the keys and avg
{$group: {
 _id: '$data.k',
 avg: {
  $avg: '$data.v.value'
 }
}},
// Create a new key and value set to morph for object
 {$group: {
 _id: null,
 root: {
  $push: {
   k: '$_id',
   v: '$avg'
  }
 }
}},
// Create the new object
 {$replaceRoot: {
 newRoot: {
  $arrayToObject: '$root'
 }
}}])

The Output is for my test objects:

{ dem_num_weight: 230,
  dem_num_age: 25,
  dem_num_n_of_kids: 0,
  dem_num_height: 300 }

Thanks