How to write aggregate query for this structure in MongoDB?

This structure is complex for me because of “children”. I can not write aggregate query. Here is my collection:

{
        "_id" : ObjectId("6213ba90a013b7c5f1232e1f"),
        "birthDate" : ISODate("1965-01-04T23:30:15Z"),
        "surname" : "White",
        "name" : "Adam",
        "registerDate" : ISODate("2018-04-13T08:50:30Z"),
        "children" : [
                {
                        "surname" : "White",
                        "name" : "Belly",
                        "birthDate" : ISODate("1985-01-21T09:30:15Z")
                },
                {
                        "surname" : "White",
                        "name" : "Michael",
                        "birthDate" : ISODate("1987-02-17T11:30:15Z")
                }
        ],
        "city" : "Chicago"
}
{
        "_id" : ObjectId("6213ba9fa013b7c5f1232e21"),
        "birthDate" : ISODate("1969-03-01T13:30:15Z"),
        "surname" : "Dash",
        "name" : "Levon",
        "registerDate" : ISODate("2018-04-06T15:30:15Z"),
        "children" : [
                {
                        "birthDate" : ISODate("1989-03-21T15:30:15Z"),
                        "surname" : "Dash",
                        "name" : "Alexander"
                },
                {
                        "birthDate" : ISODate("1991-04-21T17:30:15Z"),
                        "surname" : "Dash",
                        "name" : "Katy",
                        "registerDate" : ISODate("2019-08-18T13:30:15Z")
                }
        ],
        "city" : "Chicago"
}
{
        "_id" : ObjectId("6213baa5a013b7c5f1232e23"),
        "birthDate" : ISODate("1973-05-01T13:30:15Z"),
        "surname" : "Johnson",
        "name" : "Emma",
        "registerDate" : ISODate("2018-06-11T17:30:15Z"),
        "children" : [
                {
                        "birthDate" : ISODate("1993-05-21T16:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Liam"
                },
                {
                        "birthDate" : ISODate("1994-01-21T15:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Olivia",
                        "registerDate" : ISODate("2019-09-14T12:30:15Z")
                }
        ],
        "city" : "Houston"
}

I want to write aggragate query that calculates average registration age of members that registered to our system based on their cities. How can I write that query?

Hi @Pinkman - Welcome to the community.

Thanks for providing a few sample documents.

I want to write aggragate query that calculates average registration age of members that registered to our system based on their cities.

Just to clarify, do you want just the average age for the members not including their children’s ages? I only ask this because you mentioned the children field as well which I do not believe is part of the average age of the registered members.

In saying so, see the below example that should hopefully assist with achieving the average age of registered members:

db.collection.aggregate([
  { 
  	$addFields:
    	{age: {$dateDiff: {startDate: "$birthDate", endDate: "$$NOW", unit: "year"}}}
  },
  {
  	$group:
  		{
  		_id:"$city", 
  		averageAge: {$avg: "$age"}
  		}
  }
])

Please note that $dateDiff is new to MongoDB Version 5.0. If you’re not using MongoDB 5.0 series, I encourage you to upgrade to the 5.0 series for new features like $dateDiff, improvements, and bugfixes.

The output of the above in mongosh is:

[
  { _id: 'Houston', averageAge: 49 },
  { _id: 'Chicago', averageAge: 55 }
]

Hope the above helps.

If you still require further assistance, please:

  • Advise the MongoDB version in use
  • Confirm if you just require the average age of the registered members (not including children ages)
  • Provide an example of expected output
  • Provide any further context around the use case

Regards,
Jason

2 Likes

Thank you @Jason_Tran - First of all, I want to calculate average “registration age”, not average age. Some children are members and some are not. I want to calculate average “registration age” based on their cities. It includes all persons that have registerDate. Can you help me about that?

Hi @Pinkman,

Thank you for clarifying however I am still a bit confused to what the "registration age"is. Would the “registration age” you have mentioned be the time between now and the "registerDate" field value?

Using an example, if a member has a "registerDate" of ISODate("2021-01-01T00:00:00Z") and today is ISODate("2022-01-01T00:00:00Z") would the “registration age” be 1 year in this example?

Some children are members and some are not.

Is each document in the collection considered a unique member?

If you could also provide your expected output with some examples, it will narrow down the best possible solution.

Regards,
Jason

The calculation you gave is completely correct. “Some children are members and some are not.” tells some children don’t have registerDate. I mean, I have to make a calculation to include the children who have registerDate. Each document is a unique member.

Thanks for confirming.

Each document is a unique member.

I will use the following document as an example to try clarifying one last uncertainty I have:

{
        "_id" : ObjectId("6213baa5a013b7c5f1232e23"),
        "birthDate" : ISODate("1973-05-01T13:30:15Z"),
        "surname" : "Johnson",
        "name" : "Emma",
        "registerDate" : ISODate("2018-06-11T17:30:15Z"),
        "children" : [
                {
                        "birthDate" : ISODate("1993-05-21T16:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Liam"
                },
                {
                        "birthDate" : ISODate("1994-01-21T15:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Olivia",
                        "registerDate" : ISODate("2019-09-14T12:30:15Z")
                }
        ],
        "city" : "Houston"
}

If each document is a unique registered member, would the child “Olivia Johnson” who has a registerDate field exist in its own document? i.e. another document in the same collection in addition to the above document:

{
        "_id" : ObjectId("6213baa5a013b7c5f1232e24"), /// <--- altered the _id just as an example
        "birthDate" : ISODate("1994-01-21T15:30:15Z"),
        "surname" : "Johnson",
        "name" : "Olivia",
        "registerDate" : ISODate("2019-09-14T12:30:15Z"),
        "city" : "Houston"
}

Regards,
Jason

The child “Olivia Johnson” who has a registerDate field only exist in its own document. It cannot be found in any other document.

It cannot be found in any other document.

I am a still a bit confused here as the child “Olivia Johnson” can be found in another document - as an element within the children array for the “Emma Johnson” document:

{
        "_id" : ObjectId("6213baa5a013b7c5f1232e23"),
        "birthDate" : ISODate("1973-05-01T13:30:15Z"),
        "surname" : "Johnson",
        "name" : "Emma",
        "registerDate" : ISODate("2018-06-11T17:30:15Z"),
        "children" : [
                {
                        "birthDate" : ISODate("1993-05-21T16:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Liam"
                },
                {
                        "birthDate" : ISODate("1994-01-21T15:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Olivia",
                        "registerDate" : ISODate("2019-09-14T12:30:15Z")
                }
        ],
        "city" : "Houston"
}

However, if the child does exist in it’s own document in the collection you may be able to run the following for average registration age:

db.collection.aggregate([
  { 
  	$addFields:
    	{registrationAge: {$dateDiff: {startDate: "$registerDate", endDate: "$$NOW", unit: "year"}}}
  },
  {
  	$group:
  		{
  		_id:"$city", 
  		avgRegistrationAge: {$avg: "$registrationAge"}
  		}
  }
])

Regards,
Jason

I’m a little embarrassed for bothering you. But this last method of yours doesn’t calculate Olivia’s registration age. In other words, “registrationAge” is not created for Olivia.

I’ve tried this:
https://mongoplayground.net/p/6_vf-QMcz9p
Now I have Emma’s and Olivia’s registration ages. However, Emma is duplicate because of $unwind. I have to get rid of one and get average of Emma’s and Olivia’s registration ages. I mean,

(5+1)/2=6

You can see these values(5, 1) at link added.

Thanks @Pinkman, I believe I have a better understanding of the document(s) in the collection now.

Unfortunately this is going to be particularly tricky. I am not sure how this could be done. Perhaps another community member may have some suggestions.

However, is there a reason you have not separate the registered children as their own individual document that does not reside in the children array of their parent’s document? If the collection was to contain a separate unique document for each individual registered member then my previous example would work. You could still perhaps have the children array as it is but the question is more so regarding why the registered member (who is a child) doesn’t exist as it’s own separate document in the collection. i.e:

/// Document 1
{
        "_id" : ObjectId("6213baa5a013b7c5f1232e23"),
        "birthDate" : ISODate("1973-05-01T13:30:15Z"),
        "surname" : "Johnson",
        "name" : "Emma",
        "registerDate" : ISODate("2018-06-11T17:30:15Z"),
        "children" : [
                {
                        "birthDate" : ISODate("1993-05-21T16:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Liam"
                },
                {
                        "birthDate" : ISODate("1994-01-21T15:30:15Z"),
                        "surname" : "Johnson",
                        "name" : "Olivia",
                        "registerDate" : ISODate("2019-09-14T12:30:15Z")
                }
        ],
        "city" : "Houston"
}
/// Document 2 (Registered Member & Child of Registered Member "Emma Johnson" in the document above
{
        "_id" : ObjectId("6213baa5a013b7c5f1232e24"), /// <--- different _id from the above document
        "birthDate" : ISODate("1994-01-21T15:30:15Z"),
        "surname" : "Johnson",
        "name" : "Olivia",
        "registerDate" : ISODate("2019-09-14T12:30:15Z"),
        "city" : "Houston"
}

The issue with the current format is that there is a mix of data which will play into augmenting the true average as you have noted. Additionally, if a child (or multiple children) have their parents both registered members, then this would add to the altering of the true average as there would be duplicates included in the count and averaging.

There may be a reason why it is structured in this manner but I would require the use case and further context regarding the aggregation to better understand. Additionally, could you provide what you are expecting the output to be?

Lastly, I would recommend going over the Building with Patterns: A Summary blog post & Model Tree Structures with Parent References documentation which you may find useful.

Regards,
Jason

Thank you very much for your explanations. Now I understand what you mean. Actually, I can convert children’s object into a new document, like in your example but I don’t know how to do it. In other words, changing the data structure doesn’t matter if I get the correct result.
My expected output looks like:

[
  {
    "_id": "Dallas",
    "avgRegistrationAge": 5
  },
  {
    "_id": "Chicago",
    "avgRegistrationAge": 3.4
  },
  {
    "_id": "Houston",
    "avgRegistrationAge": 4.5
  }
]

Here is my data pipeline

Hi @Pinkman,

I did some testing but please let me know if the following achieves what you are after:

db.collection.aggregate([
/// Filter out elements in the children array that have a children.registerDate value not equal to null
  {
    $addFields: {
      filteredArray: {
        $filter: {
          input: "$children",
          as: "child",
          cond: {
            $ne: [
              "$$child.registerDate",
             null
            ]
          }
        }
      }
    }
  },
///  Add the parent registerDate value to the new array `filteredArray` created above
  {
    $addFields: {
      newArray: {
        "$concatArrays": [
          "$filteredArray",
          [
            {
              registerDate: "$registerDate"
            }
          ]
        ]
      }
    }
  },
/// Go through each element of the `filteredArray` and return the registration age
  {
    $addFields: {
      ageArray: {
        "$map": {
          "input": "$newArray",
          "as": "newArray",
          "in": {
            "$dateDiff": {
              "startDate": "$$newArray.registerDate",
              "endDate": "$$NOW",
              unit: "year"
            }
          }
        }
      }
    }
  },
/// Unwind the `ageAray`
  {
    "$unwind": "$ageArray"
  },
/// Group on city and average out registration age
  {
    "$group": {
      "_id": "$city",
      "averageRegisterAge": {
        "$avg": "$ageArray"
      }
    }
  }
])

However, even if the above does work, I would recommend altering the schema to contain a document for each member to simplify the aggregation significantly.

Regards,
Jason

1 Like

This method is completely correct. Thank you a lot.

1 Like

Glad this helped. Just as an additional note, to increase the precision regarding age, you may wish to use a unit value of month in the $dateDiff operator.

You could then divide the average by 12 (12 months in a year) to get an average yearly value.

This is because of the behaviour of $dateDiff where durations are measured by counting the number of times a unit boundary is passed.

However, the above should be a good starting point.

Regards,
Jason

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.