Aggregate data from master document and it's details

Hi,

I have a document with an attribute of type array, and other attributes. I applied outlier pattern, so that the array overflow data added into a new document of the same type. The result is we have master/parent document (id=1 in below dataset) contains all attributes including the attribute of type array, and the overflow documents having only the attribute of type an array (id=2 and 3 in below dataset), and parent attribute points to the id of the parent document, as following:

[{
    "id": 1,
    "phase": "phase1",
    "address": "address1",
    "grades": [{
        "grade":80, "mean": 75, "std": 6
      },{
        "grade":85, "mean": 90, "std": 4
      },{
        "grade": 91, "mean": 85, "std": 4
      }
    ]
  },
  {
    "id": 1_1,
    "parent": 1,
    "grades": [{
        "grade":90, "mean": 75, "std": 6
      },{
        "grade":87, "mean": 90, "std": 4
      },{
        "grade": 91, "mean": 85, "std": 4
      }
    ]
  },{
    "id": 1_2,
    "parent": 1,
    "grades": [{
        "grade":82, "mean": 75, "std": 6
      },{
        "grade":83, "mean": 81, "std": 4
      },{
        "grade": 99, "mean": 85, "std": 4
      }
    ]
  }
]

Is there a way to aggregate the data all documents to return grades greater than 80 and address = “address1”? as following expected result:

[{
    "id": 1,
    "phase": "phase1",
    "address": "address1",
    "grades": [{
        "grade":80, "mean": 75, "std": 6
      },{
        "grade":85, "mean": 90, "std": 4
      },{
        "grade": 91, "mean": 85, "std": 4
      },{
        "grade":90, "mean": 75, "std": 6
      },{
        "grade":87, "mean": 90, "std": 4
      },{
        "grade": 91, "mean": 85, "std": 4
      },{
        "grade":82, "mean": 75, "std": 6
      },{
        "grade":83, "mean": 81, "std": 4
      },{
        "grade": 99, "mean": 85, "std": 4
      }
    ]
  }
]

Thanks

Hi,

I started with something like this:

db.collection.aggregate([
  {
    $unwind: "$grades"
  },
  {
    $match: {
      "grades.grade": {
        $gte: 80
      },
      "address": "address1"
    }
  },
  {
    $group: {
      _id: {
        id: {
          $ifNull: [
            "$parent",
            "$id"
          ]
        }
      },
      grades: {
        $push: "$grades"
      }
    }
  }
])

But the issue is that matching on address, returns only first document. As other documents not having address attribute.
Any idea how can handle these documents as one document in aggregating data, since all have the same parent which have the information?

Thanks

you should match before unwind. match is smart enough to match within arrays. you have a higher probability to leverage your indexes if you match first

you only get the grades on the parent because you are matching address. I see 2 solutions

  1. add address to child documents
  2. do a self lookup matching parent id right after the match

you then use $filter to get the grades subdocuments

since you do not unwind you do not need to group

2 Likes

Thanks a lot Steeve for the help!

Ended up as follows:

db.collection.aggregate([
  {
    "$match": {
      "address": "address1"
    }
  },
  {
    "$project": {
      "grades": {
        "$filter": {
          "input": "$grades",
          "cond": {
            "$gte": [
              "$$this.grade",
              80
            ]
          }
        }
      }
    }
  }
])

I was trying to find a way to connect document and subdocuments based of the “id” and “parent” fields. But didn’t find such a way. As you suggested, either have to duplicate the data (address) in subdocuments. Or using lookup.
I would use duplicate data (address) in subdocuments (solution #1), as I’m under impression that the lookup is not a proper solution for large dataset from performance perspective.

Thanks

yes, lookup will be slower, but your data schema is defined in such a way that this is the solution. make sure you have proper indexes

for your lookup, use localField:id with foreignField:parent and it should work

note that when you project you loose id, phase and address fields

1 Like

Thanks a lot Steeve!

This design for the data schema was to avoid unbounded arrays, so I used outlier pattern, by created new subdocuments for overflow data in arrays. But with this approach, I had to handle all CRUD APIs to take into consideration the subdocuments.
Because of the performance impact for lookup, I wont use it. Instead, going to copy all fields needed in retrieve queries in subdocuments.

Thanks a lot

1 Like