Query to return one document with array attribute aggregated from different documents

Hi,

I have a collection that contains multiple documents, like this:

[{
  "id" : 1,
  "name": "test",
  "phase": "2",
  "grades" : [
	 { "grade" : 80, "mean" : 75, "std" : 6 },
	 { "grade" : 85, "mean" : 90, "std" : 4 }
  ]
},{
  "id" : 2,
  "name": "test",
  "phase": "2",
  "grades" : [
	 { "grade" : 90, "mean" : 75, "std" : 6 },
	 { "grade" : 87, "mean" : 90, "std" : 3 },
	 { "grade" : 91, "mean" : 85, "std" : 4 }
  ]
}]

Is there a way to return one document with the array (grades) aggregated from all documents? as following:

[{
  "id" : 1,
  "name": "test",
  "phase": "2",
  "grades" : [
	 { "grade" : 80, "mean" : 75, "std" : 6 },
	 { "grade" : 85, "mean" : 90, "std" : 4 }
	 { "grade" : 90, "mean" : 75, "std" : 6 },
	 { "grade" : 87, "mean" : 90, "std" : 3 },
	 { "grade" : 91, "mean" : 85, "std" : 4 }
  ]
}]

The returned document has the shared attributes between all documents.

I used unwind and replaceRoot, as following:

db.collection.aggregate([{
    $unwind: "$grades"
  },{
    $match: {
      "grades.grade": {
        $gte: 90
      }
    }
  },{
    $replaceRoot: {
      newRoot: "$grades"
    }
  }])

but it returned the array of grades only, I need to put also the document information in the result in addition to the array that holds all elements of arrays from all documents.

Thanks

in your sample input docs name and phase are the same but id is different and you only have id:1 in your sample result. so it is not clear what you want to do with id. what about other documents where name and phase are different?

All attributes are the same in all documents, except the id. The id in returned document can be the id of the first document, or any other document.
I’m interest in returning one document, with the array value aggregated from all documents.

Are you looking for something like the following?

db.grades.aggregate([{
    $unwind: "$grades"
}, {
    $match: {
        "grades.grade": {
            $gte: 90
        }
    }
}, {
    $group: {
        _id: {
            name: "$name",
            phase: "$phase"
        },
        grades: {
            $push: "$grades"
        }
    }
}, {
    $project: {
        name: "$_id.name",
        phase: "$_id.phase",
        grades: "$grades",
        _id: 0
    }
}])

Which returns the following output based on the two sample documents you provided:

[
  {
    name: 'test',
    phase: '2',
    grades: [
      { grade: 90, mean: 75, std: 6 },
      { grade: 91, mean: 85, std: 4 }
    ]
  }
]

NOTE: Your sample query filters out grades less than 90, but your sample output showed them. I left the query the same so this will need to be tweaked to suit your needs.

Should you have various name/phase combinations in your data set and you don’t perform a $match for a certain combination, this will return all combinations as follows:

[
  {
    name: 'test',
    phase: '3',
    grades: [
      { grade: 96, mean: 95, std: 1 },
      { grade: 95, mean: 92, std: 2 }
    ]
  },
  {
    name: 'test',
    phase: '2',
    grades: [
      { grade: 90, mean: 75, std: 6 },
      { grade: 91, mean: 85, std: 4 }
    ]
  },
  {
    name: 'test',
    phase: '4',
    grades: [ { grade: 99, mean: 92, std: 4 } ]
  }
]

Let us know if this is what you’re looking for, or if I missed what you were asking. There are undoubtedly other ways to do this as well, but I just went the easy route and tweaked what you had started with.

Yes, this is exactly what I’m looking for.
Thanks a lot Doug!

Actually I’m newbie to mongodb, and I have question about the performance of the query. Do you think this grouping and projections wont impact the performance of the query in case I have 500K elements in the grades array distributed among multiple documents, and of course I will use pagination in returning data from grades array in chunks?

Thanks

As with any query, you will want to test things to see how it impacts the system and how performant it is. You will want to filter out as many documents as you can early in the process so you’re sending the least amount of data through the pipeline as possible.

Not knowing how your data is distributed, and how many elements are in each grades array, it’s hard to make any assumptions about performance. Add on top of that, the hardware and resources available also have impact on how the query will run. Another thing to take into account is how often to you expect this query to run? If it’s infrequent, then it could be a little slower and not cause much in the way of problem for day-to-day operations. However if it’s running frequently you will definitely want to make sure you get it as performant as possible.

Here is another version that will do the same thing for you. This one filters out any documents that don’t have a grade of 90 or greater in the grades array and should be able to take advantage of an index on grades.grade.

db.grades.aggregate(
    [{
            $match: {
                "grades.grade": {
                    "$gte": 90
                }
            }
        },
        {
            $addFields: {
                grades: {
                    $filter: {
                        input: "$grades",
                        as: "grade",
                        cond: {
                            $gte: ["$$grade.grade", 90]
                        }
                    }
                },
            }
        },
        {
            $unwind: "$grades"
        },
        {
            $group: {
                _id: {
                    name: "$name",
                    phase: "$phase"
                },
                grades: {
                    $push: "$grades"
                }
            }
        }
    ])

I haven’t take the time to build up a larger dataset for testing to see which might be better, but even if I did, that would just be with dummy data that may not even be close to what you have in reality. Take some time to play with both queries. Tweak as necessary. Look at the results of running an aggregate query with the explain option and look at the results to see you can get things to run optimally.

Being new to a technology can be frustrating at times because there’s so much to learn and you don’t always know where to start, but at the same time you have the enjoyment of learning new things and the sense of accomplishment that brings when you figure something out.

The documentation is a great place to learn. If you have yet come across it yet, MongoDB University offers free online courses and they have both admin geared courses and those that are geared to developers. And always feel free to stop by here to ask questions. The community is great and lots of people around helping out our fellow travelers on the path to MongoDB mastery.

2 Likes

Very helpful. Thank you very much for your insights Doug!

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.