How to lookup and merge documents based on a nested array of ObjectIds

Hi,

I’m still getting to grips with the aggregation pipeline and need a bit of help understanding how I would piece together the following data:

I have a collection of ‘users’ that has a nested document called ‘roles’ within it. A user can belong to one or many ‘schools’ and with each school have a different set of ‘roles’. Here’s how I’ve structured the data:

User:

name,
email,
schools: [
  {
     roles: ['admin', 'user'],
     schoolId: ObjectId(...)
  }
]

A school looks like this:

_id,
name,
address
etc etc

What I would like to be able to do is produce a result that looks like this whenever I lookup a user:

name,
email,
schools: [
  {
     roles: ['admin', 'user'],
     school: {
        _id: ObjectId,
       name,
       ...all other fields
     }
  }
]

Looking at the aggregation pipeline it seems like $lookup is my first port of call using something like this:

{
  from: "schools",
  as: "schools",
   'let': { 'schoolId': '$roles.schoolId' },
      'pipeline': [
        { '$match': { '$expr': { '$in': ['$_id', '$$schoolId'] } } }
      ],
      'as':'schools'
}

Which, if I’m understanding correctly is great and gives me a ‘schools’ set of documents at the root level. I’m a bit stuck on how I get these documents to merge with the actual roles segment though. I’m also unclear about the need for a $match to check for existence.

I’d really appreciate any corrections/examples and comments.

J

Thanks for linking. Here is a quick response:

  • in this format, you may add roles to the pipeline within let and later, maybe, use something like addField. you can shape the resulting document inside the pipeline. this inner pipeline is fully disconnected from the main document except for variables you define with let and some other default ones. so think this way: how would you form a pipeline to get the fields you want if you work on only the “schools” collection?
  • localField and foreignField will ease that matching so you would not need let and pipeline to get all fields but handling outside fields might not be possible as you are replacing the document you lookup from.

There may come many different solutions with different performance levels as you have seen in the other post you followed.

since you are at the learning stage, I will leave the solving part to you for the moment. it is best for learning when you get hints yet keep trying yourself.

3 Likes

Hello @James_N_A2 ,

I agree with @Yilmaz_Durmaz that learning by doing is the best in the long term. Just to give you an idea on how I would solve this, please see the below query.

db.users.aggregate([
  {
    $unwind: "$schools"
  },
  {
    $lookup: {
      from: "schools",
      localField: "schools.schoolId",
      foreignField: "_id",
      as: "schools.school"
    }
  },
  {
    $unwind: "$schools.school"
  },
  {
    $group: {
      _id: "$_id",
      name: { $first: "$name" },
      email: { $first: "$email" },
      schools: {
        $push: {
          roles: "$schools.roles",
          school: "$schools.school"
        }
      }
    }
  }
])

Below aggregation pipelines stages are used in the above query:

  • $unwind - Deconstructs an array field from the input documents to output a document for each element

  • $lookup - Performs a left outer join to a collection in the same database to filter in documents from the “joined” collection for processing.

  • $group - It separates documents into groups according to a “group key”. The output is one document for each unique group key.

  • $push - The $push operator appends a specified value to an array.

Note: Please test the aggregation pipeline throughly as per your requirements.

Also, note that this may not be the best solution, just something I quickly come up with. You’re welcome to post your own solution that may be useful for future reader of this topic.

Regards,
Tarun

2 Likes

Okay, this has been fun so far and thanks for your hints. I’ve spent much time today looking through other examples and documentation. Most examples converge around the idea of breaking this up into 2-3 stages with some variations that I’ve learned.

First step:

Perform a $lookup that matches your ids to the related collection document ids:

{
  $inspect: {
    from: "schools",
    foreignField: "_id",
    localField: "schools.schoolId",
    as: "related_schools"
  }
}

The next step is to use $addFields with $map to work through the set of schools. I’m referring to the same user.schools segment I will augment. This appears to work the same as .map in JavaScript albeit very declarative:

{
  $addFields:  {
    "schools": {
      "$map": {
        "input": "$schools",
        "as": "schools",
        "in": {
          "$mergeObjects": [
            "$schools",
            { $arrayElemAt: [ "$related_schools", { "$indexOfArray": ["$related_schools._id", "$schools.schoolId"] } ] }
          ]
        }
      }
    }
  }
}

I couldn’t get this working without the as parameter, but it does give me a useful $$ reference. Perhaps I could have used $$this? Please let me know if there are any ways around this or better approaches. I couldn’t reference anything other than examples that use $reduce instead of $map.

So far, so good. I’ve got the structure I need and need to remove the temporary $related_schools array I created during the $lookup stage. I do this using $project and also take the opportunity to remove the now useless schoolId:

{ 
  $project: {
    "schools.schoolId": 0,
    "related_schools": 0
  }
}

All good, but I can’t help but wonder if I’m duplicating the workload by having to use $map. The ‘join’ has already been done by $lookup, which happens to be in another segment. I lose my roles segment if I use the same name. Perhaps there’s a way to store it in advance? I think that’s possibly what you alluded to with let.

Anyway, this is a good start, and it’s given me a good, basic understanding of the aggregation stuff in Mongo.

to test ideas fast on test data, you can use Mongo playground.
the “fast” is the emphasis here. I suggest this site whenever possible :slight_smile:

play with the examples for a while so to find out how to include your example data. then create some queries and get fast results.