How can I do a LEFT OUTER JOIN in mongodb?

I have two collections:

Collection 1:
mail | value
a 2
b 3
c 4

Collection 2:

mail | value
a 2
c 4

Desired result:
mail | value
b 3

How can I do this with the aggregation pipeline? Any help would be appreciated, I am just a begginer and I tried myself but with no success…

Hi @A_ML, and welcome to the MongoDB Community forums!

In MongoDB the $lookuup operator does a left outer join of two collections.

db.coll1.aggregate([
    {
        $lookup: {
            from: "coll2",
            localField: "mail",
            foreignField: "mail",
            as: "joined"
        }
    }
])

This will give results similar to the following based on your example:

[
  {
    _id: ObjectId("633219622627d7ac551adc7f"),
    mail: 'a',
    value: 2,
    joined: [
      {
        _id: ObjectId("633219742627d7ac551adc82"),
        mail: 'a',
        value: 2
      }
    ]
  },
  {
    _id: ObjectId("633219622627d7ac551adc80"),
    mail: 'b',
    value: 3,
    joined: []
  },
  {
    _id: ObjectId("633219622627d7ac551adc81"),
    mail: 'c',
    value: 4,
    joined: [
      {
        _id: ObjectId("633219742627d7ac551adc83"),
        mail: 'c',
        value: 4
      }
    ]
  }
]

However it looks you want only those documents that exist in the left side and not the right side. You can add a $match to the above to return only where the joined field is an empty array (i.e. no matched document on the right side:

db.coll1.aggregate([
    {
        $lookup: {
            from: "coll2",
            localField: "mail",
            foreignField: "mail",
            as: "joined"
        }
    },
    {
        $match: {
            joined: []
        }
    }
])

Which returns:

[
  {
    _id: ObjectId("633219622627d7ac551adc80"),
    mail: 'b',
    value: 3,
    joined: []
  }
]

From there you can do what you need with the documents that remain in the pipeline.

Note that this works just fine with the very small sample data, but you will want to test with larger amounts of production style data and make sure to filter out any unneeded documents before performing the lookup and make sure that the necessary indexes are in place.

1 Like

Hi, thanks for your reply! I will take it into consideration. However, I would like to know how to do it taking two indexes at a time.
For example, not just looking at the column “mail”, but taking into account “mail” and “values” at the same time, since the same “mail” could have multiple records too. Is it possible with some kind of concatenation? In that case, is it possible to do it at runtime with a query?

Thanks

The documentation covers multi field joins as well. You would use a form of the $lookup similar to the following:

db.coll1.aggregate([{
        $lookup: {
            from: "coll2",
            let: {
                coll1_mail: "$mail",
                coll1_value: "$value"
            },
            pipeline: [{
                $match: {
                    $expr: {
                        $and: {
                            $eq: ["$$coll1_mail", "$mail"],
                            $eq: ["$$coll1_value", "$value"]
                        }
                    }
                }
            }],
            as: "joined"
        }
    },
    {
        $match: {
            joined: []
        }
    }
])

Hopefully the updated makes sense. Basically you create variables from the left collection fields to be used later in the pipeline. The pipeline then does the join (and could do other things as well if needed). Finally outside of the $lookup, you have you match to return only the documents that are not in the right collection.

5 Likes

Thanks for your help. Great explanation

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.