Join two collections by common field and return the matches

Hi,

I’m new to MongoDB and I have 2 collections, one called “EN” and another one called “csv_import”. I just need to join these 2 collections using a common field and get the results. For results, I just need the Part number and product id. The 2 collections structure is as follow:

csv_import:

product_id
part_no
vendor_standard

EN: under object “ICECAT-interface.Product”:

@Prod_id
@Name

(these are the main ones but there are other non-important fields, for the sake of this example I include only relevant ones.

just as clarification, “@” is part of the field name

I’m using this to join the two collections:

db.EN.aggregate([
 {
     $lookup: {
  from: 'csv_import',
  localField: 'ICECAT-interface.Product.@Prod_id',
  foreignField: 'part_no',
  as: 'part_number'
     }
 }]);

Unfortunately, I get an empty array in part_number when I’m expecting just the results that match. Also, how can I specify which fields I want to get back? I thought adding “as: part_number” would be enough but doesn’t seem to be the case

Here’s some collection sample (taken from “EN”)

[{
  "_id": "1414",
  "ICECAT-interface": {
    "@xmlns:xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "@xsi:noNamespaceSchemaLocation": "https://data.icecat.biz/xsd/ICECAT-interface_response.xsd",
    "Product": {
      "@Code": "1",
      "@HighPic": "https://images.icecat.biz/img/norm/high/1414-HP.jpg",
      "@HighPicHeight": "400",
      "@HighPicSize": "43288",
      "@HighPicWidth": "400",
      "@ID": "1414",
      "@LowPic": "https://images.icecat.biz/img/norm/low/1414-HP.jpg",
      "@LowPicHeight": "200",
      "@LowPicSize": "17390",
      "@LowPicWidth": "200",
      "@Name": "C6614NE",
      "@IntName": "C6614NE",
      "@LocalName": "",
      "@Pic500x500": "https://images.icecat.biz/img/gallery_mediums/img_1414_medium_1480667779_072_2323.jpg",
      "@Pic500x500Height": "500",
      "@Pic500x500Size": "101045",
      "@Pic500x500Width": "500",
      "@Prod_id": "C6614NE",

Sample collection data taken from “csv_import” collection:

{
  "_id": "ObjectId(\"6348339cc6e5c8ce0b7da5a4\")",
  "index": 23679,
  "product_id": 4019734,
  "part_no": "CP-HAR-EP-ADVANCED-REN-1Y",
  "vendor_standard": "Check Point"
},

I’ve tried this:


db.EN.aggregate([
 {
     $lookup: {
  from: 'csv_import',
  let: {pn:'$ICECAT-interface.Product.@Prod_id'},
  pipeline: [{
      $match: {
          $expr: {
              $eq: ["$$pn","$part_no"]
          }
      }
  }],
  as: 'part_number_info'
     }
 }]).pretty();

but yet again, I’m not getting ONLY the matched, I get everything with empty array in part_number_info

Hello @Matias_Montroull, Welcome to the MongoDB community developer forum,

Your aggregation pipeline looks good, but there is no matching value available in your example documents which is why it results in an empty array,

{ $match: { $expr: { $eq: ["$$pn", "$part_no"] } } }
// Actual Value is Not Matching
// { $match: { $expr: { $eq: ["C6614NE", "CP-HAR-EP-ADVANCED-REN-1Y"] } } }

Let’s change the value of part_no same as @Prod_id,
EN Collection:

{
  "_id": "1414",
  "ICECAT-interface": {
    "Product": {
      "@Prod_id": "C6614NE"
    }
  }
}

csv_import Collection:

{
  "_id": "ObjectId(\"6348339cc6e5c8ce0b7da5a4\")",
  "index": 23679,
  "product_id": 4019734,
  "part_no": "C6614NE", // updated this to same as `@Prod_id`
  "vendor_standard": "Check Point"
}

Your same Query:

db.EN.aggregate([
  {
    $lookup: {
      from: "csv_import",
      let: { pn: "$ICECAT-interface.Product.@Prod_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$$pn", "$part_no" ] } } }
      ],
      as: "part_number_info"
    }
  }
])

Playground

Result:

[
  {
    "ICECAT-interface": {
      "Product": {
        "@Prod_id": "C6614NE"
      }
    },
    "_id": "1414",
    "part_number_info": [
      {
        "_id": "ObjectId(\"6348339cc6e5c8ce0b7da5a4\")",
        "index": 23679,
        "part_no": "C6614NE",
        "product_id": 4.019734e+06,
        "vendor_standard": "Check Point"
      }
    ]
  }
]

Thanks Turivishal,

I realize it’s not the same value, how can I get only the matches and not the ones where the array is empty? getting a result set of say 10.000 records that don’t match is not ideal, I just want the ones that match.

Thanks!

It requires to check the $match condition part_number_info not equal to [] empty array after $lookup stage,

db.EN.aggregate([
  {
    $lookup: {
      from: "csv_import",
      let: { pn: "$ICECAT-interface.Product.@Prod_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$$pn", "$part_no" ] } } }
      ],
      as: "part_number_info"
    }
  },
  { $match: { part_number_info: { $ne: [] } } }
])

Thanks! I’ve executed this and it takes forever now, maybe I need some additional index? Currently I have index on @Prod_id.

Running this returns in a few seconds:
{ $match: { part_number_info: { $eq: } } but when I use the $ne it becomes really slow, it takes a lot of CPU in the server

Here’s the output of the explain():


{
  "explainVersion": "1",
  "stages": [
    {
      "$cursor": {
        "queryPlanner": {
          "namespace": "Icecat.EN",
          "indexFilterSet": false,
          "parsedQuery": {},
          "queryHash": "8B3D4AB8",
          "planCacheKey": "D542626C",
          "maxIndexedOrSolutionsReached": false,
          "maxIndexedAndSolutionsReached": false,
          "maxScansToExplodeReached": false,
          "winningPlan": {
            "stage": "COLLSCAN",
            "direction": "forward"
          },
          "rejectedPlans": []
        }
      }
    },
    {
      "$lookup": {
        "from": "csv_import",
        "as": "part_number_info",
        "let": {
          "pn": "$ICECAT-interface.Product.@Prod_id"
        },
        "pipeline": [
          {
            "$match": {
              "$expr": {
                "$eq": [
                  "$$pn",
                  "$part_no"
                ]
              }
            }
          }
        ]
      }
    },
    {
      "$match": {
        "part_number_info": {
          "$not": {
            "$eq": []
          }
        }
      }
    }
  ],
  
  },
  "serverParameters": {
    "internalQueryFacetBufferSizeBytes": 104857600,
    "internalQueryFacetMaxOutputDocSizeBytes": 104857600,
    "internalLookupStageIntermediateDocumentMaxSizeBytes": 104857600,
    "internalDocumentSourceGroupMaxMemoryBytes": 104857600,
    "internalQueryMaxBlockingSortMemoryUsageBytes": 104857600,
    "internalQueryProhibitBlockingMergeOnMongoS": 0,
    "internalQueryMaxAddToSetBytes": 104857600,
    "internalDocumentSourceSetWindowFieldsMaxMemoryBytes": 104857600
  },
  "command": {
    "aggregate": "EN",
    "pipeline": [
      {
        "$lookup": {
          "from": "csv_import",
          "let": {
            "pn": "$ICECAT-interface.Product.@Prod_id"
          },
          "pipeline": [
            {
              "$match": {
                "$expr": {
                  "$eq": [
                    "$$pn",
                    "$part_no"
                  ]
                }
              }
            }
          ],
          "as": "part_number_info"
        }
      },
      {
        "$match": {
          "part_number_info": {
            "$ne": []
          }
        }
      }
    ],
    "cursor": {},
    "$db": "Icecat"
  },
  "ok": 1
}

The index will support in the first stage ($match) or in some stages also in the second stage ($match, $sort, $group),

So that $match after $lookup will not use the index, I would not suggest this aggregation pipeline if you have huge data, obviously, it becomes slow,

I would suggest you improve your schema/document structure or manage a boolean flag (has prod id?) in EN collection so you don’t need $lookup stage in your aggregation query.

I solved it by adding an index on part_no in csv_import (EN @Prod_id already had an index) and changing the lookup a bit to go from small to large collection. Here’s the final solution:

db.csv_import.aggregate([
 {
     $lookup: {
  from: 'EN',
  let: {pn:'$part_no'},
  pipeline: [{
      $match: {
          $expr: {
              $eq: ["$pn","$ICECAT-interface.Product.@Prod_id"]
          }
      }
  }],
  as: 'part_number_info'
     }
 },{$match: {"part_number_info.0": {$exists: true}}}
 ])
1 Like