Lookup collection and aggregation

Hi folks I have two collections:

Tags and Pages.

A tag document is defined as:

{
  "_id": "6369b55019ddb435c190065a",
  "name": "Tips",
  "counter": 10,
  "last_updated": {
    "$date": {
      "$numberLong": "1667917950452"
    }
  },
  "type": "generic"
}

And a Page is defined as:

{
  "_id": "2f980777-3e45-4710-9f9f-231c6b79275e",
  "type": "ads",
  "default": "ads",

  "index": 0,
  "last_updated": {
    "$date": {
      "$numberLong": "1669497124258"
    }
  },
  "magazine_id": "63487e6a-52c7-4215-a6fe-50082a7f830b",
  "merge_group": false,
  "number": 50,
  "page_group": [
    46,
    47,
    48,
    49,
    50
  ],
  "publication_id": "360cea8c-5837-45cf-908c-1276b1abd9ab",
  "reviewed": true,
 
  "tags": [
    {
      "_id": "61c89821-be46-47c5-8e32-3f71c05c863b",
      "counter": 0,
      "last_updated": {
        "$date": {
          "$numberLong": "1669402611000"
        }
      },
      "name": "Poll",
      "type": "generic"
    }
  ],
  "issue_date": {
    "$date": {
      "$numberLong": "620611200000"
    }
  }
}

So counting the top tags has not been a problem as I just do the aggregation on the collection. However how do I find also tags that exist in Tags but not in Pages. Ideally I would like to get a result with the missing Tags counted as 0.

Thank you

There must be something I do not understand because it sounds like a very simple pipeline with a lookup and a match should provide you with what you are looking for. Something like the untested:

lookup = { "$lookup" : {
    "from" : "Pages" ,
    "as" : "_result" ,
    "localField" : "name" ,
    "foreignField" : "tags.name"
    "pipeline" : [ { "$limit" : 1 } ]
} }

match = { "$match" : {
  "_result" : { "$size" : 0 }
} }

db.Tags.aggregate( [ lookup , match ] )

Hi Steve, I’m sorry if I was not clear. I want something like an left join in SQL would do:

Tag.name.  |  count
Tips           |    187
Poll            |      23
Reviews    |        0
Profiles.    |.        0

There are elements in Tags not present in Pages, and I need to count both presence and absence. When I ran the aggregation only on Pages, I get only the present tags not the missing ones, does it make sense?

Thanks

I managed to get this to work:

[{
 $match: {
  type: 'generic'
 }
}, {
 $lookup: {
  from: 'Pages',
  localField: '_id',
  foreignField: 'tags._id',
  as: '_result'
 }
}, {
 $addFields: {
  tagsCount: {
   $size: '$_result'
  }
 }
}, {
 $sort: {
  tagsCount: 1
 }
}]
1 Like

Your pipeline is essentially the same as mine with the following differences.

An extra $match at the beginning for type:generic which was not mentioned in your original post.

Field names in localField and foreignField that seems wrong compared to your sample documents. The localField:_id seems to be an ObjectId while foreignField:tags._id seems to be UUID. They could not match. I used localField:name and foreignField:tags.name because they were the only that could be matched.

My $limit:1 and final $match wrongly assumed that you only wanted the unused tags.

2 Likes

Thanks for your input, helped a lot, I just documented the result in case someone else finds this.

Cheers

2 Likes

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.