Return ony when a selected fields are repeated

Lets imagine I have:

CollectionA

name: “first”
mainId: a2345e87
email: “abc@gmail.com

name: “second”
mainId: a2345e87
email: “def@gmail.com

name: “third”
mainId: c2345e87
email: “ghi@gmail.com

name: “fourth”
mainId: a2345e87
email: “abc@gmail.com

We can notice that mainId=a2345e87 repeats 3 times. Rows first, second and fourth.
But only rows first and fourth have same emails.

How to retrieve only one row when this happens. (mainId + email are the same) ?

Is like a mysql GROUP BY two fields. Tried to do this using the documentation but I failed.
It returned me a single field (the one I grouped by, or demands me to use an accumulator).
I Want all fields, just want to pick a single row when two or more fields are the same.

The result I want:

name: “first”
mainId: a2345e87
email: “abc@gmail.com

name: “second”
mainId: a2345e87
email: “def@gmail.com

name: “third”
mainId: c2345e87
email: “ghi@gmail.com

I need to plug this solution inside the aggregation I already have:

  return this.aggregate([
    {
      $lookup: {
        from: 'collectionB',
        localField: 'mainId',
        foreignField: 'mainId',
        as: 'allowed',
      },
    },
    {
      $match: {
          {
            $or: [
              { 'allowed': [] }, // if collectionA.mainId doesn't exist inside collectionB
              { 'allowed.expires': { $lt: new Date(interval) } }, // if CollectionA.mainId exists inside collectionB BUT the field expires is longer than a certain interval
            ],
          },
      },
    },
  ]);

Hi @Alan,

Here is my proposition. It’s a weird pipeline but here we go :slight_smile: !

Input:

> db.coll.find()
[
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb78"),
    name: 'first',
    mainId: 'a2345e87',
    email: 'abc@gmail.com'
  },
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb79"),
    name: 'second',
    mainId: 'a2345e87',
    email: 'def@gmail.com'
  },
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb7a"),
    name: 'third',
    mainId: 'c2345e87',
    email: 'ghi@gmail.com'
  },
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb7b"),
    name: 'fourth',
    mainId: 'a2345e87',
    email: 'abc@gmail.com'
  }
]

Pipeline:

[
  {
    '$sort': {
      'name': 1
    }
  }, {
    '$group': {
      '_id': {
        'a': '$mainId', 
        'b': '$email'
      }, 
      'doc': {
        '$first': '$$ROOT'
      }
    }
  }, {
    '$replaceRoot': {
      'newRoot': '$doc'
    }
  }
]

Result:

[
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb79"),
    name: 'second',
    mainId: 'a2345e87',
    email: 'def@gmail.com'
  },
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb7a"),
    name: 'third',
    mainId: 'c2345e87',
    email: 'ghi@gmail.com'
  },
  {
    _id: ObjectId("62ab6a529c4f1c48f52ceb78"),
    name: 'first',
    mainId: 'a2345e87',
    email: 'abc@gmail.com'
  }
]

Note that depending if you sort the name in ascending or descending order, you don’t get the same result because you are retrieving the “first” doc that comes which is order dependant.

Without the $sort stage, their is no guarantee that you will always get the same result with the same set of documents as it will be dependant on the physical storage order.

Cheers,
Maxime.

2 Likes

Thank you very much!

1 Like