Best way to remove duplicates from millions of records

Hi Team,

We’re trying to find and remove duplicate documents in a collection of about 2–3 million documents. We’ve tried the approach below, but it’s slow and doesn’t seem viable for production. What is the best and fastest way to delete duplicate records at this scale?

Sample code we have:

```
db.mycollection.aggregate([
  // Limit the scan to the window in which the duplicates were created
  { $match: {
      create_date_audit: {
        $gte: ISODate('2022-07-25T18:27:56.084+00:00'),
        $lte: ISODate('2022-07-26T20:15:50.561+00:00')
      }
  }},
  // Newest first, so the first _id pushed into each group is the newest
  { $sort: { _id: -1 } },
  // Group on the fields that define a duplicate
  { $group: {
      _id: {
        notification_id: '$notifId',
        empId: '$empId',
        date: '$date'
      },
      dups: { $push: '$_id' },
      creationTimestamp: { $push: '$create_date' },
      count: { $sum: 1 }
  }},
  // Keep only the groups that actually contain duplicates
  { $match: {
      _id: { $ne: null },
      count: { $gt: 1 }
  }}
], { allowDiskUse: true }).forEach(function(doc) {
  doc.dups.shift();  // keep the newest document in each group
  db.mycollection.deleteMany({ _id: { $in: doc.dups } });  // delete the rest
});
```
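
What we’re considering instead (a rough sketch, untested at this scale; `BATCH_SIZE` and the keep-the-newest rule are our own assumptions) is to collect the duplicate `_id`s across groups and delete them in large batches, so each `deleteMany` removes thousands of documents in one command instead of one command per group:

```
const BATCH_SIZE = 10000;  // assumption: tune to what your write load tolerates
let batch = [];

db.mycollection.aggregate([
  { $match: {
      create_date_audit: {
        $gte: ISODate('2022-07-25T18:27:56.084+00:00'),
        $lte: ISODate('2022-07-26T20:15:50.561+00:00')
      }
  }},
  { $group: {
      _id: { notification_id: '$notifId', empId: '$empId', date: '$date' },
      dups: { $push: '$_id' },
      count: { $sum: 1 }
  }},
  { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true }).forEach(function(doc) {
  doc.dups.sort();  // ObjectIds embed a timestamp, so lexical order roughly matches creation order
  doc.dups.pop();   // keep the newest copy, queue the older ones for deletion
  batch.push(...doc.dups);
  if (batch.length >= BATCH_SIZE) {
    db.mycollection.deleteMany({ _id: { $in: batch } });
    batch = [];
  }
});
// Flush the final partial batch
if (batch.length > 0) {
  db.mycollection.deleteMany({ _id: { $in: batch } });
}
```

Is this kind of batching the right direction for this volume, or is there a better server-side approach?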

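Longer term, once the collection is clean, we’re also considering a unique compound index on the dedup key so the server rejects new duplicates at write time. This assumes `notifId`, `empId`, and `date` really do identify a document uniquely for us:

```
// Assumption: { notifId, empId, date } uniquely identifies a document.
// Once the existing duplicates are gone, any insert that would re-create
// one fails with a duplicate-key error instead of silently accumulating.
db.mycollection.createIndex(
  { notifId: 1, empId: 1, date: 1 },
  { unique: true }
);
```
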
Dealing with duplicates is always a problem, so please don’t flag this as a duplicate post about the same issue.