Get data from a collection that has millions of records and contains duplicate data, using limit() and skip()

Pavel_Duchovny · November 6, 2022, 1:58pm

First, create a new collection that holds only the distinct documents by running:

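db.Collection_A.aggregate([
  {
    // Keep only documents whose DELETED field is not 'N'
    $match: {
      DELETED: { $ne: 'N' }
    }
  },
  {
    // One group per (userid, email) pair; keep the first document of each
    $group: {
      _id: {
        userid: '$userid',
        email: '$email'
      },
      distinct: { $first: '$$ROOT' }
    }
  },
  {
    // Promote the kept document back to the top level
    $replaceRoot: { newRoot: '$distinct' }
  },
  {
    // Write the result to a new collection
    $out: "distinct_Collection_A"
  }
])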
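
To make the effect of the $group / $first / $replaceRoot stages easier to picture, here is a minimal sketch of the same idea in plain JavaScript on an in-memory array. The function name keepFirstPerKey and the sample documents are made up for illustration; they are not part of your data or of any MongoDB API.

```javascript
// Keep the first document seen for each (userid, email) pair.
// This mirrors the idea of $group + $first + $replaceRoot on a small array;
// it is an illustration only, not how the server executes the pipeline.
function keepFirstPerKey(documents) {
  const firstByKey = new Map();
  for (const doc of documents) {
    // Serialize the pair so it can be used as a Map key
    const key = JSON.stringify([doc.userid, doc.email]);
    if (!firstByKey.has(key)) firstByKey.set(key, doc);
  }
  return [...firstByKey.values()];
}

// Hypothetical sample data
const sample = [
  { _id: 1, userid: "u1", email: "a@example.com" },
  { _id: 2, userid: "u1", email: "a@example.com" }, // duplicate of _id 1
  { _id: 3, userid: "u2", email: "b@example.com" },
];

const deduped = keepFirstPerKey(sample);
console.log(deduped.map((d) => d._id)); // [ 1, 3 ]
```

Note that in the real pipeline $first depends on the order in which documents reach $group, so if it matters which duplicate survives, you would probably want a $sort stage before $group.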

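Now you can fetch the documents page by page. The first query returns the first 100 documents, and each later query continues from the last _id of the previous page: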

db.distinct_Collection_A.find({}).sort({_id : 1}).limit(100);
db.distinct_Collection_A.find({_id : {$gt : <LAST_ID_ABOVE>}}).sort({_id : 1}).limit(100);
...
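
If you want to walk through all pages in a loop, the pattern above could be sketched roughly like this. The names paginateById and fetchPage are hypothetical; fetchPage stands in for whatever runs the find() shown above in your driver or shell, and here it is replaced by an in-memory array so the loop can be run on its own.

```javascript
// Page through results by _id: each page is requested with the last _id seen,
// instead of using skip().
// fetchPage(lastId, limit) is a hypothetical callback. Against MongoDB it
// would run something like:
//   db.distinct_Collection_A.find(lastId === null ? {} : { _id: { $gt: lastId } })
//     .sort({ _id: 1 }).limit(limit).toArray()
function* paginateById(fetchPage, pageSize) {
  let lastId = null; // null means "start from the beginning"
  while (true) {
    const page = fetchPage(lastId, pageSize);
    if (page.length === 0) return; // nothing left
    yield page;
    if (page.length < pageSize) return; // a short page is the last page
    lastId = page[page.length - 1]._id;
  }
}

// In-memory stand-in for the collection, used only to exercise the loop
const docs = Array.from({ length: 250 }, (_, i) => ({ _id: i + 1 }));
const fetchFromArray = (lastId, limit) =>
  docs
    .filter((d) => lastId === null || d._id > lastId)
    .sort((a, b) => a._id - b._id)
    .slice(0, limit);

const pageSizes = [];
for (const page of paginateById(fetchFromArray, 100)) {
  pageSizes.push(page.length);
}
console.log(pageSizes); // [ 100, 100, 50 ]
```

Because every query filters on _id and sorts by it, each page can use the default _id index, whereas skip() has to step over all the skipped documents first.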

Thanks
Pavel