Hi,
I have a task to fetch data from a collection that has millions of records, and the query needs to support pagination. The issue is that the collection also contains duplicate data, which should not appear in the results. I can use $group, but that makes the query very slow. For example, if the limit is 2000, applying $group after $limit always returns fewer than 2000 results; and if I apply $group before $limit, the query groups all the data in the collection before returning the first 2000 records. Since the collection currently has more than a million documents, that group operation is very slow. Can anyone help?
For example:
Collection_A
Fields:
_id | firstname | lastname | userid | email | purchseOrder | purchaseUnit | contact number | …
Example data:
_id | john | doe | johndoe | johndoe@gmail.com | 12345 | 78 | +1 (123)- (123) - 7896 | …
_id | john | doe | johndoe | johndoe@gmail.com | 89076 | 800 | +1 (123)- (123) - 7896 | …
So in this collection one user can have multiple purchase orders, and I want each user's details to appear only once. This collection has more than a million documents.
Currently I am using this aggregation:
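To make the expected result concrete, here is a plain-JavaScript simulation of the dedupe-then-paginate behaviour I want (the sample values are made up; `dedupePage` is just an illustration, not real code from my app):

```javascript
// Simulate: keep the first document per (userid, email) pair,
// then return one page of `limit` unique users starting at `skip`.
const docs = [
  { userid: 'johndoe', email: 'johndoe@gmail.com', purchseOrder: 12345 },
  { userid: 'johndoe', email: 'johndoe@gmail.com', purchseOrder: 89076 },
  { userid: 'janedoe', email: 'janedoe@gmail.com', purchseOrder: 55555 },
];

function dedupePage(docs, skip, limit) {
  const seen = new Set();
  const unique = [];
  for (const d of docs) {
    const key = `${d.userid}|${d.email}`; // group key, like the $group _id
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(d); // keep the first doc per user, like $first: '$$ROOT'
    }
  }
  return unique.slice(skip, skip + limit); // $skip + $limit
}

console.log(dedupePage(docs, 0, 2).map((d) => d.userid)); // one entry per user
```

This is exactly what the aggregation below does, only I need it to stay fast on millions of documents.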
db.Collection_A.aggregate([
  {
    $match: {
      DELETED: { $ne: 'N' }
    }
  },
  {
    $group: {
      _id: {
        userid: '$userid',
        email: '$email'
      },
      distinct: { $first: '$$ROOT' }
    }
  },
  {
    $replaceRoot: { newRoot: '$distinct' }
  },
  { $skip: 0 },
  { $limit: 100 }
])
This query is very slow because I have millions of records, and I need to support pagination.