I am using a Quartz + Spring Batch cluster to read documents from MongoDB for bulk processing. Since I cannot tag the original document as read (for example, by adding a read flag), I add the ID of each read document to a migration collection and compare the IDs across the two collections using a $lookup stage in the aggregation pipeline, with the code below:
$lookup: {
  from: 'migration_coll',
  localField: '_id',
  foreignField: '_id',
  pipeline: [
    {
      $project: {
        "_id": 1
      }
    }
  ],
  as: 'migratedDocuments'
}
I typed the snippet above by hand to give an idea of how I use the pipeline: it looks up matching _id values in the migration collection and projects only _id to speed things up. However, with large collections the query is really slow. With more than 2 million records, it takes 10 to 15 seconds or more to return a count.
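For reference, here is a minimal sketch of how a count of not-yet-migrated documents might be built around this stage. The helper name buildUnmigratedCountPipeline, the source_coll collection, the unmigrated output field, and the $match and $count stages are all assumptions for illustration, not the exact production query. Combining localField/foreignField with pipeline inside $lookup also assumes MongoDB 5.0 or later.

```javascript
// Hypothetical helper: builds a pipeline that counts documents whose _id
// is not yet present in the migration collection.
function buildUnmigratedCountPipeline(migrationCollection) {
  return [
    {
      $lookup: {
        from: migrationCollection,
        localField: '_id',
        foreignField: '_id',
        // Only carry _id over from the joined documents.
        pipeline: [{ $project: { _id: 1 } }],
        as: 'migratedDocuments'
      }
    },
    // Keep only documents with no match in the migration collection.
    { $match: { migratedDocuments: { $size: 0 } } },
    // Emit a single document of the form { unmigrated: <number> }.
    { $count: 'unmigrated' }
  ];
}

// Usage in mongosh (source_coll is an assumed collection name):
// db.source_coll.aggregate(buildUnmigratedCountPipeline('migration_coll'))
```

In this shape the $lookup runs once per source document before anything is filtered out, which would be consistent with the count getting slower as the collection grows.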
Questions:
- Is there a better way to do this?
- What else can I use to keep track of processed documents if I can't modify the existing documents?
I am kind of stuck on this issue. Any help is appreciated.