$lookup across 2 collections to check whether an ID exists takes too much time

Chalk_Marker · November 18, 2024, 12:08pm

I am using a Quartz + Spring Batch cluster to read documents from MongoDB for bulk processing. Since I cannot tag the original document as read (by adding a read flag), I add the ID of each read document to a migration collection and compare the IDs across the two collections using a $lookup, with the code below in the aggregation pipeline:

$lookup: {
  from: 'migration_coll',
  localField: '_id',
  foreignField: '_id',
  pipeline: [
    {
      $project: {
        "_id": 1
      }
    }
  ],
  as: 'migratedDocuments'
}

I typed the above by hand to give an idea of how I am using the pipeline: it looks up _ids across the two collections and projects only the _id, to increase speed. However, with a large collection the query is really slow. With more than 2 million records, it takes more than 10 to 15 seconds to return a count.
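
For context, here is a minimal sketch of how the whole count pipeline might look around that $lookup stage. It is an illustration under assumptions, not the exact production code: the $match and $count stages, the helper name buildUnmigratedCountPipeline, the output field unmigrated, and the source collection name source_coll are all hypothetical, and the lookup output array is spelled migratedDocuments here. Combining localField/foreignField with pipeline in one $lookup requires MongoDB 5.0 or later.

```javascript
// Hypothetical helper: builds the aggregation pipeline as a plain array,
// so it can be inspected without a database connection.
function buildUnmigratedCountPipeline(migrationCollection) {
  return [
    {
      // Join each source document to the migration collection on _id,
      // keeping only the _id of any matching migration document.
      $lookup: {
        from: migrationCollection,
        localField: '_id',
        foreignField: '_id',
        pipeline: [{ $project: { _id: 1 } }],
        as: 'migratedDocuments'
      }
    },
    // Assumption: a document counts as "not yet read" when the lookup
    // found no entry for it in the migration collection.
    { $match: { migratedDocuments: { $size: 0 } } },
    // Assumption: the result is returned as a single count document.
    { $count: 'unmigrated' }
  ];
}

const pipeline = buildUnmigratedCountPipeline('migration_coll');

// In mongosh this could be run as (source_coll is a placeholder name):
//   db.source_coll.aggregate(pipeline)
```

Every document in the source collection goes through the $lookup stage before the $match can discard it, which is consistent with the time growing with collection size.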

Questions:

Is there a better way to do this?
What else can I use to keep track of read documents if I can't modify the existing documents?

I am kind of stuck on this issue. Any help is appreciated.