The goal is to extract all metrics that does not have at least one snapshot, metrics entity being a simple json but the snapshot’s foreignField is an array:
The issue is that with only a couple thousand snapshots entities, the query takes already a couple minutes to run. The joining part is almost instant, the slowness comes from the match…
Any idea on how I could rewrite this?
‘explain’ shows a COLLSCAN from the join, so the slowness has nothing to do with the $match. Replaced it with a $sort and it’s the same execution time.
If I understand you want to the _id of some resource for which there is NO events from snapshot.
If it is the case then I have the following idea that might work.
Basically, you look at all the events which is slow but in fact you just locate one then you know that you can prune the given _id. I think you could use one of the alternate https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/ where you can set a pipeline where you would limit the $lookup only to the first occurrence of a given events.resource._id.
The only thing I could think is that since you scan the whole snapshot collection it might not fit in RAM and disk I/O becomes a bottle neck. By limit the search to the first one, the working set has better chance to fit in RAM.