Hi everyone,
I’m working on an aggregation pipeline where I need to join two collections using $lookup, and then immediately filter on a field from the joined collection using $match. Here’s a simplified version of my pipeline:
```javascript
db.collectionA.aggregate([
  {
    $lookup: {
      from: "collectionB",
      localField: "foreign_key",
      foreignField: "_id",
      as: "joined_docs"
    }
  },
  { $unwind: "$joined_docs" },
  { $match: { "joined_docs.some_field": "some_value" } }
])
```
Problem
Despite having proper indexes on both collectionA and collectionB, performance is extremely poor when running the aggregation. The $lookup stage itself works, but the $match on joined_docs.some_field does not use the index on collectionB.
From what I understand, the $lookup stage performs a left outer join and pulls the joined documents into memory, so the subsequent $match runs on in-memory objects and cannot use the index on some_field.
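From what I've read, one possible workaround is the pipeline form of $lookup (with let/pipeline), which lets the filter on some_field run inside the join against collectionB rather than afterward on in-memory results. This is just a sketch of that form using the field names from my simplified example; I'm not sure it's the right fix:

```javascript
// Sketch: $lookup pipeline form that filters collectionB inside the join.
// Field names (foreign_key, some_field) are from my simplified example above.
const lookupPipeline = [
  {
    $lookup: {
      from: "collectionB",
      let: { fk: "$foreign_key" },
      pipeline: [
        {
          $match: {
            $expr: { $eq: ["$_id", "$$fk"] }, // the join condition
            some_field: "some_value"          // filter before docs leave collectionB
          }
        }
      ],
      as: "joined_docs"
    }
  },
  // Keep only documents that found a match (turns the left join into an inner join)
  { $match: { "joined_docs.0": { $exists: true } } }
];

// Would run as: db.collectionA.aggregate(lookupPipeline)
```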
Questions
- Is it expected that the index on some_field from collectionB is not used after $lookup?
- Is there a better way to perform this kind of query efficiently?
- Should I consider rewriting the logic as two separate queries? (i.e., first query collectionB for the matching documents, then use their _ids to run the matching stage on collectionA's foreign_key, without needing a $lookup across the whole collection.)
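To make the last question concrete, here is a sketch of the two-query idea using the collection and field names from my simplified example (the db calls are mongo shell syntax, shown as comments):

```javascript
// Step 1 filter: find the matching documents in collectionB
// (this can use the index on some_field).
const bFilter = { some_field: "some_value" };

// Step 2 filter: given the _ids returned by step 1, match on collectionA
// by foreign_key (this can use the index on foreign_key), no $lookup needed.
const aFilter = (bIds) => ({ foreign_key: { $in: bIds } });

// In the mongo shell this would run as:
// const bIds = db.collectionB.find(bFilter, { _id: 1 }).map(d => d._id);
// db.collectionA.find(aFilter(bIds));
```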
Any insights, best practices, or guidance would be really appreciated!
Thanks in advance!