Advice and Good Practice regarding lookup/match operation

Hi everyone,

I’m working on an aggregation pipeline where I need to join two collections using $lookup, and then immediately filter on a field from the joined collection using $match. Here’s a simplified version of my pipeline:
db.collectionA.aggregate([ { $lookup: { from: "collectionB", localField: "foreign_key", foreignField: "_id", as: "joined_docs" } }, { $unwind: "$joined_docs" }, { $match: { "joined_docs.some_field": "some_value" } } ])

Problem

Despite having proper indexes on collection A and B, the performance is extremely poor when running the aggregation. The $lookup step seems to be working, but the $match on joined_docs.some_field is not using the index on collectionB.

From what I understand, the $lookup stage does a left outer join and brings the documents into memory, so the $match afterward happens on in-memory objects, not allowing the index on some_field to be used.

Questions

  1. Is it expected that the index on some_field from collectionB is not used after $lookup?
  2. Is there a better way to perform this kind of query efficiently?
  3. Should I consider rewriting the logic as two separate queries? ( considering to get the foreign_key from collection B to then perform matching stage on collection A without the need for $lookup stage on all the collection

Any insights, best practices, or guidance would be really appreciated!

Thanks in advance!

Did you try to remove the { $unwind: "$joined_docs" } stage?
$match works also on array, an $unwind is not always needed.

You can also try to put the match inside the lookup, i.e.

{
   $lookup:
     {
       from: "collectionB",
       localField: "foreign_key",
       foreignField: "_id",       
       pipeline: [ { $match: { "some_field": "some_value" } ],
       as: "joined_docs"
     }
}
1 Like

Yes it is. After the $lookup it is not collectionB anymore. It is not even collectionA anymore.

Since you are doing no $match, nor $sort on collectionA which you do on collectionB starting the aggregation with collectionB with a $match stage and the $lookup from:collectionA should be better. You must then have an index on foreign_key.