Does $lookup follow by an inner $match get optimized?

Hello,
I know that

source_collection.aggregate([{
  '$lookup': {
    localField: '_id',
    from: foreign_collection,
    foreignField: 'foreign_id',
    as: 'joined',
  },
}, {
  '$unwind': {
    path: '$joined',
  },
}])

Will utilize the following index if it is available:

foreign_collection.index({
  foreign_id: 1,
})

However, I don’t see it mentioned anywhere if

source_collection.aggregate([{
  '$lookup': {
    localField: '_id',
    from: foreign_collection,
    foreignField: 'foreign_id',
    pipeline: [{
      '$match': {
        date => { '$gt': cut_off_date },
      },
    }],
    as: 'joined',
  },
}, {
  '$unwind': {
    path: '$joined',
  },
}])

will use

foreign_collection.index({
  foreign_id: 1,
  date: 1,
})

or not.

The use case is pretty common if you ask me; it is equivalent to joining with some custom filter conditions in SQL.
explain won’t help here as it doesn’t mention anything related to indexes.

How do I make sure that $lookup with an inner $match utilizes the index?
And if it doesn’t, can I request the feature now? How?

P.S. The code in this post is all pseudocode.

Hi @3Ji,

I believe indexes would be used in the scenario you described. I have the following test data:

test> db.source_collection.find({},{_id:0})
[ { a: 1 } ]

test> db.foreign_collection.find({},{_id:0})
[ { a: 1, b: 5 } ]

With the following indexes:

test> db.source_collection.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { a: 1 }, name: 'a_1' }
]

test> db.foreign_collection.getIndexes()
[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { a: 1, b: 1 }, name: 'a_1_b_1' }
]

Some of the db.collection.explain("executionStats").aggregate() output on the following aggregation pipeline:

test> db.source_collection.aggregate(
[
  { '$match': { a: 1 } },
  {
    '$lookup': {
      from: 'foreign_collection',
      localField: 'a',
      foreignField: 'a',
      as: 'joined',
      pipeline: [
        { '$match': { b: { '$gt': 3 } } }
      ]
    }
  }
])

Outputs the following:

/// For the $match portion:
winningPlan: {
            queryPlan: {
              stage: 'FETCH',
              planNodeId: 2,
              inputStage: {
                stage: 'IXSCAN',
                planNodeId: 1,
                keyPattern: { a: 1 } /// <--- source_collection index used

/// For the $lookup portion from the explain("executionStats") output:
{
      '$lookup': {
        from: 'foreign_collection',
        as: 'joined',
        localField: 'a',
        foreignField: 'a',
        let: {},
        pipeline: [
          { '$match': { b: { '$gt': 3 } } }
        ]
      },
      totalDocsExamined: Long("1"),
      totalKeysExamined: Long("1"), /// <--- Total keys examined
      collectionScans: Long("0"), /// <--- No collection scans
      indexesUsed: [ 'a_1_b_1' ],
      nReturned: Long("1"),
      executionTimeMillisEstimate: Long("2")
    }

You can check the explain("executionStats") output from running the pipeline and look for the $lookup portion for totalKeysExamined / collectionScans values.

Note: the above tests were performed on my test environment running MongoDB version 7.0.2

Does this help with your query?

Regards,
Jason

1 Like