Index usage in $lookup

I’m seeing two ‘strange’ behaviours when using $lookup.

  1. Index not being used when the local field is a dot path
  2. Index not being used unless there’s a $limit stage beforehand

Both cases use the same two collections:
survey (has 3,000 documents):

{ attributes.personId: 123 }

person (has 250,000 documents):

{ attributes.personId: 123, attributes.name: "Marvin" }

attributes.personId has a unique index on it.

1. Index not being used when local field is a dot path
When running the following pipeline on survey, the unique index isn’t used:

[
  {
    $lookup: {
      from: "person",
      localField: "attributes.personId",
      foreignField: "attributes.personId",
      as: "person"
    }
  }
]

But if I do this, it is (sort of, see point 2):

[
  {
    $project: {
      personId: "$attributes.personId"
    }
  },
  {
    $lookup: {
      from: "person",
      localField: "personId",
      foreignField: "attributes.personId",
      as: "person"
    }
  }
]

2. Index not being used unless there’s a $limit stage beforehand
Even with the $project the index isn’t used unless I put a $limit stage beforehand:

[
  {
    $project: {
      personId: "$attributes.personId"
    }
  },
  { $limit: 50000 },
  {
    $lookup: {
      from: "person",
      localField: "personId",
      foreignField: "attributes.personId",
      as: "person"
    }
  }
]

Now, the unique index is used and the query flies along.

The index isn’t used with localField: "attributes.personId" even with the $limit stage.

Is this all expected behaviour?!

(Note: I want to use this query periodically to get names into the surveys so I’m not using lookups all the time when querying)

I know it has been a while since you posted but if you could share sample documents (or better created a mongo playground) it would help us help you by reducing the overhead needed to investigate.

Hi,

In a quick test, I see the same execution plan:

db.createCollection("survey");
db.person.createIndex({ "attributes.personId": 1 }, { unique: true });
// Test query where localField is a dot path
db.survey.aggregate([
    {
        $lookup: {
            from: "person",
            localField: "attributes.personId",
            foreignField: "attributes.personId",
            as: "person"
        }
    }
]).explain().queryPlanner.winningPlan.queryPlan

{
  stage: 'EQ_LOOKUP',
  planNodeId: 2,
  foreignCollection: 'test.person',
  localField: 'attributes.personId',
  foreignField: 'attributes.personId',
  asField: 'person',
  strategy: 'IndexedLoopJoin',
  indexName: 'attributes.personId_1',
  indexKeyPattern: { 'attributes.personId': 1 },
  scanDirection: 'forward',
  inputStage: {
    stage: 'COLLSCAN',
    planNodeId: 1,
    filter: {},
    direction: 'forward'
  }
}

// Test query where localField is projection
db.survey.aggregate([
    {
        $project: { personId: "$attributes.personId" }
    },
    {
        $lookup: {
            from: "person",
            localField: "personId",
            foreignField: "attributes.personId",
            as: "person"
        }
    }
]).explain().queryPlanner.winningPlan.queryPlan

{
  stage: 'EQ_LOOKUP',
  planNodeId: 3,
  foreignCollection: 'test.person',
  localField: 'personId',
  foreignField: 'attributes.personId',
  asField: 'person',
  strategy: 'IndexedLoopJoin',
  indexName: 'attributes.personId_1',
  indexKeyPattern: { 'attributes.personId': 1 },
  scanDirection: 'forward',
  inputStage: {
    stage: 'PROJECTION_DEFAULT',
    planNodeId: 2,
    transformBy: { _id: true, personId: '$attributes.personId' },
    inputStage: {
      stage: 'COLLSCAN',
      planNodeId: 1,
      filter: {},
      direction: 'forward'
    }
  }
}

Do you see something different?

The COLLSCAN we see at

inputStage: {
    stage: 'COLLSCAN',
    planNodeId: 1,
    filter: {},
    direction: 'forward'
  }

and

are related to the survey collection. This is normal since no $match nor $sort is done on survey.

The values

in both plan make me feel like the index attributes.personId:1 of the collection person is used in both cases.