Lookup sub-pipeline is not using index with the $in operator

Yuk_Po_Tse · June 27, 2023, 7:45am

Hello,

I got a questions on index using lookup with $in operator.
the data structure look like below:

Collection test_1

{
  "_id": {
    "$oid": "649a8ce117954132f6fbbb07"
  },
  "type": "123",
  "desc": "...",
  "from": "A"
},
{
  "_id": {
    "$oid": "649a8d4c17954132f6fbbb0b"
  },
  "type": "456",
  "desc": "...",
  "from": "B"
}...

Collection test_2

{
  "_id": {
    "$oid": "649a8e0b17954132f6fbbb0e"
  },
  "name": "xxx",
  "ary": [
    {
      "type": "123",
      "from": "B"
    },
    {
      "type": "456",
      "from": "B"
    }
  ]
}...

Aggregation plpeline

[{
    $lookup: {
        from: 'test_1',
        'let': {
            type: '$ary.type',
            from: '$ary.from'
        },
        pipeline: [{
            $match: {
                $expr: {
                    $and: [{
                            $in: [
                                '$type',
                                '$$type'
                            ]
                        },
                        {
                            $in: [
                                '$from',
                                '$$from'
                            ]
                        }
                    ]
                }
            }
        }],
        as: 'result'
    }
}]

Index created on test_1

{ "type": 1, "from": 1 },
{ "type": 1 },
{ "from": 1 }

For other reason it is not possible to embed the test_1 into test_2.

When performing the lookup, a collection scan is done. Are there anyway to improve the performance on this lookup?

Thanks.

steevej · June 27, 2023, 12:53pm

I would try using a mix of localField/foreignField with $match. Something like:

$lookup: {
        from: 'test_1',
        localField: 'ary.type',  /* $lookup is smart enough to have an array for localField */
        foreignField: 'type' ,
        'let': {
            from: '$ary.from'
        },
        pipeline: [{
            $match: {
                $expr: { $in: [ '$from',  '$$from' ] }
            }
        }],
        as: 'result'
    }

Yuk_Po_Tse · June 27, 2023, 1:16pm

Thanks for reply. This could use the index for “type”.
So can conclude that compound indexes in lookup sub-pipeline $expr would not be supported in current version of MongoDB?

steevej · June 27, 2023, 2:08pm

I cannot really comment on what is supported or not by the current version of MongoDB.

If your explain plan indicated a collection scan then it meant that your query could not leverage any of the index you have in the version of MongoDB you are using.

An other thing you may try to further optimize the use-case. You could swap and try to use from as the localField/foreignField and $match using type. Depending of the granularity of both fields one might be more efficient.

Another idea is to modify test_1 to an object field that contains both type and from and have the index on that top object. Like

{
  "_id": {
    "$oid": "649a8ce117954132f6fbbb07"
  },
  source:{"type": "123",  "from": "A"},
  "desc": "..."
},
{
  "_id": {
    "$oid": "649a8d4c17954132f6fbbb0b"
  },
  source:{"type": "456",  "from": "B"},
  "desc": "..."
}

Index on test_1 would be { “source” : 1 } and then the lookup would be:

$lookup: {
        from: 'test_1',
        localField: 'ary',
        foreignField: 'source' ,
        as: 'result'
    }

Yuk_Po_Tse · June 27, 2023, 2:22pm

Understood.
Thank you very much!

system · July 2, 2023, 2:23pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.