Aggregate slow on $match $expr $in

I’m having performance issue on a big aggregation query with multiple $lookup to retrieve data between different collections.
Trying to investigate I’ve found that one of the slower stages is using a $match - $expr - $in on a indexed field, but trying to isolate and analyze that stage with explain it result a COLLSCAN and not an IXSCAN.

The stage is something like:

  {
        '$lookup': {
            from: 'othercollection',
            let: {
                inputarray: '$inputarray'
            },
            pipeline: [{
                    '$match': {
                        '$expr': {
                            '$in': ['$_id', '$$inputarray']
                        }
                    }
                },
          .....

I’ve found this old issue, trying to found a solution over here https://jira.mongodb.org/browse/SERVER-32549: is the situation still the same on 4.4 ?

If an index cannot be used, do you have any suggestion of how to transform the query?

1 Like

Is there a reason for you to use expressive lookup here with $expr? Since you are doing equality you can just use regular $lookup with localField set to "inputarray".

2 Likes

Actually it was because in expressive lookup I had other conditions in pipeline after match

1 Like

Ok, but when trying to debug performance it helps to provide the full pipeline you need to run.

Note that in 5.0 you can use simple localField/foreignField $lookup and specify additional pipeline stages for the $lookup. But even if you’re in 4.x depending on the full pipeline it might be significantly faster to rewrite it so that it can use an index for $lookup. If you provide more details, we can probably help.

Asya

3 Likes

You are right, my fault was that I’ve focused on that part because seems to be the one affected and I’ve tried to provide a simplified sample of my situation, also because the full aggregation is 500+ lines of code.

Thanks to notify me the new additional pipeline of 5.0, I’ve tried it on a test environment and it seems to solve with very simple rewrite: I’ve to verify the impact of the Mongo upgrade on production env.

In the meantime I can say that I have four $lookup stage in my aggregation, two of them are of the mentioned type and for one should be simple to find a rewrite because after match there is only a $project, the other one has also three other linked lookup inside the pipeline: following this entire stage as it is, I don’t know if it’s clear enough, let me know your comments/doubts

{
  '$lookup': {
    from: 'activities',
    'let': {
      tagUids: {
        $concatArrays: [
          '$TagUidsFromDb.uid',
          [
            tagUid,
          ],
        ],
      },
    },
    pipeline: [
      {
        $match: {
          $expr: {
            $and: [
              {
                $in: [
                  '$tagUid',
                  '$$tagUids',
                ],
              },
              {
                $or: [
                  {
                    $eq: [
                      '$belongsTo',
                      'tags',
                    ],
                  },
                  {
                    $eq: [
                      {
                        $type: '$belongsTo',
                      },
                      'missing',
                    ],
                  },
                ],
              },
              {
                $ne: [
                  '$userId', 'public',
                ],
              },
            ],
          },
        },
      },
      {
        $lookup: {
          from: 'actions',
          localField: 'actionId',
          foreignField: '_id',
          as: 'action',
        },
      },
      {
        $unwind: '$action',
      },
      {
        $lookup: {
          from: 'usersproperties',
          localField: 'userId',
          foreignField: 'auth0Id',
          as: 'user',
        },
      },
      {
        $unwind: {
          path: '$user',
          preserveNullAndEmptyArrays: true,
        },
      },
      {
        $lookup: {
          from: 'tags',
          localField: 'tagUid',
          foreignField: 'uid',
          as: 'tag',
        },
      },
      {
        $unwind: {
          path: '$tag',
          preserveNullAndEmptyArrays: true,
        },
      },
      {
        $project: {
          'action.name': 1,
          activityId: {
            $toString: '$_id',
          },
          _id: 0,
          validationPassed: 1,
          tagUid: 1,
          'user.email': 1,
          date: '$creationDate',
          _type_: 'tagActivity',
          'priority': '0',
          destinationProperties: 1,
        },
      },
    ],
    as: 'TagsActivityResults',
  },
},
1 Like

If you have that many $lookup stages, that suggests that maybe your schema isn’t a good match to MongoDB strengths…

1 Like

Generally speaking and looking only at this pipeline I agree of course with you, but watching the entire application area I swear it’s not so bad as it seems.