Issue in Aggregate response time

I’m working with a MongoDB aggregation pipeline that involves joining two large collections, mainCollection and relatedCollection, using $lookup. The goal is to filter documents based on various fields, including some within the joined data, and then paginate the results.

However, the query is experiencing performance issues, taking several seconds to return results. The explain output indicates a collection scan (COLLSCAN) and a blocking sort operation. I’m seeking advice on how to optimize this pipeline to improve performance.

Below is the aggregation:
db.getCollection("mainCollection").aggregate([
  { $sort: { createdAt: -1 } },
  {
    $addFields: {
      relatedIds: ["$refId1", "$refId2", "$refId3"]
    }
  },
  {
    $lookup: {
      from: "relatedCollection",
      localField: "relatedIds",
      foreignField: "_id",
      as: "relatedDetails"
    }
  },
  {
    $match: {
      $or: [
        { fieldA: "searchValue" },
        { fieldB: "searchValue" },
        { "relatedDetails.fieldC": "searchValue" },
        { "relatedDetails.fieldD": { $regex: "searchValue", $options: "i" } }
      ]
    }
  },
  { $skip: 0 },
  { $limit: 10 },
  {
    $addFields: {
      refId1Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId1"] }
          }
        }
      },
      refId2Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId2"] }
          }
        }
      },
      refId3Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId3"] }
          }
        }
      }
    }
  },
  { $addFields: { id: "$_id" } }
],
{ allowDiskUse: true });
and here is the explain output summary:

  • Execution Time: Approximately 4.3 seconds.
  • Documents Examined: ~95,000.
  • Winning Plan: Collection scan (COLLSCAN) followed by a blocking sort (SORT).
  • Lookup Stage: Examined ~59,000 documents in relatedCollection.
  • Match Stage: Applied after $lookup, filtering on both main and joined fields.

Please read Formatting code and log snippets in posts and reformat your aggregation so that it is easier for us to understand it.

Even better, create a playground with it that includes sample documents.


Try to put the { $sort: { createdAt: -1 } } stage after the $match, see Aggregation Pipeline Optimization - Database Manual - MongoDB Docs
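A minimal sketch of that reordering (the $or references relatedDetails, so $lookup still has to run before the $match; only the sort and pagination move):

```javascript
// Sketch only: same stages as the original pipeline, reordered so the
// blocking $sort only has to handle documents that survived the $match.
db.getCollection("mainCollection").aggregate([
  { $addFields: { relatedIds: ["$refId1", "$refId2", "$refId3"] } },
  {
    $lookup: {
      from: "relatedCollection",
      localField: "relatedIds",
      foreignField: "_id",
      as: "relatedDetails"
    }
  },
  { $match: { /* the same $or as in the original pipeline */ } },
  { $sort: { createdAt: -1 } },
  { $skip: 0 },
  { $limit: 10 }
], { allowDiskUse: true });
```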

Playground link for the above aggregate

db.getCollection("mainCollection").aggregate([
  { $sort: { createdAt: -1 } },
  {
    $addFields: {
      relatedIds: ["$refId1", "$refId2", "$refId3"]
    }
  },
  {
    $lookup: {
      from: "relatedCollection",
      localField: "relatedIds",
      foreignField: "_id",
      as: "relatedDetails"
    }
  },
  {
    $match: {
      $or: [
        { fieldA: "searchValue" },
        { fieldB: "searchValue" },
        { "relatedDetails.fieldC": "searchValue" },
        { "relatedDetails.fieldD": { $regex: "searchValue", $options: "i" } }
      ]
    }
  },
  { $skip: 0 },
  { $limit: 10 },
  {
    $addFields: {
      refId1Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId1"] }
          }
        }
      },
      refId2Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId2"] }
          }
        }
      },
      refId3Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId3"] }
          }
        }
      }
    }
  },
  { $addFields: { id: "$_id" } }
],
{ allowDiskUse: true });


I did try adding the $sort: { createdAt: -1 } stage after the $lookup and $match stages, but it noticeably increased the aggregate execution time by around 4–5 seconds.

I was thinking of creating a temporary collection to store the results after the $lookup, then indexing it to improve aggregate performance — but I’m not sure if storing duplicate data like that is a good approach.
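Roughly, what I had in mind would use $merge (the mainWithDetails collection name is just a placeholder):

```javascript
// Sketch only: materialize the joined documents into a separate collection
// once, index it, and run the paginated search against it instead.
db.getCollection("mainCollection").aggregate([
  { $addFields: { relatedIds: ["$refId1", "$refId2", "$refId3"] } },
  {
    $lookup: {
      from: "relatedCollection",
      localField: "relatedIds",
      foreignField: "_id",
      as: "relatedDetails"
    }
  },
  { $merge: { into: "mainWithDetails" } }  // hypothetical target collection
]);
db.mainWithDetails.createIndex({ createdAt: -1 });
```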

Do you need all the fields which are returned by the query?

And do you really need { $regex: "searchValue", $options: "i" } - all with $or?

Usually a regular expression search slows down your query.

Mainly I need 6-7 fields from the lookup and 4-5 fields from mainCollection. I tried adding a pipeline in the $lookup to project only those specific fields, but it increased my overall aggregate run time.

Yes, I need all the $or checks, because they make it easy for the end user to fetch data by referencing multiple fields.

Thanks for the playground.

I totally agree with

In your case, since it looks like it is an email address field and email addresses are case-insensitive you could spare that extra computing by permanently storing emails in lowercase.

You might need to have searchValue as entered by the user for the other comparison but search with searchValueInLowercase for …fieldD.
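A tiny plain-JavaScript illustration of the difference (all sample values here are made up):

```javascript
// Case-insensitive regex vs. equality on a value normalized at write time.
// Shows why storing emails lowercased lets an exact, index-friendly match
// replace the case-insensitive $regex scan.
const stored = "Sales@Example.com".toLowerCase(); // as stored, normalized
const searchValue = "SALES@example.COM";          // as entered by the user

// What { $regex: "sales@example.com", $options: "i" } effectively does:
const regexHit = /sales@example.com/i.test("Sales@Example.com");

// What an indexed equality match on the normalized field could do instead:
const eqHit = stored === searchValue.toLowerCase();

console.log(regexHit, eqHit); // true true
```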

Before going to the extent of

I think we can do better but we might need a little more brainstorming.

Most likely because the $addFields and $lookup modify the documents, so $sort cannot determine that whatever index you have with a createdAt prefix can be used. Please share the indexes you have on mainCollection.
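They can be listed in mongosh with:

```javascript
db.getCollection("mainCollection").getIndexes();
```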

The $addFields after the $limit seems purely cosmetic; I would try removing them while you investigate. And I am pretty sure the $first is not really needed, because foreignField is _id, so you should only get one document per refId.

That’s all I can think of for now.

$lookup always returns an array. You are right, it is only 1 document each. But $first is the shortest way to convert such an array into a scalar value.
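In plain-JavaScript terms (with made-up sample documents), the $filter + $first pair does roughly this:

```javascript
// $lookup always produces an array; $filter keeps the elements whose _id
// matches the given refId, and $first unwraps the lone survivor (or yields
// undefined when nothing matched).
const relatedDetails = [
  { _id: "r1", fieldC: "x" },
  { _id: "r2", fieldC: "y" }
];

function firstMatch(details, refId) {
  return details.filter((rel) => rel._id === refId)[0];
}

console.log(firstMatch(relatedDetails, "r2")); // { _id: 'r2', fieldC: 'y' }
console.log(firstMatch(relatedDetails, "r9")); // undefined
```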

We still miss a few details. What are your indexes?

As mentioned by steevej, field fieldD seems to be an email address. In this case it would make sense to convert them to lower case and use an equality condition. { $regex: "searchValue", $options: "i" } might be wrong anyway: the mail sales@mongodb.com is different from pre-sales@mongodb.com, but both match { $regex: "sales@mongodb.com", $options: "i" }.
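The pitfall is easy to verify in plain JavaScript, which uses the same regex semantics:

```javascript
// Unanchored pattern: matches anywhere inside the string.
const pattern = /sales@mongodb.com/i;

console.log(pattern.test("sales@mongodb.com"));     // true
console.log(pattern.test("pre-sales@mongodb.com")); // true (unintended hit)
```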

You could try to limit the number of documents in $lookup, for example like this:

{
   $lookup: {
      from: "relatedCollection",
      localField: "relatedIds",
      foreignField: "_id",
      pipeline: [
         {
            $match: {
               $or: [
                  { fieldC: "FieldCValue12" },
                  { fieldD: "FieldCValue12" }
               ]
            }
         }
      ],
      as: "relatedDetails"
   }
}

Or maybe try to get related details separately and then use $unionWith. In principle, it would be

db.mainCollection.aggregate([
   {
      $match: {
         $or: [
            { fieldA: "FieldCValue12" },
            { fieldB: "FieldCValue12" }
         ]
      }
   },
   {
      $lookup: {
         from: "relatedCollection",
         localField: "relatedIds",
         foreignField: "_id",
         as: "relatedDetails"
      }
   },
   {
      $unionWith: {
         coll: "relatedCollection",
         pipeline: [
            {
               $match: {
                  $or: [
                     { fieldC: "FieldCValue12" },
                     { fieldD: "FieldCValue12" }
                  ]
               }
            },
            {
               $lookup: {
                  from: "mainCollection",
                  localField: "_id",
                  foreignField: "relatedIds",
                  pipeline: [
                     {
                        $match: {
                           $or: [
                              { fieldA: "FieldCValue12" },
                              { fieldB: "FieldCValue12" }
                           ]
                        }
                     }
                  ],
                  as: "relatedDetails"
               }
            },
            ...
         ]
      }
   },
   ...

Nice observation, I completely missed that.

Note that if you really want to match sales@… only, rather than pre-sales@…, you may anchor the regex at the front with ^ and at the end with $. Anchoring at the front might improve performance.
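For example, in plain JavaScript (the literal dots are escaped too, since an unescaped . matches any character):

```javascript
// Anchored, escaped pattern: matches the whole string only.
const anchored = /^sales@mongodb\.com$/i;

console.log(anchored.test("sales@mongodb.com"));     // true
console.log(anchored.test("pre-sales@mongodb.com")); // false
```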

@Durgesh_Gupta1, we would appreciate the solution you found even if it was not with help of the replies you got.

Thanks in advance