I'm working with a MongoDB aggregation pipeline that joins two large collections, mainCollection and relatedCollection, using $lookup. The goal is to filter documents based on various fields, including some within the joined data, and then paginate the results.
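For context, the documents look roughly like this (only the fields referenced by the pipeline are shown; the values are placeholders):

// mainCollection document
{
  _id: ObjectId("..."),
  createdAt: ISODate("..."),
  refId1: ObjectId("..."),
  refId2: ObjectId("..."),
  refId3: ObjectId("..."),
  fieldA: "...",
  fieldB: "..."
}

// relatedCollection document
{
  _id: ObjectId("..."),
  fieldC: "...",
  fieldD: "..."
}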
However, the query is experiencing performance issues, taking several seconds to return results. The explain output indicates a collection scan (COLLSCAN) and a blocking sort operation. I'm seeking advice on how to optimize this pipeline to improve performance.
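For reference, the explain output summarized at the end of this post was collected roughly like this (executionStats verbosity; pipeline is a variable holding the stages shown below):

// Collect execution statistics for the aggregation (mongosh)
db.getCollection("mainCollection")
  .explain("executionStats")
  .aggregate(pipeline, { allowDiskUse: true });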
Below is the aggregation pipeline:
db.getCollection("mainCollection").aggregate([
  // Sort everything by newest first (runs before any filtering)
  { $sort: { createdAt: -1 } },
  // Gather the three reference fields into one array so a single $lookup can resolve them
  {
    $addFields: {
      relatedIds: ["$refId1", "$refId2", "$refId3"]
    }
  },
  // Join relatedCollection on _id for all three references
  {
    $lookup: {
      from: "relatedCollection",
      localField: "relatedIds",
      foreignField: "_id",
      as: "relatedDetails"
    }
  },
  // Filter on fields from the main document and the joined documents
  {
    $match: {
      $or: [
        { fieldA: "searchValue" },
        { fieldB: "searchValue" },
        { "relatedDetails.fieldC": "searchValue" },
        { "relatedDetails.fieldD": { $regex: "searchValue", $options: "i" } }
      ]
    }
  },
  // Paginate
  { $skip: 0 },
  { $limit: 10 },
  // Pull each reference's joined document back out of the relatedDetails array
  {
    $addFields: {
      refId1Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId1"] }
          }
        }
      },
      refId2Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId2"] }
          }
        }
      },
      refId3Details: {
        $first: {
          $filter: {
            input: "$relatedDetails",
            as: "rel",
            cond: { $eq: ["$$rel._id", "$refId3"] }
          }
        }
      }
    }
  },
  // Expose _id as id for the API response
  { $addFields: { id: "$_id" } }
],
{ allowDiskUse: true });
And here is a summary of the explain output:

- Execution Time: Approximately 4.3 seconds.
- Documents Examined: ~95,000.
- Winning Plan: Collection scan (COLLSCAN) followed by a blocking sort (SORT).
- Lookup Stage: Examined ~59,000 documents in relatedCollection.
- Match Stage: Applied after $lookup, filtering on both main and joined fields.
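One direction I have been considering is an index that supports the leading $sort stage, roughly like the sketch below (it does not exist yet; the field choice is just my assumption). Would that help at all, or does the $or that mixes main-collection fields with joined fields (plus the case-insensitive $regex) make it unusable?

// Candidate index on the sort key of mainCollection (not created yet)
db.getCollection("mainCollection").createIndex({ createdAt: -1 });

// relatedCollection is joined on _id, which already has the default _id index,
// so I assume the per-document $lookup probes can use an index as-is.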