Help with aggregation optimization

Hi
I’m having some trouble with a specific aggregation that have a poor performance and would really appreciate help with optimizing it.

my db is consist with 4 collections, for example, A,B,C,D,E They are connected as follow:

A: 1-n :B
A: 1-n :C
A: 1-1 :D
A: 1-1 :E

In my application i need to return an array of documents from collection A with all the fields from all the other collections, and will need to filer + sort by every field within the 5 collections (according to a user operation)
My aggregation was initially build as follows:

[
    {
        "$match": {
            //Fields from collection A
        }
    },
    {
        "$lookup": {
            "from": "B",
            "localField": "BId",
            "foreignField": "_id",
            "as": "B"
        }
    },
    {
        "$lookup": {
            "from": "C",
            "localField": "CId",
            "foreignField": "_id",
            "as": "C"
        }
    },
    {
        "$lookup": {
            "from": "D",
            "localField": "_id",
            "foreignField": "leadId",
            "as": "D"
        }
    },
    {
        "$lookup": {
            "from": "E",
            "localField": "_id",
            "foreignField": "leadId",
            "as": "E"
        }
    },
    {
        "$unwind": {
            "path": "$B"
        }
    },
    {
        "$unwind": {
            "path": "$C"
        }
    },
    {
        "$unwind": {
            "path": "$D",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$unwind": {
            "path": "$E",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$match": {
            //Fields from collection B,C,D,E
        }
    },
    {
        "$sort": {
            //Fields from all the collections
        }
    },
    {
        "$skip": 0
    },
    {
        "$limit": 20
    }
]

after some reading i figure out i need to match + sort fields as soon as possible in order to eliminates document, so i rewrote my aggregation as follows:

[
    {
        "$lookup": {
            "from": "B",
            "localField": "BId",
            "foreignField": "_id",
            "as": "B"
        }
    },
    {
        "$unwind": {
            "path": "$B",
            "preserveNullAndEmptyArrays": false
        }
    },
    {
        "$match": {
            //Fields from collection B
        }
    },
    {
        "$sort": {
            //Fields from collection B
        }
    },
    {
        "$lookup": {
            "from": "C",
            "localField": "CId",
            "foreignField": "_id",
            "as": "C"
        }
    },
    {
        "$unwind": {
            "path": "$C",
            "preserveNullAndEmptyArrays": false
        }
    },
    {
        "$match": {
            //Fields from collection C
        }
    },
    {
        "$sort": {
            //Fields from collection C
        }  
    },
    {
        "$lookup": {
            "from": "D",
            "localField": "_id",
            "foreignField": "leadId",
            "as": "D"
        }
    },
    {
        "$unwind": {
            "path": "$D",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$match": {
            //Fields from collection D
        }
    },
    {
        "$sort": {
            //Fields from collection D
        }
    },
    {
        "$lookup": {
            "from": "Es",
            "localField": "_id",
            "foreignField": "leadId",
            "as": "E"
        }
    },
    {
        "$unwind": {
            "path": "$E",
            "preserveNullAndEmptyArrays": true
        }
    },
    {
        "$match": {
            //Fields from collection E
        }
    },
    {
        "$sort": {
            //Fields from collection E
        }
    },
    {
        "$skip": 20
    },
    {
        "$limit": 20
    }
]

It did improved the performance in some of the cases, but i’ve also upgraded mongo from 4.4 to 6.0.
Because of the slot-based execution mechanism in mongo 6.0 my previous aggregation actually having better performance in some cases, so i’m kind of back to square one.

My question is:

  • How can i optimize this aggregation to have the best performance on mongo 6.0 using the slot-base execution
  • How to perform a proper $lookup + $unwind in mongo 6.0 when everytime i’m using $unwind mongo would stop using slot-based execution mechanism

Thanks for the help

A few things

1 - all your $match stages should be move into the corresponding $lookup stage
2 - you should remove all $sort stages except the last one
3 - you do not need to $unwind before doing $lookup on an array

Requiring 4 $lookup for a paging use-case seems abusing.

Thank you for your help! i will look into you suggestions!
one question about the unwind, could you elaborate on that? i’m using $unwind after $lookup (not before) because in 1:1 or 1:N scenarios lookup returns an array with 1 item.

EDIT: I just tried moving the $match into the $lookups but having a $pipeline in a $lookup stage also break the slot-based execution mechanism

I misread the code. I thought you were using the result array of 1 $lookup to perform the next $lookup and this is not the case.

I will have to read about that since I am not familiar.