I agree that $lookup seems designed for fetching data from other collections, which is not what I am doing. We have seen various ideas here on how to optimize this, but in the end I found another way to solve it using $group and $first: sorting by modified in descending order and then taking $first per customer_id keeps only the most recent document for each customer.
The complete pipeline also became about half as long:
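db.ModelEvaluation.aggregate([
    // Newest documents first, so that $first below picks the latest one.
    {
        $sort: {
            modified: -1
        }
    },
    // One output document per customer, built from that customer's newest document.
    {
        $group: {
            _id: "$customer_id",
            values: {
                $first: "$values"
            },
            dataset_transform: {
                $first: "$dataset_transform"
            }
        }
    },
    // Flatten the values array and keep only the wanted target.
    {$unwind: "$values"},
    {
        $match: {
            "values.target": "accoding"
        }
    },
    // Flatten dataset_transform and keep only the entries that have a tail field.
    {$unwind: "$dataset_transform"},
    {
        $match: {
            "dataset_transform.tail": {$exists: true}
        }
    },
    // Rename the fields for the result.
    {
        $project: {
            customer: "$_id",
            accuracy: "$values.value",
            size: "$dataset_transform.tail",
        }
    },
    // Write the result to the collection "foo".
    {$out: "foo"},
])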
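
To make the "latest document per customer" step easier to follow, here is a rough sketch of the same logic in plain JavaScript, without MongoDB. The sample documents and the helper names latestPerCustomer and summarize are made up for illustration; only the field names are taken from the pipeline above. It is an emulation of the idea, not of how the server actually executes the stages, and unlike $project it does not carry the _id field along.

```javascript
// Hypothetical sample documents; the field names follow the pipeline above,
// the values are invented for illustration.
const docs = [
  { customer_id: "a", modified: 1,
    values: [{ target: "accoding", value: 0.71 }],
    dataset_transform: [{ tail: 100 }] },
  { customer_id: "a", modified: 3,
    values: [{ target: "accoding", value: 0.83 }, { target: "other", value: 0.5 }],
    dataset_transform: [{ head: 10 }, { tail: 250 }] },
  { customer_id: "b", modified: 2,
    values: [{ target: "accoding", value: 0.64 }],
    dataset_transform: [{ tail: 80 }] },
];

// Emulates $sort { modified: -1 } followed by $group with $first.
function latestPerCustomer(documents) {
  const sorted = [...documents].sort((x, y) => y.modified - x.modified);
  const latest = new Map();
  for (const doc of sorted) {
    // The first document seen per customer_id is the newest one.
    if (!latest.has(doc.customer_id)) latest.set(doc.customer_id, doc);
  }
  return latest;
}

// Emulates the two $unwind/$match pairs and the final $project.
function summarize(documents) {
  const rows = [];
  for (const [customer, doc] of latestPerCustomer(documents)) {
    const matchingValues = doc.values.filter((v) => v.target === "accoding");
    const withTail = doc.dataset_transform.filter((t) => "tail" in t);
    for (const v of matchingValues) {
      for (const t of withTail) {
        rows.push({ customer, accuracy: v.value, size: t.tail });
      }
    }
  }
  return rows;
}

const result = summarize(docs);
console.log(result);
```

Here the older document for customer "a" is dropped before any unwinding happens, which is the reason the pipeline needs no $lookup back into the same collection.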