I am pretty sure that a sub-pipeline with a $limit stops the $lookup as soon as the count is reached, so the performance should be on par with any single $findOne stage (a name I just invented; no such stage exists).
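To make the pattern concrete, here is a minimal sketch of a $lookup whose sub-pipeline carries its own $limit. The collection names (orders, customers), the field names (customerId, customer) and the helper lookup_first_match are all hypothetical, chosen only for illustration. It builds the pipeline as plain Python dicts and does not show anything about how the server actually executes it.

```python
def lookup_first_match(from_coll, local_field, foreign_field, as_field):
    """Build a $lookup stage whose sub-pipeline keeps at most one joined document.

    Hypothetical helper: all names passed in are illustrative assumptions.
    """
    return {
        "$lookup": {
            "from": from_coll,
            # Expose the outer document's field to the sub-pipeline as $$local_value.
            "let": {"local_value": f"${local_field}"},
            "pipeline": [
                # Join condition: foreign field equals the outer document's value.
                {"$match": {"$expr": {"$eq": [f"${foreign_field}", "$$local_value"]}}},
                # This limit applies per lookup, not to the outer pipeline.
                {"$limit": 1},
            ],
            "as": as_field,
        }
    }


# Outer pipeline: no top-level $limit, so every outer document is still returned.
pipeline = [
    lookup_first_match("customers", "customerId", "_id", "customer"),
    # Turn the one-element array into a single embedded document,
    # keeping outer documents that had no match.
    {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
]
```

With pymongo and a database handle db (an assumption here), this could be run as db.orders.aggregate(pipeline). Comparing its explain output against the simple localField/foreignField form is probably the quickest way to see where the time goes.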
Indeed, the option $lookup/$limit: 1 is misleading, because I want a limit of 1 just for the lookup, not for the whole aggregation pipeline. I don't know exactly why, but when I use sub-pipelines in the extended $lookup, performance drops dramatically.
I have faced this situation several times, and I always have to resort to some kind of hack to get the data…