Aggregation limitations for large data and low latency

I want to use MongoDB for some real-time aggregation. I cannot pre-aggregate and store the results, because my use cases keep changing.

My aggregation pipeline will generally have the following stages:

$match
$group
$sort
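
Something like this, roughly (the collection and field names `users`, `status`, `country` are just placeholders, not my real schema):

```js
// Minimal sketch of the kind of pipeline I mean (placeholder names).
db.users.aggregate(
  [
    { $match: { status: "active" } },                        // filter first so an index can be used
    { $group: { _id: "$country", total: { $sum: 1 } } },     // blocking stage, subject to the 100 MB limit
    { $sort: { total: -1 } }
  ],
  { allowDiskUse: true }   // lets blocking stages spill to disk instead of failing at 100 MB
)
```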

  1. While going through the aggregation limitations, I found that each pipeline stage is limited to 100 MB of RAM. We can set `allowDiskUse: true` so stages can spill to disk, but that may hurt performance.
    Is there another way to process this kind of aggregation in blocks, map/reduce style, so that it is guaranteed to scale when the data gets huge?

  2. How should I evaluate performance? Today I have 10k documents in the user collection, it is sharded, and all the fields I use in the projection are indexed. Writing each aggregation query and checking the result only tells me the performance of that query at the current volume. I need to estimate a worst-case time, because over the next 2 years the collection may grow to 10 to 20 million documents. (I put a rough `explain` sketch after this list.)

  3. Are joins (`$lookup`) recommended inside an aggregation? (See the sketch after this list for the kind of join I mean.)
    Kindly help with some details, or point me to some docs where I can learn more about these topics.
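
For question 2, this is how I was planning to measure each query (again with placeholder names); I am not sure it is enough to extrapolate to 20 million documents:

```js
// Rough idea for measuring one aggregation in mongosh (placeholder names).
// "executionStats" shows whether the initial $match hits an index and gives
// estimated per-stage timings for the rest of the pipeline.
db.users.explain("executionStats").aggregate([
  { $match: { status: "active" } },
  { $group: { _id: "$country", total: { $sum: 1 } } },
  { $sort: { total: -1 } }
])
```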
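For question 3, this is the kind of join I am asking about (the `orders` collection and `userId` field are made-up examples):

```js
// Example of the join I mean: $lookup pulling matching orders into each
// user document ("orders" and "userId" are made-up names).
db.users.aggregate([
  { $match: { status: "active" } },
  { $lookup: {
      from: "orders",          // collection to join, in the same database
      localField: "_id",
      foreignField: "userId",
      as: "orders"
  } }
])
```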