Hi,
I read this principles on aggregation on a sharded cluster.
When aggregation operations run on multiple shards, the results are routed to the
mongos
to be merged, except in the following cases:
- If the pipeline includes the
$out
or$lookup
stages, the merge runs on the primary shard.- If the pipeline includes a sorting or grouping stage, and the allowDiskUse setting is enabled, the merge runs on a randomly-selected shard.
Curious what makes the two bullet rules behind the scene?
Say for example, $out.
It feels more reasonable for the mongos to merge and write documents across shards because it’s the mongos that knows shard configuration.
Thx