$limit in a scatter and gather query to a sharded collection

user.aggregate([
  {
    $match: {
      age: { $gt: 18 },
      city: { $in: ["chicago", "paris"] }
    }
  },
  {
    $sort: {
      last_logged_in: -1
    }
  },
  {
    $limit: 10000
  }
])

If the user collection is partitioned into 10 shards, will each shard return 10,000 documents, totalling to 100,000?

No, each shard will not return 10,000 documents, totaling 100,000. When you use $limit in an aggregation pipeline, it limits the total number of documents returned by the entire pipeline, not per shard. Therefore, the $limit: 10000 in your pipeline will retrieve a total of 10,000 documents from all shards combined, not 10,000 per shard.

The $match operator in your pipeline will filter documents based on the given criteria across all shards. This operation is performed on each individual shard and only the matching documents are returned from each shard.

The $sort operator will then sort these filtered documents based on the last_logged_in field.

Finally, the $limit operator will limit the total number of documents passed by the pipeline. This limit applies to the entire pipeline and not on a per-shard basis. Therefore, if you have specified $limit: 10000, it will retrieve a total of 10,000 documents from all shards combined, not 10,000 per shard.

So, the total number of documents returned by the pipeline will be a maximum of 10,000, not 100,000.

But mongos will potential receive 100_000 documents. 10_000 per shards because it does not know which 10_000 to return before it merge the sort result of each shard.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.