"Top-k" sort optimization

Hi Jose

With reference to the discussion around optimization (Chapter 6), the example shown has a $limit stage preceding a $sort stage.

Ideally the reverse will be the case i.e. you first $sort and then $limit the output results. Do you still get “Top-k” optimization in this case?

The documentation implies you do but the example seeems very specific

Regards
dayo

Hi,

Yes, you do. Indeed the query optimizer tries to coalesce the $limit into the $sort when possible:

https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/#sort-limit-coalescence

https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-limit-memory-optimization

José Carlos

2 Likes

Both $limit before $sort and $sort before $limit will limit the working set of $sort, but they have significantly different semantics. The example in this section should be fixed to use $sort before $limit.

The docs explain that when $sort precedes $limit then $sort can be optimized as a top-k sort. When $limit precedes sort it already reduces the result set, so $sort doesn’t have to do anything extra to use that limit. It benefits for free.

3 Likes

This topic reminds me of something similar. If you use cursor.limit().sort(), it is equivalent to cursor.sort().limit(), but cursors are different from pipelines.

With pipelines, $limit before $sort has a meaning different from $sort before $limit, but if the results are unordered before the limit, reordering the $limit and $sort and then coalescing $sort+$limit should be valid. But would MongoDB actually do that, since the $limit already reduces the working set size of $sort?