The lecture "Queries in a Sharded Cluster", which is part of Chapter 3 of course M103, states at 2:13: “With skip(), mongos performs the skipping on the merged set of results, and doesn’t push anything down to the shard level.”
Furthermore, the notes below this lecture explain a specific case where skip() is used with limit(). They state: “When used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to ensure a sufficient number of documents are returned to the mongos to apply the final limit() and skip() successfully.”
Thus, as per course M103, skipping is not performed on the individual shards, but is performed only once, on the final merged results.
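To make the M103 description concrete, here is a minimal sketch in plain Python. It is only a toy model of what the lecture notes describe, not actual mongos code: the shards are plain sorted lists, and the names `shard_query` and `mongos_query` are hypothetical helpers I made up for illustration.

```python
import heapq

def shard_query(docs, limit_plus_skip):
    # Toy shard: sort locally and return at most skip + limit documents.
    # No skipping happens here.
    return sorted(docs)[:limit_plus_skip]

def mongos_query(shards, skip, limit):
    # Toy mongos: pass (skip + limit) to each shard as the limit,
    # merge the sorted partial results, then skip and limit once.
    partials = [shard_query(docs, skip + limit) for docs in shards]
    merged = list(heapq.merge(*partials))
    return merged[skip:skip + limit]

# Three hypothetical shards holding integer sort keys.
shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]

result = mongos_query(shards, skip=4, limit=3)

# Reference answer: skip and limit applied to the fully sorted collection.
expected = sorted(x for s in shards for x in s)[4:4 + 3]
```

In this model `result` and `expected` agree, because every shard returns enough documents to cover the worst case where all of the skipped and returned documents live on that one shard.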
However, course M201 states otherwise.
The lecture "Performance Considerations in Distributed Systems Part 2", which is part of Chapter 5 of course M201, states at 4:40: “Mongos will select the designated shards. A local Skip and Limit will be performed, and once those results are done, a final merge of that result set is performed on the primary shard, and then the result is sent back to our client through the mongos.”
The following process flow is taken from this lecture.
Can the MongoDB University TA team please explain this contradiction between the two lectures from courses M103 and M201?
As per my understanding, if skip() is performed at the local shard level, i.e. before the final merged set of documents is derived, then each shard is likely to skip documents that belong in the final result, because a shard cannot know how many of the globally first documents live on the other shards. The result set after the merge would therefore be incorrect. However, I will let you guys clarify / confirm this please… Thanks!
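Here is a small sketch of the concern above, again in plain Python with lists standing in for shards. It assumes a literal reading of "local skip" in which every shard applies the full skip value itself; `local_skip_query` is a hypothetical helper, and I am not claiming this is how mongos actually behaves.

```python
def local_skip_query(shards, skip, limit):
    # Literal "local skip and limit": every shard skips and limits on its own,
    # then the partial results are merged and limited once more.
    partials = [sorted(docs)[skip:skip + limit] for docs in shards]
    merged = sorted(x for p in partials for x in p)
    return merged[:limit]

# Two hypothetical shards holding interleaved integer sort keys.
shards = [[1, 3, 5, 7], [2, 4, 6, 8]]

# Reference answer: skip 2, limit 2 over the fully sorted collection.
correct = sorted(x for s in shards for x in s)[2:2 + 2]

# Answer when each shard skips 2 documents locally.
naive = local_skip_query(shards, skip=2, limit=2)
```

With this data the reference answer is the third and fourth documents overall (3 and 4), but both shards have already discarded their first two documents, so the local-skip version returns 5 and 6 instead.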