Sort Merge and Shard Merge using find().sort()

I was reviewing past courses/lectures and forgot to mention a discrepancy that I came across between M201 and M103 re sort merge using find().sort(). I have also reported the issue!

NB: Sort merge and merge sort is used interchangeably.

M201 MongoDB Performance | Chapter 5: Performance on Clusters | Lecture: Performance Considerations in Distributed Systems Part 2:
At 4:15 the Curriculum Engineer states that after each shard performs a local sort, the sort merge will happen on the Primary Shard before sending back the results to the client.

M103 Basic Cluster Administration | Chapter 3: Sharding | Lecture: Queries in a Sharded Cluster:
At 1:46 and 2:34 the Curriculum Engineer states that the sort merge/merge happens on Mongos.

This open JIRA from 2018 also states that the sort merge happens on Mongos and it seems to imply that the documentation hasn’t been updated.

@Sonali_Mamgain/other Curriculum Support Engineers, can we please get absolute clarity on this matter. And as there’s no timestamp on the documentation, it’s uncertain whether it has been updated.
In addition, can you please clarify whether the Mongos or the Primary Shard does the Shard Merge (i.e. when we do a find() without a sort or other appended cursor methods).

2 Likes

Hi @007_jb,

It is an interesting question. Great Eye!!

I will definitely take a look at this and will provide you with correct explaination.

Thanks,
Sonali

Hi @007_jb,

The individual shards sorts the result before returning them to mongos, but mongos always performs the merge sort ( unless all results come from a single shard, then there is nothing to merge with).

Thanks for this question, I will raise this point of confusion in the lectures as well as documentation to the concerned teams.

Please let me know, if you have any questions.

Thanks,
Sonali

1 Like

Very prompt response @Sonali_Mamgain, thanks!

What about this one? I believe mongos does the shard merge but just need confirmation.

Any updates on this @Sonali_Mamgain?

Hi @007_jb,

If there is no cursor methods specified with the find(), then the mongos will directly return the extracted documents from each shard to the client application.

Please let me know, if you have any questions.

Thanks,
Sonali

Thanks again @Sonali_Mamgain. I’m aware that mongos does the routing of requests/results, my question is about merging the results (i.e. shard_merge) before it gets sent to the client.

In a scenario where two shards process a request without cursor methods, is it mongos that does the shard_merge or is there no merging of the results?

The documentation states that mongos performs the shard_merge however, as initially stated, I don’t know if this document is up-to-date or not. Here’s an excerpt:

Cheers

Hi @007_jb,

You are right, mongos will merge the data from each shard.

The documentation is up-to-date.

Thanks,
Sonali

1 Like

Fabulous! Thanks @Sonali_Mamgain!

can you give little more information about shard sort and merge sort ?. you mean to sorting happens in each shard even if the query does not the explicit sort operator?

by the way I also was able to find the different stating of merge sort between two different courses and found this discussion forum question.

is there any documentation url which provide information about this data merging in sharded cluster ?

It’s the same documentation that I shared with you in your thread and the same documentation referenced in my post #1. Focus on the documentation on “Routing and Results Process”:

  • find() - mongos performs the merge
  • find().sort() on single shard
    • primary shard performs the sort and mongos retrieves the results… there’s nothing to merge because it’s from a single shard
  • find().sort() on multiple shards
    • each primary shard performs the initial sort and mongos performs the final merging and sorting of the results (i.e. merge sort)
1 Like

thanks …