Final exam - question 5

I have a small doubt about a question on the final exam ^^
I'm not sure how to post it, so I'll try to be as generic as possible (and feel free to remove this post if it violates any of the forum rules) ^^

Because the $match stage did not come prior […] all source documents will pass through them, a wasteful computation

  1. Is the above statement correct?
    Quoting from the MongoDB docs ( Mongo docs - pipeline optimization ):

If an aggregation pipeline contains multiple projection and/or $match stages, MongoDB performs this optimization for each $match stage, moving each $match filter before all projection stages that the filter does not depend on.

  2. Now, the following statement is related to the correct answer:
    however the number should be small enough

    That was exactly my doubt: how small is small enough? The situation that occurs in the correct answer is described as the devil on earth in other parts of the course (and in other courses), but apparently it is fine if it applies to a small enough dataset.
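To make the quoted optimization concrete, here is a small sketch (plain Node.js, with a made-up document shape and a made-up population threshold, not the exam's actual data) of the reordering the docs describe: a $match that does not depend on a projection's output is effectively moved in front of the $project stage, so non-matching documents never pay the projection cost. The tiny `match`/`project` helpers below only simulate the stages in-process.

```javascript
// Pipeline as written: every source document is projected first.
const written = [
  { $project: { city: 1, pop: 1 } },
  { $match: { pop: { $gt: 100000 } } },
];

// What the optimizer effectively runs: $match filters first,
// and only the surviving documents are projected.
const optimized = [
  { $match: { pop: { $gt: 100000 } } },
  { $project: { city: 1, pop: 1 } },
];

// Sample documents (invented for this sketch).
const docs = [
  { _id: 1, city: "A", pop: 50000, extra: "x" },
  { _id: 2, city: "B", pop: 200000, extra: "y" },
];

// Minimal in-process stand-ins for the two stages.
const match = (ds, threshold) => ds.filter((d) => d.pop > threshold);
const project = (ds) => ds.map(({ city, pop }) => ({ city, pop }));

// Both orders produce the same result; the optimized order simply
// projects fewer documents.
const resultWritten = match(project(docs), 100000);
const resultOptimized = project(match(docs, 100000));
```

Either way the output is identical; the reordering only changes how many documents flow through the projection, which is exactly the "wasteful computation" the exam statement is about.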

What I want to point out is: aren't those two described situations a bit too 'close'? (By 'close' I mean heavily dependent on the dataset size, which is not provided as a handout.)

Finally, I want to thank you for all the work you've done and keep doing! :+1:


Well, I don't know that I'd describe an in-memory sort as "…the devil on earth…" :smile: but it's certainly something to avoid if possible. An in-memory sort usually requires some careful thought, particularly if the collection is likely to grow or change.

In this case, however (as the explanation points out), there are a limited number of cities that meet the criteria, and the number of documents isn't going to grow. Notice that the choice doesn't depend on the size of the collection, but only on the projected size of the result, which we can calculate fairly easily. So an in-memory sort here, while not optimal, provides the best option for improving the pipeline's performance.
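The point about the result size bounding the sort can be sketched like this (plain Node.js; the city/population data and the $group shape are invented for illustration, not taken from the exam): after a $group stage, the in-memory $sort sees one document per distinct group key, not one per source document, so its memory footprint is bounded by the number of distinct cities.

```javascript
// Invented sample documents: several records may share a city.
const docs = [
  { city: "NYC", state: "NY", pop: 8000000 },
  { city: "NYC", state: "NY", pop: 400000 }, // same city, another record
  { city: "LA", state: "CA", pop: 4000000 },
  { city: "SF", state: "CA", pop: 870000 },
];

// Simulate $group: { _id: "$city", totalPop: { $sum: "$pop" } }
const groups = {};
for (const d of docs) groups[d.city] = (groups[d.city] || 0) + d.pop;

// Simulate $sort: { totalPop: -1 } -- the sort's input is the grouped
// documents, so its cost scales with distinct cities, not with the
// size of the source collection.
const sorted = Object.entries(groups)
  .map(([city, totalPop]) => ({ city, totalPop }))
  .sort((a, b) => b.totalPop - a.totalPop);
```

Here the sort handles 3 documents even though the source has 4 (and a real collection would have far more), which is why the projected result size, not the collection size, is what matters.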


Haha, yeah, maybe I overstated it a bit :stuck_out_tongue_closed_eyes:

But the point is that one of the $match filters would be pushed to the start of the pipeline by the pipeline optimization, right? That would lead to an index sort, immediately followed by a $match that filters out everything outside the USA ^^ Is that so different (from a performance point of view) from an early $match and an in-memory sort?

the choice doesn’t depend on the size of the collection, but only on the projected size of the result,

Yes, I meant the size of the set of documents that will be the input to that stage (I can't remember the specific term :confused: )

the number of documents isn’t going to grow

Hmm, this is something I hadn't considered ^^

Thank you for the clarifications!