Stages in explainPlan

Where can I find a description of all the input_stages?

Hi @Prof_Monika_Shah,

You can find some documentation here: Explain Results. But not all stages are fully documented, as they are closely tied to the internals of the query engine, which can differ a bit from one version to another, and frankly this can be more confusing than helpful.

What is the question exactly? Maybe I can try to help.

Cheers,
Maxime.


Hi @MaBeuLux88,
Thank you for your reply.
I know that URL for basic information about the stages, but it does not cover all stages or input stages. I have some questions.
For example, for db.emp.explain('executionStats').find({'salary': {'$gt': 197812}}),
it shows:
Stage: SHARD_MERGE Time: 848 [shard_rs1 Time: 576, shard_rs2 Time: 703, shard_rs3 Time: 847]
Stage-wise time at every shard:
shard_rs1 Time: 576 (SHARDING_FILTER Time: 174 (FETCH: 113 (IXSCAN: 43)))
shard_rs2 Time: 703 (SHARDING_FILTER Time: 309 (FETCH: 213 (IXSCAN: 73)))
shard_rs3 Time: 847 (SHARDING_FILTER Time: 331 (FETCH: 254 (IXSCAN: 75)))
My questions are:
i) Is SHARD_MERGE done at every shard? (It shows an array of executionStats entries, one for every shard.)
ii) What is the difference between executionTimeMillis and executionTimeMillisEstimate?
(It shows executionTimeMillis for every shard in the SHARD_MERGE array, while for every input stage it shows executionTimeMillisEstimate.)
iii) Does the executionTimeMillisEstimate of a stage represent the executionTimeMillisEstimate of its input stages plus its own processing time?
iv) Why is the executionTime of every shard so much higher than that of its individual stages?
v) What is the need for a SHARDING_FILTER stage after the IXSCAN and FETCH stages on each shard?

Can you give a good example to understand the use of the SHARDING_FILTER stage?

Why is SHARD_MERGE shown for all shards in the explain plan?

I honestly don't have an answer to all of these questions. They are very specific and low level.

Can you please share the entire explain plan with explain(true) so I can have a better idea of what is happening?
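
In case it helps, something like this in the shell should produce the fully verbose output (I'm just reusing the collection name and filter from your earlier example):

```javascript
// Full verbosity: includes all candidate plans and per-shard execution stats.
db.emp.find({ salary: { $gt: 197812 } }).explain("allPlansExecution")

// In the legacy mongo shell, explain(true) on the cursor is equivalent.
db.emp.find({ salary: { $gt: 197812 } }).explain(true)
```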

Some stages only deal with orphan documents, so they are relatively fast. Everything labelled "estimate" is probably related to the query planner and the cached plans.
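
If you want to see what the planner has cached for that query shape, on a reasonably recent server (4.4+) something like this should work:

```javascript
// List the cached plan entries for the collection; each entry includes the
// query shape and the plan the planner selected for it.
db.emp.getPlanCache().list()
```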

Maxime.

Hi @Prof_Monika_Shah,

As my colleague noted, having the full (verbose) explain output could provide us with a better understanding of the situation and potentially allow for a more useful reply. The other thing that would be particularly useful to know would be about the overall problem that you are facing and the goals/outcome that you are attempting to achieve. Without that we are only going to be able to provide snippets of information which may or may not be helpful in solving your underlying issue(s).

Broadly speaking, there’s nothing in the brief information that you’ve provided that looks particularly out of the ordinary or problematic. Shard filtering and merging are standard internal parts of executing queries that target more than one shard. The filtering stage is responsible for removing documents that have logically moved to another shard but haven’t been physically removed yet (referred to as orphans), and the merge stage is responsible for compiling the responses from individual shards into the single response that the client is expecting.

Duration estimates are less accurate due to the manner in which they are measured, which can result in rounding errors obscuring where some of the work is taking place. Stage times are cumulative, so any time spent by child stages is included in the duration of the parent stage. There is some parallelism that naturally happens with sharding, which means that the cumulative time there looks a little different. Relatedly, it’s important to keep in mind that network latency associated with internal communication to process the operation will also be captured in various places in the reported execution times.
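
To see the cumulative times concretely, here is a rough mongosh sketch that walks the explain output and prints each stage's executionTimeMillisEstimate. The field names used here (executionStages, shards, inputStage/inputStages) reflect the usual layout of a sharded executionStats document, but the exact shape can vary by server version:

```javascript
// Rough sketch: walk a stage tree and print each stage's time estimate,
// indented by depth. Because stage times are cumulative, a parent's
// estimate already includes the time reported by its children.
function printStageTimes(stage, depth) {
  depth = depth || 0;
  print("  ".repeat(depth) + stage.stage + ": " +
        stage.executionTimeMillisEstimate + " ms (estimate)");
  const children = stage.inputStages || (stage.inputStage ? [stage.inputStage] : []);
  children.forEach(function (child) { printStageTimes(child, depth + 1); });
}

const result = db.emp.find({ salary: { $gt: 197812 } }).explain("executionStats");
const root = result.executionStats.executionStages;

// On a sharded cluster the root stage is typically SHARD_MERGE with a
// "shards" array holding one stage tree (and executionTimeMillis) per shard.
if (root.shards) {
  root.shards.forEach(function (shard) {
    print(shard.shardName + " (executionTimeMillis: " + shard.executionTimeMillis + ")");
    printStageTimes(shard.executionStages, 1);
  });
} else {
  printStageTimes(root);
}
```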

We cannot say anything meaningful about the overall duration of the operation (seemingly about 0.85 seconds) without more information. A key factor there would be the number of documents in the result set.
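
For reference, something along these lines (again reusing your earlier filter as a placeholder) would show the size of the result set next to the amount of work performed, which is usually the first thing to compare against the reported duration:

```javascript
// Compare how much the query returned with how much it examined; a large
// gap between totalDocsExamined and nReturned is a common reason for a
// long executionTimeMillis.
const stats = db.emp.find({ salary: { $gt: 197812 } })
                    .explain("executionStats")
                    .executionStats;

printjson({
  nReturned: stats.nReturned,
  totalKeysExamined: stats.totalKeysExamined,
  totalDocsExamined: stats.totalDocsExamined,
  executionTimeMillis: stats.executionTimeMillis
});
```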

Hopefully this is helpful!

Best,
Chris
