Hi @kembhootha_k - my name is Chris and I am one of Maxime’s coworkers here at MongoDB. Thanks again for your question.
Broadly speaking, the takeaway from this comment is going to be that:
- The SINGLE_SHARD stage is unlikely to be meaningfully contributing to the duration noted in the explain output.
- Individual query latency is necessarily going to be higher on a single shard sharded cluster compared to a replica set by itself.
I would also be curious about what your specific goals are. A worst-case (presumably cold cache) total time of 46 seconds implies that each document is being processed in about 0.023 milliseconds, or a processing rate of nearly 43,500 documents per second. Is there a more defined target that you are trying to hit, or are you just exploring what is possible with the current configuration?
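For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The 46 seconds and 0.023 milliseconds are the figures quoted above. The document count is not stated anywhere in this comment, so the value computed below is only what those two figures imply, and the variable names are mine.

```python
# Figures quoted in this comment.
total_seconds = 46.0   # worst-case total time
ms_per_doc = 0.023     # per-document processing time

# Per-document time converted into a throughput.
docs_per_second = 1000.0 / ms_per_doc

# Document count implied by the two figures (an inference, not a
# number taken from the explain output).
implied_docs = total_seconds * docs_per_second

print(f"{docs_per_second:,.0f} documents per second")
print(f"{implied_docs:,.0f} documents implied")
```

If the real document count differs from the implied one, the per-document time and the rate shift with it, which is one more reason the full explain output would help.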
Would you be able to provide the full explain outputs for us to examine? It is difficult to provide specific answers or guidance about what may be happening in your environment given only a few duration metrics. When examined as a whole, explain output really helps tell a story (or acts as a map) about what is going on. Without the complete picture we may be missing important pieces.
Even in the absence of the full output, we can still say a few things that are probably useful. I would expect the execution time reported by each successive stage in the explain output to be cumulative, that is, inclusive of its child stages. This implies a few interesting items:
- There may be a typo or mixup in the numbers mentioned in the original post. I don’t think it should be possible for the parent SINGLE_SHARD stage to report a smaller duration (15 seconds) than its child FETCH stage (44 seconds). Is it possible the times for the SINGLE_SHARD stage were transposed between the two runs, as the 46 seconds and 13 seconds from the opposite lines seem to match pretty closely?
- The total time for the explain operation should basically be the largest number (e.g. 46 seconds) as opposed to the sum of each duration reported (e.g. 60.5 = 1.5 + 13 + 46).
- The SINGLE_SHARD stage should not be responsible for doing much work. Given the assumptions above are correct (including the final number being swapped), the maximum time that could be attributable to this stage would be 3 seconds. Even that number could be inflated for other reasons. There is probably not much (or any) optimization which could really be done here.
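To make the "cumulative and inclusive" point concrete, here is a rough sketch in Python of how the time attributable to each stage alone could be backed out of inclusive timings. Everything in it is an assumption for illustration: the dictionary is a simplified stand-in rather than the real explain output format, the keys `name`, `millis` and `children` are hypothetical, and the durations are round numbers chosen to resemble this thread, not your actual output. It also assumes a simple chain of stages with unique names.

```python
def self_times(stage):
    """Return {stage name: exclusive milliseconds} for a stage tree whose
    reported times are inclusive of the stages beneath them."""
    children = stage.get("children", [])
    child_total = sum(child["millis"] for child in children)
    # Time spent in this stage alone is its inclusive time minus the
    # inclusive time already reported by its direct children.
    result = {stage["name"]: stage["millis"] - child_total}
    for child in children:
        result.update(self_times(child))
    return result


# Hypothetical, simplified plan (NOT the real explain format or real numbers).
plan = {
    "name": "SINGLE_SHARD", "millis": 46_000,
    "children": [{
        "name": "FETCH", "millis": 43_000,
        "children": [{"name": "IXSCAN", "millis": 1_500, "children": []}],
    }],
}

exclusive = self_times(plan)
total_millis = plan["millis"]  # the root's time is the total, not the sum
naive_sum = 46_000 + 43_000 + 1_500  # double counts the child stages

for name, millis in exclusive.items():
    print(f"{name}: {millis / 1000:.1f} s on its own")
print(f"total: {total_millis / 1000:.1f} s (naive sum: {naive_sum / 1000:.1f} s)")
```

With these made-up inputs the exclusive times add back up to the root stage's inclusive time, which is why the largest number rather than the sum is the one to read as the total.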
As a point of comparison, what is the total duration for the same explain operation when executed directly against the PRIMARY member of the underlying replica set for the shard? I would expect that the majority of the time (when using explain) will be dominated by the work being performed by the underlying shard, so the numbers will likely be similar.