About Sort and Index

Kim_Hakseon · August 26, 2020, 6:58am

What is the difference between Sort that doesn’t not use Index and Sort that uses Index?

Sort that doesn’t use Index occur in memory and Sort that uses Index occur in disk, so isn’t it good to have a Sort that doesn’t use index in terms of speed?

I would also like to know the comparison of results when used with other aggregates other than a only Sort.

Stennie_X · August 26, 2020, 8:31am

Hi @Kim_Hakseon,

A sort that is supported by an index only has to fetch matching results. In the context of an aggregation pipeline, this also allows a sort stage to be non-blocking: results can be passed to other stages for processing because documents are already returned in sorted order using the index.

If there isn’t an index to support a sort operation, an in-memory sort is required. In this case the server has to allocate a buffer in memory to temporarily store the search results and sort them in-place. This adds memory and computational overhead to a query and is a blocking stage for aggregation: further processing cannot happen until the results have been sorted.

There is a limit for in-memory sorts to mitigate potential resource usage for a large number of concurrent in-memory sorts (100MB in MongoDB 4.4 or 32MB in earlier server versions).

The aggregation framework has an allowDiskUse option which is also available for find queries in MongoDB 4.4+. This option allows data for a large in-memory sort to be written to temporary files (if necessary) rather than having the operation fail because of the in-memory sort limit.

However, the optimal approach is avoiding in-memory sorts where possible.

As a clarification for your description of “index on disk”, note that actively used indexes and documents are loaded into the working set in the WiredTiger internal cache which is 50% of (RAM-1GB) by default. Indexes are persisted to disk, but loaded into memory when used or updated.

Regards,
Stennie

Kim_Hakseon · August 27, 2020, 5:00am

Um… Is that a manual explanation?

What I was curious about was this.

I did this command

db.collecion.find( { field0 : { $gt : 10000 } } ).sort( { field1 : 1 } )

I’ve used this in three cases.

i) no index
ii) { field1 : 1}
iii) {field1 : 1, field0 : 1 }

The results of each action were like this.

i) executionTimeMillis : 4541,
sort - sort_key_generator - collscan

ii) executionTimeMillis : 35878
fetch - ixscan

iii) executionTimeMillis : 38614
fetch - ixscan

You can see no index took longer.

My question was this.

How does Index affect the Sort to show this result?

Stennie_X · August 27, 2020, 12:12pm

Hi @Kim_Hakseon,

My response was not copied from the documentation, if that is what you are asking. I did create it “manually” (in the English sense of putting words together myself ;-)) and included some links to relevant topics in the MongoDB documentation.

My question was this.

How does Index affect the Sort to show this result?

I provided a general description of the processing differences to address your original question, which did not include any mention of specific queries. Hopefully that is helpful for forming a mental model of what happens with an in-memory sort.

Comparing specific benchmark outcomes is a different discussion and needs more details about your testing methodology and results.

Please provide more details for your testing scenario:

How many times did you run your queries? Running a query will load required indexes and matching result documents into the working set in the WiredTiger cache. Subsequent queries will be faster because the cache has been primed by the first query.
What is the output of explain(true) for each of those queries? The detailed Explain Output has essential information about the processing work involved if you are trying to understand performance.

I recommend watching @Christopher_Harris’ talk from the recent MongoDB .live conference for a practical guide to understanding and tuning query performance: Tips and Tricks for Query Performance: Let Us .explain() Them.

Regards,
Stennie

Kim_Hakseon · August 28, 2020, 1:19am

It must have been a bit strange to say that I asked the question using a translator, but thank you for answering me.

I will study more and ask you more detailed questions later.

Thank you.

Stennie_X · August 28, 2020, 2:04am

Hi @Kim_Hakseon,

Not a problem. What language are you translating from? Are you using software translation?

Our interaction in the forum is generally in English but perhaps someone in the community can provide clarity on specific aspects that may be less clear with software translation.

Regards,
Stennie

Kim_Hakseon · August 28, 2020, 2:36am

I am translating Korean using Papago.

Ok, thank you~

system · March 29, 2021, 6:40am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.