IXSCAN with LIMIT returns unexpected result

Hello there.

I’m using MongoDB 4.4 and I have some questions about an executionStats result.

Query

db.collection.find({
  feature: { $in: [ "buy", "sell", "ship", "contact" ] },
  date_time: { $gt: new ISODate(~~) },
  _id: { $gt: ObjectId(~~) }
})
.sort({ date_time: 1 })
.limit(limit_size)

Indexes

  • _id
  • feature_1_date_time_1
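
For reference, the compound index was created with something like this (a sketch; the real index may have extra options):

> db.collection.createIndex({ feature: 1, date_time: 1 })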

Stages
IXSCAN (multiple stages) → SORT_MERGE → FETCH (nReturned limit_size) → LIMIT (nReturned limit_size)
(simplified, not the real plan)

First - I expected my query to always scan the same number of index keys, but with different limits the number of index keys scanned in the IXSCAN stage changes. Does the limit affect the IXSCAN stage?

Second - In the executionStats result, the FETCH stage filters on the _id field, so I expected that filtering to happen only in that stage. But when I change the comparison value of the _id field in the query, the IXSCAN and SORT_MERGE stages have different nReturned values than they did with the previous _id value.
This doesn’t happen when I don’t use a limit. Why?

Third - The FETCH nReturned and the LIMIT nReturned are the same. I expected FETCH to return more than limit_size documents and LIMIT to return exactly limit_size, but it seems the limit is already applied in the FETCH stage.
Did I miss something about how indexes work?

Thanks for reading.

Hi @DongYoung_Lee, welcome to the community!

As I understand it, you observe that the limit was enforced in the last part of the explain output, instead of the early part (during the IXSCAN phase). Is this correct?

If yes, then this is expected. When you execute a query with a limit, it processes the whole result set first, then limits the returned output. It doesn’t limit at the start, since that would return a different result set.

For example, if I have this collection:

> db.test.find()
[
  { _id: 0, a: 0 },
  { _id: 1, a: 0 },
  { _id: 2, a: 1 },
  { _id: 3, a: 1 },
  { _id: 4, a: 2 },
  { _id: 5, a: 2 }
]

Illustrating with aggregation, if I do $group followed by $limit:

> db.test.aggregate([ {$group:{_id:'$a', sum:{$sum:1}}}, {$limit:2} ])
[ { _id: 0, sum: 2 }, { _id: 1, sum: 2 } ]

Now if I do $limit followed by $group:

> db.test.aggregate([ {$limit:2}, {$group:{_id:'$a', sum:{$sum:1}}} ])
[ { _id: 0, sum: 2 } ]

The results are not the same. This is because in the first case, it processes the required data (using $group), then limits the output to only 2 documents. In the second case, it limits the input to the $group stage.

Could you post the different explain plan outputs you observed here? I suspect the different nReturned values are due to the actual documents in the collection.

I believe this is deliberate. The whole query, including the limit, is visible to the query engine. If the server did FETCH more data and then threw it away because of the LIMIT, it would be doing wasted work. It’s much better to FETCH only what’s required, especially since FETCH is frequently an expensive operation.
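
To see this for yourself, you can re-run the query with executionStats verbosity and compare nReturned across the stages; with the limit in place, the FETCH stage’s nReturned should already match the limit. A sketch of the command (the placeholder values and the limit of 5 stand in for your real ones):

> db.collection.find({
    feature: { $in: [ "buy", "sell", "ship", "contact" ] },
    date_time: { $gt: ISODate("...") },  // your real date here
    _id: { $gt: ObjectId("...") }        // your real ObjectId here
  })
  .sort({ date_time: 1 })
  .limit(5)
  .explain("executionStats")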

If you need more information, could you also post some example data and the explain output of the queries you’re running?

Best regards
Kevin

Hi @kevinadi. Thanks for your reply.

In short, I totally misunderstood how MongoDB works with indexes and the meaning of the nReturned values in the explain results.

Answering my own questions:
First and Third
All the stages work on documents one at a time, so the limit avoids wasted work, as you said.
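
To make that concrete for myself, here is a rough conceptual sketch in plain JavaScript (not the actual server implementation) of the pull-based model: the limit stage stops asking fetch for documents once it has enough, so fetch in turn stops asking ixscan.

// Conceptual sketch only; not real server code.
// Each stage pulls one item at a time from the stage below it.
function* ixscan(indexKeys) {
  for (const key of indexKeys) {
    yield key;                      // one index entry per pull
  }
}

function* fetch(keys, docsByKey) {
  for (const key of keys) {
    yield docsByKey.get(key);       // one document fetched per pull
  }
}

function* limit(docs, n) {
  if (n <= 0) return;
  let returned = 0;
  for (const doc of docs) {
    yield doc;
    if (++returned >= n) return;    // stop pulling from upstream stages
  }
}

// With limit(..., 2), ixscan and fetch are only pulled twice each,
// so no extra documents are fetched and then thrown away.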

Second
@kevinadi was right. It was because of the actual distribution of the documents and because my query didn’t fit the indexes well.

Thank you :wink: