Issue in Answer Description - Lab 4.2: Aggregation Performance

In this lab, we are presented with a query that fails because it requires too much memory to sort, and we want to solve it by adding one index with the minimal number of fields possible.

Keep in mind that there might be several indexes that resolve this error, but we’re looking for an index with the smallest number of fields that resolves the error and services this command.

Seeing that, I submit { stars: 1 }, which is accepted as the correct solution. Based on the problem description, this is correct, as it is the minimal number of fields required to fix the error and make the command work.

However, I disagree with the underlined part of the answer description

Other indexes could also be used like db.restaurants.createIndex({ stars: 1, cuisine: 1}) however, if we are looking for most effective index to support our aggregation command, { stars: 1 } is the optimal option.

With { stars: 1, cuisine: 1}, the query planner does a PROJECTION_COVERED after IXSCAN and runs faster than with the other index, which results in a PROJECTION_SIMPLE after FETCH after IXSCAN. The extra field in the index allows for a covered query, removing the need for the fetch and speeding up the query.

With the { stars: 1 } index, the explain output shows executionTimeMillis: 1260, but with { stars: 1, cuisine: 1}, it shows executionTimeMillis: 613.

1 Like

The key part of the requirement:

The index starts:1 has 1 field while the index starts:1,cuisine:1 has 2 fields. So the former has the smallest number of fields.

If you run a query twice in a row, the potential to have the second execution being faster is high as the documents is probably read from disk only the first time. The second time around the document is in RAM and there is no I/O.

Please do not rely on execution time when running an Atlas free tier as the hardware is shared among many instances. If you want to take comparative measurement do it on a local instance or a non-shared Atlas instance.

I agree that 1 field is smaller than 2 fields and meets

the smallest number of fields that resolves the error and services this command

However, it is not the

most effective index to support our aggregation command

With the { stars: 1, cuisine: 1} index, the document does not even need to be read, since MongoDB gets all of the data it needs from the index (covered query). This is the reason why it is faster than with { stars: 1 }.

I created the first index and ran the first query a few times, and the first run took about 1400ms, while subsequent runs took about 1250ms, and I reported one of those measurements in my original post. I then created the second index, ran the second query a few times, and reported one of the later measurements. It is clearly faster than the first query, and it cannot be explained by caching. The explain output even shows it choosing the plan using the compound index and rejecting the plan using the single-field index. I could repeat the experiment on a local instance or non-shared Atlas cluster, but the result would likely be the same.

The fix would be to update the answer description to explain that { stars: 1 } is the best index to meet the question requirements (“fix the issue with 1 index, with the minimal number of fields”), while { stars: 1, cuisine: 1 } is the best index for maximum query performance.

2 Likes

I agree with your assessment @victorz .

In my case I created a variety of indexes as follows:

    var indexA = { 'stars': 1 }
    // stars_1

    var indexB = { 'stars': 1, 'cuisine': 1}
    // stars_1_cuisine_1 --- WINNING_PLAN, 860ms 

    var indexC = { 'cuisine': 1, 'stars': 1 }
    // cuisine_1_stars_1

    var indexD = { 'stars': -1, 'cuisine': 1 }
    // stars_-1_cuisine_1

So I created each of the above indexes for comparison (eg) db.restaurants.createIndex(indexA), etc.

Then I executed the query passing the { explain: true } option as follows:

const results = db.restaurants.aggregate(pipeline, { explain: true })

As no other query was executed previously (as you suggested might impact subsequent queries @steevej ) then how is it that when inspecting the above ‘results’ for the WINNING_PLAN we see that indexB was chosen?

Seems like the requirements should state the desired result should be the fastest query with the fewest number of fields? Without suggestion of looking for “the most effective index to support our aggregation command”