Collection index creation order matters?

Svarup_Desai · January 30, 2023, 8:18am

Suppose I have one collection called test which has a below doc. Just for simplicity, it has one doc.

{
  "_id":  "63d771990ba1354fef3577c2",
  "project": "63a2f3537844f79dd4fe7234",
  "email": "user1@example.com"
}

Below is the doc schema

{
  "_id": ObjectId,
  "project": ObjectId,
  "email": String
}

Below are the indexes that I have created in test collection.

_id (default)
{ "email": 1 }
{ "project": 1, "email": 1 } (this one is compound index)

Now if I run the below query it uses 3rd index (compound index) and that is correct because we have project field in the query and that belongs to the 3rd index.

{ 
  project: ObjectId('63a2f3537844f79dd4fe7234')
}

Now if I run the below query why it uses the 2nd index ({ "email": 1 }) why not the 3rd compound index? why it is not using the 3rd compound index even though the query contains the compound index prefix project and also the email field?

{ 
  project: ObjectId('63a2f3537844f79dd4fe7234'),
  email: 'user1@example.com'
}

Now if I remove the 2nd index { "email": 1 } then we have only two indexes in test collection as below.

_id (default)
{ "project": 1, "email": 1 } (a compound index)

Now if I create the same index { "email": 1 } again then we have the below indexes.

_id (default)
{ "project": 1, "email": 1 } (a compound index)
{ "email": 1 }

Now if I run the same below query again then it will use the 2nd compound index { "project": 1, "email": 1 }.

{ 
  project: ObjectId('63a2f3537844f79dd4fe7234'),
  email: 'user1@example.com'
}

Why for the same above query if we have indexes as below

_id (default)
{ "email": 1 }
{ "project": 1, "email": 1 } (a compound index)

then it will use { "email": 1 } index to find documents

and if we have the below indexes

_id (default)
{ "project": 1, "email": 1 } (a compound index)
{ "email": 1 }

then it will use { "project": 1, "email": 1 } index to find documents?

Kushagra_Kesav · February 3, 2023, 2:14am

Hi @Svarup_Desai,

Welcome to the MongoDB Community forums

The query optimizer in MongoDB chooses the index with the best performance. When multiple indexes are available, the query optimizer will evaluate the relative cost of each index and choose the index that provides the most selective results with the least amount of I/O cost.

Here you are trying to execute the query:

{ 
  project: ObjectId('63a2f3537844f79dd4fe7234'),
  email: 'user1@example.com'
}

From my experiments with the scenario you posted, both indexes perform equally well so technically it doesn’t matter which one was chosen.

You can confirm it by using explain() results to see the output of allPlansExecution.

For index email:1

        score: 2.5002,
        executionStages: {
          stage: 'FETCH',
          filter: { project: { '$eq': -1960449405 } },
          nReturned: 1,
          executionTimeMillisEstimate: 0,
          works: 2,

For index project:1, email:1

        score: 2.5002,
        executionStages: {
          stage: 'FETCH',
          nReturned: 1,
          executionTimeMillisEstimate: 0,
          works: 2,

So it chooses either index arbitrarily as the score for both indexes is the same.

However, when you do the same query with a large collection of documents the query optimizer may change with the size and complexity of the data. With a larger collection, the optimizer may have more information about the data distribution and select a different index based on this information. This is because the optimizer’s goal is to choose the index that will provide the most efficient way to resolve the query and this can change based on the size and complexity of the data.

Like I tried with one collection which is sample collection sample_mflix.comments having 41.1K documents and I created two indexes very similar to yours:

After running two queries similar to the one you described above, the results were not the same. The first query

{ 
movie_id: ObjectId('573a1395f29313caabce1855'), 
email: "theresa_holmes@fakegmail.com" 
}

utilized the movie_id_1_email_1 index. However, even after reordering the indexes by deleting and recreating them, the same movie_id_1_email_1 index was utilized.

I hope it helps!

Let us know if you have any further questions!

Best,
Kushagra

Svarup_Desai · February 3, 2023, 4:33am

Thank you, @Kushagra_Kesav for the explanation,

Now I understand it, MongoDB does not stick with the same index for any single query. The query optimizer in MongoDB chooses the best available index dynamically based on the size and complexity of the data, so in my example, all was happening because of low data.

system · February 9, 2023, 10:13am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.