Compound index - skipping a prefix vs selecting all values of a prefix

{
  field_1: "string" // can only have the value of "A" or "B",
  field_2: "numeric",
}

The above is the schema for my collection.

The following compound index exists:

{
  field_1: 1,
  field_2: 1
}

The query in question is below:

db.col.find( { field_2: { $gt: 100 } } )

This query skips the prefix field_1. Hence MongoDB does not use the compound index.

So in order to get it to use the compound index, I change the query to this:

db.col.find( { field_1: { $in: ["A", "B"] }, field_2: { $gt: 100 } } )
  1. Would MongoDB use the compound index in the second query?
  2. Would there be any performance benefits either way?

Hi,

Yes, since your query includes all fields (or a prefix) of the compound index.

You can verify index selection using the query explain() feature.

Your compound index doesn’t get selected in the first query because all values of field_1 need to be scanned, which will likely have a high ratio of index keys read compared to results returned.

Maintaining and scanning extra index keys will have negative performance impact, but the observed outcome will depend on your environment and resources. There’s some more background in the Unnecessary Indexes schema design anti-pattern.

If you sometimes need to query on both fields but mostly query on field_2, I recommend reversing the order of fields in your compound index so your first query will match the index prefix. Alternatively you could create an index on just field_2.

Regards,
Stennie

1 Like
Your compound index doesn’t get selected in the first query because all values of field_1 need to be scanned, which will likely have a high ratio of index keys read compared to results returned.

I am confused about the above statement.

  1. Why would all values of field_1 need to be scanned? field_1 is the prefix of the compound index.
  2. Can you explain what it means to have a high ratio of index keys read compared to results returned? If field_1 holds a binary value, then there are only two index keys? Hence the ratio of keys read to results returned must be low? I need a concrete example of this.