Index selection

Habeeb_Raja · April 26, 2022, 8:34am

Hi All, We are having below indexes

IX_1 : {‘A’:1, ‘B’:1} size: 10 MB
IX_2 : {‘A’:1, ‘C’:1} size: 50 MB

Query 1: Search by column ‘A’
Query 2: Search by column ‘A’ and ‘X’

In these cases how the index will be selected?

Please explain…

steevej · April 26, 2022, 1:15pm

There are many factors that may influence which one is used by the query planner. I think it might even vary from one instance of a query. If I were to implement the query planner and I find that 2 indexes are suitable, I would use the one that is already in cache rather than load another one.

@ time 1 - I receive Q1 and IX1 is in cache, I would use IX1 because it is there and it serves the query well
@ time 2 - I receive Q3 that involves a query on A and C, then I am forced to flush out IX1 and load IX2
@ time 3 - I receive another Q1, but now IX2 is in cache, I would use IX2 rather than IX1 like last time because it involves less work. It would be bad to flush IX2 to load and use IX1 like last time because I might need to reload IX2 @ time 4 for another Q3.

But both are suitable and you should not worry about which one.

Why is that important for you to know?

Habeeb_Raja · April 26, 2022, 2:07pm

Thanks @steevej . Can you share the other factors that influence the index selection. The reason for this topic is, we are re-arranging the current indexes in our system. In order to make it more efficient, we try to understand this.

Thanks,
Habeeb

steevej · April 26, 2022, 3:09pm

The best source to help you selecting your indexes is M201.

But the most important part is your use-cases.

If RAM is no issue, all indexes for all your use-cases. But too many indexes slows down updates since indexes have to be updated.

If tight on RAM, then keep the smallest indexes, unless queries with bigger index are more frequent.

Analytics queries might require different set of indexes or can be done without index despite the slowness because they are less frequent.

The equality-sort-range rule is also important.

The question is why

Performance issues can be solved by have a better schema. For example, the computed pattern can be used on very frequent queries done on stable data.