Prefix index performance for large datasets

Hello all,
how is the query performance impacted when using prefix index versus index specifically for that query? Example:
I have 2 different queries.
For the first query I can use this index: field1_-1. (takes up 100 MB for millions of documents)
For the second query I can use this compound index: field1_-1_field2_-1_field3_-1_field4_1 (takes up 1 GB for millions of documents).

So my question is:
Is it still useful to use prefix index (e.g. use the second index for the first query), when the second index takes up much more space and when querying, will be using more RAM for the first query than it would using the first index? Or is the space the indexes take up does not indicate that there will be higher usage of RAM, therefore slower query, where unnecessary? (there is no concern about write speeds, only for reading purposes)

Hi @Julius_65872,

I would imagine that both would be fairly comparable as long as there’s enough RAM to handle the index and working set. Index 1 would have a slight edge over Index 2 for Query 1. Suggest you test it out using hint() and the Explain Plan. Something like this:
var expl = db.collection.find({query2}).hint("index_1").explain("allPlansExecution")

Based on your stats, field1 is taking up only ~10% of space in Index2 which indicates that there are one or more fields that’s consuming more than ~20% of the space. So I wonder if this index would benefit from a partial filter expression if possible?

The RAM usage is dependent on whether it’s a covered query (i.e. fetch docs from the index alone), the index size and the working set. Ensure that you have enough memory to fit an entire index + working set. Then after benchmarking and optimising, you’ll be able to decide whether to keep or drop Index1.

2 Likes

I guess the index already in RAM and the return values will take more RAM

Thank you for your insights!