How many indexes are needed for a count query to be effective?

Let’s say we have a collection with million of documents, all with the following shape:

{
   _id: // ObjectId
   color: "green",
   size: 34,
   category: "1-1-1"
}

I have the following requirements for counting the documents:

countDocuments({color: 'green'})
countDocuments({category: '1-1-1'})
countDocuments({size: 34})
countDocuments({color: 'green', category: '1-1-1'})
countDocuments({color: 'green', category: '1-1-1', size: 34})

How many indexes do I need to count the docs effectively?

Thanks for the help in advance :slight_smile:

Hello @Alex_Bjorlig,

You will require at least three indexes to support the above five queries. Those would be single field indexes on fields color , category and size respectively. These three indexes can support the five queries you had posted.

The second alternative is instead of the single field index on color, you can create a compound index on one (or both) of these - color+category and color+category+size.

But, there are couple of things to consider:

  • Indexes require disk space and when they are to used will be loaded in to the RAM memory - these are resources. More indexes and compound indexes can consume these resources and can affect the performance.
  • Another aspect is that the most important queries should be considered when creating your indexes - all queries may not need indexes.
  • Consider the indexing strategy of Selectivity before creating the compound indexes.

You will need some test data, create these indexes and try your queries. Then measure their usage and performance - for example, you can use explain to generate the query plans and study them.

I don’t think you can run explain on the countDocuments method, but countDocuments - mechanics says that underlying this method is the aggregation query:

db.collection.aggregate([
   { $match: <query> },
   { $group: { _id: null, n: { $sum: 1 } } }
])

And, you can run explain on the aggregation query.

1 Like

Hi @Prasad_Saya

Thanks for providing the detailed answer. I will continue to monitor the queries. Currently I feel a bit limited by the fact that I can not use .explain() in compas on aggregations. And when I use the explain() on aggregations in code, I need to read the raw json manually. But I hope this will change in the near future :sweat_smile:

Hi @Prasad_Saya,

A compound index on multiple fields can support all the queries that search a prefix subset of those fields, so the most efficient option to cover the three queries including color would be a single compound index on color+category+size.

If this compound index exists, you would want to drop any prefix indexes like color or color+category as they would add unnecessary overhead.

All five queries would optimally be supported by three indexes:

  • { color: 1, category: 1, size: 1 }
  • { category: 1 }
  • { size: 1 }

There is some further information in Create Compound Indexes to Support Several Different Queries.

Regards,
Stennie

2 Likes