- Indexes >
- Indexing Strategies
Indexing Strategies¶
On this page
- Strategies
- Create Indexes to Support Your Queries
- Use Compound Indexes to Support Several Different Queries
- Create Indexes that Support Covered Queries
- Use Indexes to Sort Query Results
- Ensure Indexes Fit RAM
- Create Queries that Ensure Selectivity
- Consider Performance when Creating Indexes for Write-heavy Applications
This document provides strategies for indexing in MongoDB. For fundamentals of MongoDB indexing, see Indexing Overview. For operational guidelines and procedures, see Indexing Operations.
Strategies¶
The best indexes for your application are based on a number of factors, including the kinds of queries you expect, the ratio of reads to writes, and the amount of free memory on your system.
When developing your indexing strategy you should have a deep understanding of:
- The application’s queries.
- The relative frequency of each query in the application.
- The current indexes created for your collections.
- Which indexes the most common queries use.
The best overall strategy for designing indexes is to profile a variety of index configurations with data sets similar to the ones you’ll be running in production to see which configurations perform best.
MongoDB can only use one index to support any given
operation. However, each clause of an $or
query may use
a different index.
Create Indexes to Support Your Queries¶
If you only ever query on a single key in a given collection, then you need
to create just one single-key index for that collection. For example, you
might create an index on category
in the product
collection:
However, if you sometimes query on only one key and at other times
query on that key combined with a second key, then creating a
compound index is more efficient. MongoDB
will use the compound index for both queries. For example, you might
create an index on both category
and item
.
This allows you both options. You can query on just category
, and
you also can query on category
combined with item
.
(To query on multiple keys and sort the results, see Use Indexes to Sort Query Results.)
With the exception of queries that use the $or
operator, a
query does not use multiple indexes. A query uses only one index.
Use Compound Indexes to Support Several Different Queries¶
A single compound index on multiple fields can support all the queries that search a “prefix” subset of those fields.
Example
The following index on a collection:
Can support queries that the following indexes support:
There are some situations where the prefix indexes may offer better
query performance: for example if z
is a large array.
The { x: 1, y: 1, z: 1 }
index can also support many of the same
queries as the following index:
Also, { x: 1, z: 1 }
has an additional use. Given the following
query:
The { x: 1, z: 1 }
index supports both the query and the sort
operation, while the { x: 1, y: 1, z: 1 }
index only supports
the query. For more information on sorting, see
Use Indexes to Sort Query Results.
Create Indexes that Support Covered Queries¶
A covered query is a query in which:
- all the fields in the query are part of an index, and
- all the fields returned in the results are in the same index.
Because the index “covers” the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query.
Querying only the index can be much faster than querying documents outside of the index. Index keys are typically smaller than the documents they catalog, and indexes are typically available in RAM or located sequentially on disk.
MongoDB automatically uses an index that covers a query when possible.
To ensure that an index can cover a query, create an index that
includes all the fields listed in the query document and in the query result. You can
specify the fields to return in the query results with a
projection document. By default, MongoDB includes
the _id
field in the query result. So, if the index does not
include the _id
field, then you must exclude the _id
field
(i.e. _id: 0
) from the query results.
Consider the following example where the collection user
has
an index on the fields user
and status
:
Then, the following query which queries on the status
field and
returns only the user
field is covered:
However, the following query that uses the index to match documents is
not covered by the index because it returns both the user
field
and the _id
field:
An index cannot cover a query if:
any of the indexed fields in any of the documents in the collection includes an array. If an indexed field is an array, the index becomes a multi-key index index and cannot support a covered query.
any of the indexed fields are fields in subdocuments. To index fields in subdocuments, use dot notation. For example, consider a collection
users
with documents of the following form:The collection has the following indexes:
The
{ user: 1 }
index covers the following query:However, the
{ "user.login": 1 }
index does not cover the following query:The query, however, does use the
{ "user.login": 1 }
index to find matching documents.
To determine whether a query is a covered query, use the
explain()
method. If the explain()
output displays true
for the indexOnly
field, the query is
covered by an index, and MongoDB queries only that index to match the
query and return the results.
For more information see Measure Index Use.
Use Indexes to Sort Query Results¶
For the fastest performance when sorting query results by a given field, create a sorted index on that field.
To sort query results on multiple fields, create a compound index. MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound index, ensure that all fields before the first sorted field are equality matches.
Example
If you create the following index:
The following query and sort operations can use the index:
However, the following queries cannot sort the results using the index:
Ensure Indexes Fit RAM¶
For the fastest processing, ensure that your indexes fit entirely in RAM so that the system can avoid reading the index from disk.
To check the size of your indexes, use the
db.collection.totalIndexSize()
helper, which returns data in
bytes:
The above example shows an index size of almost 4.3 gigabytes. To ensure this index fits in RAM, you must not only have more than that much RAM available but also must have RAM available for the rest of the working set. Also remember:
If you have and use multiple collections, you must consider the size of all indexes on all collections. The indexes and the working set must be able to fit in memory at the same time.
There are some limited cases where indexes do not need to fit in memory. See Indexes that Hold Only Recent Values in RAM.
See also
For additional collection statistics, use collStats
or
db.collection.stats()
.
Indexes that Hold Only Recent Values in RAM¶
Indexes do not have to fit entirely into RAM in all cases. If the value of the indexed field increments with every insert, and most queries select recently added documents; then MongoDB only needs to keep the parts of the index that hold the most recent or “right-most” values in RAM. This allows for efficient index use for read and write operations and minimize the amount of RAM required to support the index.
Create Queries that Ensure Selectivity¶
Selectivity is the ability of a query to narrow results using the index. Effective indexes are more selective and allow MongoDB to use the index for a larger portion of the work associated with fulfilling the query.
To ensure selectivity, write queries that limit the number of possible documents with the indexed field. Write queries that are appropriately selective relative to your indexed data.
Example
Suppose you have a field called status
where the possible values
are new
and processed
. If you add an index on status
you’ve created a low-selectivity index. The index will
be of little help in locating records.
A better strategy, depending on your queries, would be to create a
compound index that includes the
low-selectivity field and another field. For example, you could
create a compound index on status
and created_at.
Another option, again depending on your use case, might be to use separate collections, one for each status.
Example
Consider an index { a : 1 }
(i.e. an index on the key a
sorted in ascending order) on a collection where a
has three
values evenly distributed across the collection:
If you query for { a: 2, b: "no" }
MongoDB must scan 3
documents in the collection to return the one
matching result. Similarly, a query for { a: { $gt: 1}, b: "tv" }
must scan 6 documents, also to return one result.
Consider the same index on a collection where a
has nine values
evenly distributed across the collection:
If you query for { a: 2, b: "cd" }
, MongoDB must scan only one
document to fulfill the query. The index and query are more selective
because the values of a
are evenly distributed and the query
can select a specific document using the index.
However, although the index on a
is more selective, a query such
as { a: { $gt: 5 }, b: "tv" }
would still need to scan 4
documents.
If overall selectivity is low, and if MongoDB must read a number of documents to return results, then some queries may perform faster without indexes. To determine performance, see Measure Index Use.
Consider Performance when Creating Indexes for Write-heavy Applications¶
If your application is write-heavy, then be careful when creating new indexes, since each additional index with impose a write-performance penalty. In general, don’t be careless about adding indexes. Add indexes to complement your queries. Always have a good reason for adding a new index, and be sure to benchmark alternative strategies.
Consider Insert Throughput¶
MongoDB must update all indexes associated with a collection after every insert, update, or delete operation. For update operations, if the updated document does not move to a new location, then MongoDB only modifies the updated fields in the index. Therefore, every index on a collection adds some amount of overhead to these write operations. In almost every case, the performance gains that indexes realize for read operations are worth the insertion penalty. However, in some cases:
- An index to support an infrequent query might incur more insert-related costs than savings in read-time.
- If you have many indexes on a collection with a high insert throughput and a number of related indexes, you may find better overall performance with a smaller number of indexes, even if some queries are less optimally supported by an index.
- If your indexes and queries are not sufficiently selective, the speed improvements for query operations may not offset the costs of maintaining an index. For more information see Create Queries that Ensure Selectivity.