Indexing Overview¶
This document provides an overview of indexes in MongoDB, including index types and creation options. For operational guidelines and procedures, see the Indexing Operations document. For strategies and practical approaches, see the Indexing Strategies document.
Synopsis¶
An index is a data structure that allows you to quickly locate documents based on the values stored in certain specified fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
MongoDB indexes have the following core features:
MongoDB defines indexes on a per-collection level.
You can create indexes on a single field or on multiple fields using a compound index.
Indexes enhance query performance, often dramatically. However, each index also incurs some overhead for every write operation. Consider the queries, the frequency of these queries, the size of your working set, the insert load, and your application’s requirements as you create indexes in your MongoDB environment.
All MongoDB indexes use a B-tree data structure. MongoDB can use this representation of the data to optimize query responses.
Every query, including update operations, uses one and only one index. The query optimizer selects the index empirically by occasionally running alternate query plans and by selecting the plan with the best response time for each query type. You can override the query optimizer using the cursor.hint() method.
An index “covers” a query if:
- all the fields in the query are part of that index, and
- all the fields returned in the documents that match the query are in the same index.
When an index covers a query, the server can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query. Querying the index can be faster than querying the documents outside of the index.
See Create Indexes that Support Covered Queries for more information.
Using queries with good index coverage reduces the number of full documents that MongoDB needs to store in memory, thus maximizing database performance and throughput.
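As a minimal sketch of the two coverage conditions, the following JavaScript models an index as a key specification and checks whether a query and its returned fields touch only indexed fields. The helper name isCovered and the sample field names are hypothetical, and the model deliberately ignores finer points such as which fields a query returns by default.

```javascript
// Hypothetical helper: returns true when every field used by the query
// and every field returned to the client is part of the index.
function isCovered(indexKeys, queryFields, returnedFields) {
  const indexed = new Set(Object.keys(indexKeys));
  const inIndex = (field) => indexed.has(field);
  return queryFields.every(inIndex) && returnedFields.every(inIndex);
}

// Hypothetical compound index on two fields.
const index = { type: 1, item: 1 };

console.log(isCovered(index, ["type"], ["item"]));          // true
console.log(isCovered(index, ["type"], ["item", "price"])); // false
```

The second call fails the check because price is not in the index, so the server would have to read the documents themselves.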
If an update does not change the size of a document or cause the document to outgrow its allocated area, then MongoDB will update an index only if the indexed fields have changed. This improves performance. Note that if the document has grown and must move, all index keys must then update.
Index Types¶
This section enumerates the types of indexes available in MongoDB.
For all collections, MongoDB creates the default _id index. You can create additional indexes with the ensureIndex() method on any single field or sequence of fields within any document or sub-document. MongoDB also supports indexes of arrays, called multikey indexes.
_id Index¶
The _id index is a unique index [1] on the _id field, and MongoDB creates this index by default on all collections. [2] You cannot delete the index on _id.
The _id field is the primary key for the collection, and every document must have a unique _id field. You may store any unique value in the _id field. The default value of _id is an ObjectId, generated on every insert() operation. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.
Note
In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most often done by using a standard auto-generated ObjectId.
[1] Although the index on _id is unique, the getIndexes() method will not print unique: true in the mongo shell.
[2] Before version 2.2, capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local database. See the release notes for more information.
Secondary Indexes¶
All indexes in MongoDB are secondary indexes. You can create indexes on any field within any document or sub-document. Additionally, you can create compound indexes with multiple fields, so that a single query can match multiple components using the index while scanning fewer whole documents.
In general, you should create indexes that support your primary, common, and user-facing queries. Doing so requires MongoDB to scan the fewest number of documents possible.
In the mongo shell, you can create an index by calling the ensureIndex() method. Arguments to ensureIndex() resemble the following:
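As a minimal sketch with hypothetical field names, a key specification is a document that maps each indexed field to a direction. The snippet builds the specification as a plain JavaScript object so that it runs outside the shell; the shell call itself appears only as a comment.

```javascript
// Key specification: hypothetical field names mapped to a direction
// (1 for ascending, -1 for descending).
const keys = { field_a: 1, field_b: -1 };

// In the mongo shell, this document is the first argument:
//   db.collection.ensureIndex(keys)

// JavaScript keeps non-numeric string keys in insertion order, so the
// sequence written above is the sequence of fields in the specification.
console.log(Object.keys(keys));
```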
For each field in the index, specify either 1 for ascending order or -1 for descending order, which represents the order of the keys in the index. For indexes with more than one key (i.e. compound indexes), the sequence of fields is important.
Indexes on Sub-documents¶
You can create indexes on fields that hold sub-documents as in the following example:
Example
Given the following document in the factories collection:
You can create an index on the metro key. The following queries would then use that index, and both would return the above document:
The second query returns the document because { city: "New York" } is less than { city: "New York", state: "NY" }. The order of comparison is in ascending key order in the order the keys occur in the BSON document.
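The comparison described here can be sketched in plain JavaScript. This is a simplified model that assumes string values only and ignores BSON type ordering; the helper name compareDocs and the sample metro value are hypothetical. The comments show query shapes consistent with the description (an exact match on the sub-document and a $gte range on it); they are an assumption about the example, not a quotation of it.

```javascript
// Simplified model of sub-document comparison: walk both documents in key
// order, comparing field names and then (string) values. If one document
// is a prefix of the other, the shorter one sorts first.
function compareDocs(a, b) {
  const ea = Object.entries(a);
  const eb = Object.entries(b);
  const n = Math.min(ea.length, eb.length);
  for (let i = 0; i < n; i++) {
    const [ka, va] = ea[i];
    const [kb, vb] = eb[i];
    if (ka !== kb) return ka < kb ? -1 : 1;
    if (va !== vb) return va < vb ? -1 : 1;
  }
  return ea.length - eb.length;
}

// Hypothetical value of the metro field in a stored document.
const stored = { city: "New York", state: "NY" };

// Query shapes that an index on metro could serve (assumed):
//   db.factories.find({ metro: { city: "New York", state: "NY" } })
//   db.factories.find({ metro: { $gte: { city: "New York" } } })

console.log(compareDocs({ city: "New York" }, stored) < 0); // true
console.log(compareDocs(stored, stored) === 0);             // true
```

Because key order takes part in the comparison, { state: "NY", city: "New York" } would not compare equal to the stored value in this model.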
Indexes on Embedded Fields¶
You can create indexes on fields in sub-documents, just as you can index top-level fields in documents. [3] These indexes use “dot notation” to reach into sub-documents.
Consider a collection named people that holds documents that resemble the following example document:
You can create an index on the address.zipcode field, using the following specification:
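A minimal sketch of such a specification follows, together with a hypothetical helper, resolvePath, that shows which value a dotted key refers to. The sample document is illustrative and only assumes an address sub-document with a zipcode field.

```javascript
// Hypothetical document shaped like those in the people collection.
const person = {
  name: "John Doe",
  address: { street: "Main", zipcode: "53511", state: "WI" },
};

// Key specification for an index on the embedded field. The key must be
// quoted because it contains a dot. In the mongo shell:
//   db.people.ensureIndex({ "address.zipcode": 1 })
const keys = { "address.zipcode": 1 };

// Hypothetical helper: follow a dotted path into nested sub-documents.
function resolvePath(doc, path) {
  return path.split(".").reduce(
    (value, part) => (value == null ? undefined : value[part]),
    doc
  );
}

console.log(resolvePath(person, Object.keys(keys)[0])); // 53511
```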
[3] Indexes on Sub-documents, by contrast, allow you to index fields that hold documents, including the full content of the sub-document, up to the maximum Index Size.
Compound Indexes¶
MongoDB supports “compound indexes,” where a single index structure holds references to multiple fields within a collection’s documents. Consider a collection named products that holds documents that resemble the following document:
If most of the application’s queries include the item field and a significant number of queries also check the stock field, you can specify a single compound index to support both of these queries:
Compound indexes support queries on any prefix of the fields in the index. [4] For example, MongoDB can use the above index to support queries that select the item field and to support queries that select the item field and the location field. The index, however, would not support queries that select the following:
- only the location field
- only the stock field
- only the location and stock fields
- only the item and stock fields
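The prefix rule can be sketched as a small check. The helper usesIndexPrefix is hypothetical and follows the rule exactly as stated above: a query is supported when the fields it selects are the first k fields of the index. The field order item, location, stock is the one this section describes for the example index.

```javascript
// Fields of the compound index from the example, in index order.
const indexFields = ["item", "location", "stock"];

// Hypothetical helper: true when the queried fields are exactly the first
// k fields of the index, for some k >= 1.
function usesIndexPrefix(indexFields, queryFields) {
  const k = queryFields.length;
  if (k === 0 || k > indexFields.length) return false;
  const wanted = new Set(queryFields);
  if (wanted.size !== k) return false; // reject duplicate field names
  return indexFields.slice(0, k).every((field) => wanted.has(field));
}

console.log(usesIndexPrefix(indexFields, ["item"]));             // true
console.log(usesIndexPrefix(indexFields, ["location", "item"])); // true
console.log(usesIndexPrefix(indexFields, ["item", "stock"]));    // false
console.log(usesIndexPrefix(indexFields, ["stock"]));            // false
```

The order in which a query names its fields does not matter in this model; only the order of fields in the index does.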
When creating an index, the number associated with a key specifies the direction of the index. The options are 1 (ascending) and -1 (descending). Direction doesn’t matter for single key indexes or for random access retrieval but is important if you are doing sort queries on compound indexes.
The order of fields in a compound index is very important. In the previous example, the index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by the values of location, and then sorted by values of the stock field.
[4] Index prefixes are the beginning subset of fields. For example, given the index { a: 1, b: 1, c: 1 }, both { a: 1 } and { a: 1, b: 1 } are prefixes of the index.
Indexes with Ascending and Descending Keys¶
Indexes store references to fields in either ascending or descending order. For single-field indexes, the order of keys doesn’t matter, because MongoDB can traverse the index in either direction. However, for compound indexes, if you need to order results against two fields, sometimes you need the index fields running in opposite order relative to each other.
To specify an index with a descending order, use the following form:
More typically in the context of a compound index, the specification would resemble the following prototype:
Consider a collection of event data that includes both usernames and a timestamp. Suppose you want to return a list of events sorted by username and then with the most recent events first. To create an index that supports this sort, use the following command:
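A sketch of why the relative directions matter: an index can return results in a requested order when the sort matches the index directions field by field, or is their exact reverse (the index is then traversed backwards). The helper indexSupportsSort, the collection name events, and the field names username and timestamp are assumptions made for illustration.

```javascript
// Hypothetical helper: checks a sort specification against the leading
// fields of an index key specification.
function indexSupportsSort(indexKeys, sortSpec) {
  const idx = Object.entries(indexKeys);
  const sort = Object.entries(sortSpec);
  if (sort.length === 0 || sort.length > idx.length) return false;
  const sameFields = sort.every(([field], i) => field === idx[i][0]);
  if (!sameFields) return false;
  const forward = sort.every(([, dir], i) => dir === idx[i][1]);
  const backward = sort.every(([, dir], i) => dir === -idx[i][1]);
  return forward || backward;
}

// Key specification for the events example (assumed names). In the shell:
//   db.events.ensureIndex({ username: 1, timestamp: -1 })
const keys = { username: 1, timestamp: -1 };

console.log(indexSupportsSort(keys, { username: 1, timestamp: -1 })); // true
console.log(indexSupportsSort(keys, { username: -1, timestamp: 1 })); // true
console.log(indexSupportsSort(keys, { username: 1, timestamp: 1 }));  // false
```

The last call shows the case this section warns about: both fields ascending cannot be read off an index whose fields run in opposite directions.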
Multikey Indexes¶
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a “multikey index.”
Example
Given the following document:
Then an index on the tags field would be a multikey index and would include these separate entries:
Queries could use the multikey index to return documents that match any of the above values.
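As a minimal sketch, the following models how one document contributes several entries to a multikey index. The document contents and the helper name indexEntries are hypothetical; only the tags array field is taken from the example.

```javascript
// Hypothetical document with an array-valued tags field.
const doc = { _id: 1, name: "Warm Weather", tags: ["weather", "hot", "record"] };

// Hypothetical helper: list the separate index entries that one document
// contributes for a field. An array yields one entry per element; any
// other value yields a single entry.
function indexEntries(doc, field) {
  const value = doc[field];
  return Array.isArray(value) ? value.slice() : [value];
}

console.log(indexEntries(doc, "tags")); // one entry per tag
console.log(indexEntries(doc, "name")); // a single entry

// A query on any one element can then be answered from the index.
console.log(indexEntries(doc, "tags").includes("hot")); // true
```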
You can use multikey indexes to index fields within objects embedded in arrays, as in the following example:
Example
Consider a feedback collection with documents in the following form:
An index on the comments.text field would be a multikey index and would add items to the index for all of the sub-documents in the array.
With an index such as { "comments.text": 1 }, consider the following query:
This would select the document that contains the following sub-document in the comments array:
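The behavior can be sketched with a hypothetical feedback document. The helper entriesForPath, the document contents, and the query text are all illustrative assumptions; the model collects one index entry per sub-document in the array and then checks a query value against those entries.

```javascript
// Hypothetical document: an array of comment sub-documents.
const feedback = {
  _id: 1,
  title: "Grocery Quality",
  comments: [
    { author_id: 1, text: "Please expand the cheddar selection." },
    { author_id: 2, text: "Please expand the mustard selection." },
  ],
};

// Hypothetical helper: follow a dotted path, fanning out over arrays, and
// return every value found (one index entry per value).
function entriesForPath(doc, path) {
  let values = [doc];
  for (const part of path.split(".")) {
    values = values
      .flatMap((v) => (Array.isArray(v) ? v : [v]))
      .filter((v) => v != null && typeof v === "object")
      .map((v) => v[part]);
  }
  return values.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

const entries = entriesForPath(feedback, "comments.text");
console.log(entries.length); // 2

// Query shape (assumed):
//   db.feedback.find({ "comments.text": "Please expand the cheddar selection." })
console.log(entries.includes("Please expand the cheddar selection.")); // true
```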
Compound Multikey Indexes May Only Include One Array Field
While you can create multikey compound indexes, at most one field in a compound index may hold an array. For example, given an index on { a: 1, b: 1 }, the following documents are permissible:
However, the following document is impermissible, and MongoDB cannot insert such a document into a collection with the { a: 1, b: 1 } index:
If you attempt to insert such a document, MongoDB will reject the insertion and produce an error that says cannot index parallel arrays. MongoDB does not index parallel arrays because they require the index to include each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult to maintain indexes.
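A minimal sketch of the rule, using illustrative documents: canIndex and productSize are hypothetical helpers. The first applies the "at most one array field" check, and the second counts how many entries a Cartesian product of the indexed fields would need.

```javascript
// Compound index from the example.
const keys = { a: 1, b: 1 };

// Hypothetical helper: a document is indexable when at most one of the
// indexed fields holds an array.
function canIndex(indexKeys, doc) {
  const arrayFields = Object.keys(indexKeys).filter((f) => Array.isArray(doc[f]));
  return arrayFields.length <= 1;
}

// Hypothetical helper: number of entries a Cartesian product would need.
function productSize(indexKeys, doc) {
  return Object.keys(indexKeys).reduce(
    (size, f) => size * (Array.isArray(doc[f]) ? doc[f].length : 1),
    1
  );
}

console.log(canIndex(keys, { a: [1, 2], b: 1 }));      // true
console.log(canIndex(keys, { a: 1, b: [1, 2] }));      // true
console.log(canIndex(keys, { a: [1, 2], b: [1, 2] })); // false

// Two 100-element arrays would already need 10000 entries for one document.
const big = { a: new Array(100).fill(0), b: new Array(100).fill(0) };
console.log(productSize(keys, big)); // 10000
```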
Unique Indexes¶
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To create a unique index on the user_id field of the members collection, use the following operation in the mongo shell:
By default, unique is false on MongoDB indexes.
If you use the unique constraint on a compound index, MongoDB enforces uniqueness on the combination of values rather than on the individual value of any single key.
If a document does not have a value for the indexed field in a unique index, the index will store a null value for this document. Because of the unique constraint, MongoDB will only permit one document that lacks the indexed field. You can combine the unique constraint with the sparse index option to filter these null values from the unique index.
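The following sketch models the behavior of a unique index on user_id, including the null entry stored for a document that lacks the field. The class UniqueIndexModel is a hypothetical in-memory stand-in, not MongoDB's implementation; the shell call in the comment passes unique: true as the option in the second argument to ensureIndex().

```javascript
// In the mongo shell, a unique index on user_id would be created with:
//   db.members.ensureIndex({ user_id: 1 }, { unique: true })

// Hypothetical in-memory model of a unique index on one field.
class UniqueIndexModel {
  constructor(field) {
    this.field = field;
    this.seen = new Set();
  }

  // Returns true if the document is accepted, false if it is rejected.
  insert(doc) {
    // A missing field is stored as null, so only one such document fits.
    const key = this.field in doc ? doc[this.field] : null;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

const index = new UniqueIndexModel("user_id");
console.log(index.insert({ user_id: 1 }));   // true
console.log(index.insert({ user_id: 1 }));   // false, duplicate value
console.log(index.insert({ name: "ann" }));  // true, stored as null
console.log(index.insert({ name: "bob" }));  // false, second null
```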
Sparse Indexes¶
Sparse indexes only contain entries for documents that have the indexed field. [5] Any document that is missing the field is not indexed. The index is “sparse” because it skips the documents that lack the indexed field.
By contrast, non-sparse indexes contain all documents in a collection, and store null values for documents that do not contain the indexed field. Create a sparse index on the xmpp_id field of the members collection using the following operation in the mongo shell:
By default, sparse is false on MongoDB indexes.
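As a sketch of the difference, the following models which documents receive an entry in a sparse versus a non-sparse index on xmpp_id. The helper indexedDocs and the sample documents are hypothetical; a document whose field is present but null is still indexed in the sparse case.

```javascript
// In the mongo shell, the sparse index would be created with:
//   db.members.ensureIndex({ xmpp_id: 1 }, { sparse: true })

// Hypothetical helper: return the index entries for a set of documents.
function indexedDocs(docs, field, sparse) {
  return docs
    .filter((doc) => !sparse || field in doc)
    .map((doc) => ({ _id: doc._id, key: field in doc ? doc[field] : null }));
}

// Illustrative documents: one with a value, one without the field, and
// one that stores an explicit null.
const members = [
  { _id: 1, xmpp_id: "ann@example.net" },
  { _id: 2 },
  { _id: 3, xmpp_id: null },
];

console.log(indexedDocs(members, "xmpp_id", true).length);  // 2
console.log(indexedDocs(members, "xmpp_id", false).length); // 3
```

A sort or filter answered purely from the sparse index would therefore never see the document with _id 2.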
Warning
Using sparse indexes can produce incomplete results when filtering or sorting, because sparse indexes do not contain entries for every document in a collection.
Note
Do not confuse sparse indexes in MongoDB with block-level indexes in other databases. Think of them as dense indexes with a specific filter.
You can combine the sparse index option with the unique index option so that mongod will reject documents that have duplicate values for a field, but ignore documents that do not have the key.
[5] All documents that have the indexed field are indexed in a sparse index, even if that field stores a null value in some documents.
Index Creation Options¶
You specify index creation options in the second argument to ensureIndex().
The options sparse, unique, and TTL affect the kind of index that MongoDB creates. This section addresses background construction and duplicate dropping, which affect how MongoDB builds the indexes.
Background Construction¶
By default, creating an index is a blocking operation. Building an index on a large collection of data can take a long time to complete. To resolve this issue, the background option can allow you to continue to use your mongod instance during the index build.
For example, to create an index on the zipcode field of the people collection in the background, you would issue the following:
By default, background is false for building MongoDB indexes.
You can combine the background option with other options, as in the following:
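The following sketch shows how options documents of this kind are assembled. The helper indexOptions is hypothetical; its defaults mirror the defaults stated in this document for unique, sparse, and background. The shell calls in the comments show where the options document goes.

```javascript
// Defaults as stated in this document.
const DEFAULT_OPTIONS = { unique: false, sparse: false, background: false };

// Hypothetical helper: the effective options for an index build.
function indexOptions(overrides) {
  return Object.assign({}, DEFAULT_OPTIONS, overrides);
}

// Background build only. In the mongo shell:
//   db.people.ensureIndex({ zipcode: 1 }, { background: true })
console.log(indexOptions({ background: true }));

// Background build combined with another option. In the mongo shell:
//   db.people.ensureIndex({ zipcode: 1 }, { background: true, sparse: true })
console.log(indexOptions({ background: true, sparse: true }));
```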
Be aware of the following behaviors with background index construction:
- A mongod instance can only build one background index per database at a time. Changed in version 2.2: Before 2.2, a single mongod instance could only build one index at a time.
- The indexing operation runs in the background so that other database operations can run while creating the index. However, the mongo shell session or connection where you are creating the index will block until the index build is complete. Open another connection or mongo instance to continue issuing commands to the database.
- The background index operation uses an incremental approach that is slower than the normal “foreground” index build. If the index is larger than the available RAM, then the incremental process can take much longer than the foreground build.
If your application includes ensureIndex() operations, and an index does not exist because of other operational concerns, building the index can have a severe impact on the performance of the database.
Make sure that your application checks for the indexes at start up using the getIndexes() method or the equivalent method for your driver and terminates if the proper indexes do not exist. Always build indexes in production instances using separate application code, during designated maintenance windows.
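A start-up check of this kind might look like the following sketch. It operates on sample data shaped like the output of getIndexes(), where each index document carries a key field with its key specification; the helper missingIndexes and the required key patterns are hypothetical.

```javascript
// Hypothetical helper: compare the key patterns reported for a collection
// against the key patterns the application requires. JSON.stringify keeps
// field order, which matters for compound indexes.
function missingIndexes(existing, required) {
  const have = new Set(existing.map((ix) => JSON.stringify(ix.key)));
  return required.filter((keys) => !have.has(JSON.stringify(keys)));
}

// Sample data shaped like getIndexes() output; values are illustrative.
const existing = [
  { key: { _id: 1 }, name: "_id_" },
  { key: { zipcode: 1 }, name: "zipcode_1" },
];
const required = [{ zipcode: 1 }, { "address.zipcode": 1 }];

const missing = missingIndexes(existing, required);
if (missing.length > 0) {
  // A real application would log this and terminate here rather than
  // build the index itself.
  console.log("missing indexes:", JSON.stringify(missing));
}
```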
Building Indexes on Secondaries
Background index operations on a replica set primary become foreground indexing operations on secondary members of the set. All indexing operations on secondaries block replication.
To build large indexes on secondaries the best approach is to restart one secondary at a time in standalone mode and build the index. After building the index, restart as a member of the replica set, allow it to catch up with the other members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step down the primary, restart it as a standalone, and build the index on the former primary.
Remember, the amount of time required to build the index on a secondary node must be within the window of the oplog, so that the secondary can catch up with the primary.
See Build Indexes on Replica Sets for more information on this process.
Indexes on secondary members in “recovering” mode are always built in the foreground to allow them to catch up as soon as possible.
See Build Indexes on Replica Sets for a complete procedure for rebuilding indexes on secondaries.
Note
If MongoDB is building an index in the background, you cannot perform other administrative operations involving that collection, including repairDatabase, dropping that collection (i.e. db.collection.drop()), and compact. These operations will return an error during background index builds.
Queries will not use these indexes until the index build is complete.
Drop Duplicates¶
MongoDB cannot create a unique index on a field that has duplicate values. To force the creation of a unique index, you can specify the dropDups option, which will only index the first occurrence of a value for the key, and delete all subsequent documents that hold the same value.
Warning
As in all unique indexes, if a document does not have the indexed field, MongoDB will include it in the index with a “null” value.
If subsequent documents do not have the indexed field, and you have set { dropDups: true }, MongoDB will remove these documents from the collection when creating the index. If you combine dropDups with the sparse option, the index will only include documents that have the field, and the documents without the field will remain in the database.
To create a unique index that drops duplicates on the username field of the accounts collection, use a command in the following form:
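The sketch below shows the shape of such a command in a comment and models its effect on a set of illustrative documents. The helper buildWithDropDups is hypothetical, and "first occurrence" here simply means the first document in the order the array is scanned.

```javascript
// In the mongo shell, the options are passed in the second argument:
//   db.accounts.ensureIndex({ username: 1 }, { unique: true, dropDups: true })

// Hypothetical model: keep the first document seen for each key and
// report the rest as removed. A missing field counts as null.
function buildWithDropDups(docs, field) {
  const seen = new Set();
  const kept = [];
  const removed = [];
  for (const doc of docs) {
    const key = field in doc ? doc[field] : null;
    if (seen.has(key)) {
      removed.push(doc);
    } else {
      seen.add(key);
      kept.push(doc);
    }
  }
  return { kept, removed };
}

// Illustrative documents: one duplicate username, two without the field.
const accounts = [
  { _id: 1, username: "ann" },
  { _id: 2, username: "ann" },
  { _id: 3 },
  { _id: 4 },
];

const result = buildWithDropDups(accounts, "username");
console.log(result.kept.map((d) => d._id));    // ids 1 and 3 survive
console.log(result.removed.map((d) => d._id)); // ids 2 and 4 are deleted
```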
Warning
Specifying { dropDups: true } will delete data from your database. Use with extreme caution.
By default, dropDups is false.
Index Features¶
TTL Indexes¶
TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for some types of information like machine generated event data, logs, and session information that only need to persist in a database for a limited amount of time.
These indexes have the following limitations:
- Compound indexes are not supported.
- The indexed field must be a date type.
- If the field holds an array and the array contains multiple date-typed values, the document expires when the lowest (i.e. earliest) value reaches the expiration threshold.
Note
TTL indexes expire data by removing documents in a background task that runs every 60 seconds. As a result, the TTL index provides no guarantees that expired documents will not exist in the collection. Consider that:
- Documents may remain in a collection after they expire and before the background process runs.
- The duration of the removal operations depends on the workload of your mongod instance.
In all other respects, TTL indexes are normal indexes, and if appropriate, MongoDB can use these indexes to fulfill arbitrary queries.
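As a minimal sketch of the expiration rule, the following checks whether a document has passed its lifetime. The lifetime is set with the expireAfterSeconds index option; the collection name, field name, and the helper isExpired are hypothetical, and the model says nothing about when the background task actually removes the document.

```javascript
// In the mongo shell (collection and field names are assumptions):
//   db.log_events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Hypothetical helper: true when the earliest date in the indexed field
// is at least expireAfterSeconds old at the given time.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  const dates = (Array.isArray(value) ? value : [value]).filter(
    (v) => v instanceof Date
  );
  if (dates.length === 0) return false; // no date value: never expires here
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return earliest + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2013-01-01T02:00:00Z");
const older = { createdAt: new Date("2013-01-01T00:30:00Z") };
const newer = { createdAt: new Date("2013-01-01T01:30:00Z") };

console.log(isExpired(older, "createdAt", 3600, now)); // true
console.log(isExpired(newer, "createdAt", 3600, now)); // false

// With an array of dates, the earliest one decides.
const mixed = { createdAt: [newer.createdAt, older.createdAt] };
console.log(isExpired(mixed, "createdAt", 3600, now)); // true
```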
Geospatial Indexes¶
MongoDB provides “geospatial indexes” to support location-based and other similar queries in a two-dimensional coordinate system. For example, use geospatial indexes when you need to take a collection of documents that have coordinates, and return a number of options that are “near” a given coordinate pair.
To create a geospatial index, your documents must have a coordinate pair. For maximum compatibility, these coordinate pairs should be in the form of a two-element array, such as [ x , y ]. Given a field loc that holds a coordinate pair, in the collection places, you would create a geospatial index as follows:
MongoDB will reject documents that have values in the loc field beyond the minimum and maximum values.
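The bounds check can be sketched as follows. The shell call in the comment uses the "2d" index type; the helper inBounds is hypothetical, and the default range of -180 (inclusive) to 180 (exclusive) is an assumption about the index defaults rather than something stated in this document.

```javascript
// In the mongo shell, a geospatial index on loc would be created with:
//   db.places.ensureIndex({ loc: "2d" })

// Hypothetical helper: true when loc is a two-element numeric array whose
// coordinates fall inside [min, max). The default bounds are assumed.
function inBounds(loc, min = -180, max = 180) {
  return (
    Array.isArray(loc) &&
    loc.length === 2 &&
    loc.every((c) => typeof c === "number" && c >= min && c < max)
  );
}

console.log(inBounds([-73.97, 40.77])); // true
console.log(inBounds([200, 40.77]));    // false, x is out of range
console.log(inBounds([10, 20, 30]));    // false, not a coordinate pair
```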
Note
MongoDB supports only one geospatial index per collection. Although MongoDB will allow clients to create multiple geospatial indexes, a single query can use only one of them.
See the $near operator and the geoNear database command for more information on accessing geospatial data.
Geohaystack Indexes¶
In addition to conventional geospatial indexes, MongoDB also provides a bucket-based geospatial index, called “geospatial haystack indexes.” These indexes support high performance queries for locations within a small area, when the query must filter along another dimension.
Example
If you need to return all documents that have coordinates within 25 miles of a given point and have a type field value of “museum,” a haystack index would provide the best support for these queries.
Haystack indexes allow you to tune your bucket size to the distribution of your data, so that in general you search only very small regions of 2d space for a particular kind of document. These indexes are not suited for finding the closest documents to a particular location, when the closest documents are far away compared to bucket size.
Index Behaviors and Limitations¶
Be aware of the following behaviors and limitations:
A collection may have no more than 64 indexes.
Index keys can be no larger than 1024 bytes.
Documents with fields that have values greater than this size cannot be indexed.
To query for documents that were too large to index, you can use a command similar to the following:
The name of an index, including the namespace must be shorter than 128 characters.
Indexes have storage requirements, and impact insert/update speed to some degree.
Create indexes to support queries and other operations, but do not maintain indexes that your MongoDB instance cannot or will not use.
- For queries with the $or operator, each clause of an $or query executes in parallel, and can each use a different index.
- For queries that use the sort() method and use the $or operator, the query cannot use the indexes on the $or fields.
- 2d geospatial queries do not support queries that use the $or operator.