Text Indexes
New in version 2.4.
MongoDB provides text indexes to support text search of string content
in documents of a collection.
text indexes can include any field whose value is a string or an array
of string elements. To perform queries that access the text index, use
the $text query operator.
Changed in version 2.6: MongoDB enables the text search feature by
default. In MongoDB 2.4, you need to enable the text search
feature manually to create text indexes and perform text search.
Create Text Index
To create a text index, use the db.collection.ensureIndex() method. To
index a field that contains a string or an array of string elements,
include the field and specify the string literal "text" in the index
document, as in the following example:
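The example from the original page is not reproduced here. A minimal sketch, assuming a hypothetical collection named reviews with a string field named comments (both names are illustrative and do not come from this page):

```javascript
// Index document: the field name maps to the string literal "text".
// "comments" is a hypothetical field name used only for illustration.
var commentsTextIndex = { comments: "text" };

// Creates the text index on the given collection handle.
// In the mongo shell, call it as: createCommentsTextIndex(db.reviews)
// where "reviews" is a hypothetical collection name.
function createCommentsTextIndex(collection) {
  return collection.ensureIndex(commentsTextIndex);
}
```

Passing the collection handle as an argument keeps the sketch independent of any particular database or collection name.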
A collection can have at most one text index.
However, you can specify multiple fields for the text index. For
examples of creating text indexes on multiple fields, see
Create a text Index and Wildcard Text Indexes.
Wildcard Text Indexes
To allow for text search on all fields with string content, use the
wildcard specifier ($**) to index all fields in the collection that
contain string content. Such an index can be useful with highly
unstructured data if it is unclear which fields to include in the text
index or for ad-hoc querying.
With a wildcard text index, MongoDB indexes every field that contains string data for each document in the collection. The following example creates a text index using the wildcard specifier:
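The example from the original page is not reproduced here. A minimal sketch of a wildcard text index, where the helper name is hypothetical and the collection handle is supplied by the caller:

```javascript
// The wildcard specifier is quoted because $** is not a valid bare
// JavaScript property name.
var wildcardTextIndex = { "$**": "text" };

// Creates the wildcard text index on the given collection handle.
// In the mongo shell: createWildcardTextIndex(db.collection)
function createWildcardTextIndex(collection) {
  return collection.ensureIndex(wildcardTextIndex);
}
```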
Wildcard text indexes are text indexes on multiple fields. As such,
you can assign weights to specific fields during index creation to
control the ranking of the results. For more information on using
weights to control the results of a text search, see
Control Search Results with Weights.
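As a rough illustration of assigning weights at index creation, the sketch below passes a weights document in the options argument of ensureIndex(). The field names subject and body, the weight values, and the index name are all hypothetical:

```javascript
// Options document for a weighted wildcard text index.
// Fields not listed under "weights" keep the default weight of 1.
var weightedOptions = {
  weights: { subject: 10, body: 2 },   // hypothetical fields and values
  name: "WeightedWildcardTextIndex"    // hypothetical index name
};

// Creates a wildcard text index whose ranking favors the weighted fields.
// In the mongo shell: createWeightedWildcardIndex(db.collection)
function createWeightedWildcardIndex(collection) {
  return collection.ensureIndex({ "$**": "text" }, weightedOptions);
}
```

With these values, a match in subject contributes more to a document's score than the same match in body or in any unweighted field.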
Wildcard text indexes, as with all text indexes, can be part of a
compound index. For example, the following creates a compound index
on the field a as well as the wildcard specifier:
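The example from the original page is not reproduced here. A minimal sketch of such a compound index, assuming an ascending key on a (the helper name is hypothetical):

```javascript
// Compound index document: an ascending key on "a", followed by the
// wildcard text index key. Key order matters, so "a" is listed first.
var compoundWildcardIndex = { a: 1, "$**": "text" };

// Creates the compound index on the given collection handle.
// In the mongo shell: createCompoundWildcardIndex(db.collection)
function createCompoundWildcardIndex(collection) {
  return collection.ensureIndex(compoundWildcardIndex);
}
```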
As with all compound text indexes, because the field a precedes the
text index key, a $text search that uses this index requires the query
predicate to include an equality match condition on a. For information
on compound text indexes, see Compound Text Indexes.
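To make the equality requirement concrete, here is a sketch of a query that could use a compound index with a preceding the text key. The helper name and the argument values in the usage note are hypothetical:

```javascript
// Builds and runs a $text search that also supplies the required
// equality match on the preceding key "a".
function searchWithEqualityOnA(collection, aValue, searchString) {
  return collection.find({
    a: aValue,                          // equality match on the preceding key
    $text: { $search: searchString }    // text search on the indexed content
  });
}
```

In the mongo shell this might be called as searchWithEqualityOnA(db.collection, 1, "coffee"). A predicate that omits a, or that uses a range condition on a instead of equality, would not satisfy the requirement described above.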
Supported Languages and Stop Words
MongoDB supports text search for various languages. text indexes
drop language-specific stop words (e.g. in English, “the”, “an”, “a”,
“and”, etc.) and use simple language-specific suffix stemming. For a
list of the supported languages, see Text Search Languages.
If you specify a language value of "none", then the text index
uses simple tokenization with no list of stop words and no stemming.
For the Latin alphabet, text indexes are case insensitive for
non-diacritics; i.e. case insensitive for [A-Za-z]. text indexes treat
all other characters as distinct.
To specify a language for the text index, see
Specify a Language for Text Index.
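As a brief sketch of choosing the index language at creation time, the helper below sets the default_language option of ensureIndex(). The helper name and the field name in the usage note are hypothetical:

```javascript
// Creates a text index on one field with an explicit default language.
// Pass "none" as the language for simple tokenization with no stop
// words and no stemming.
function createTextIndexWithLanguage(collection, field, language) {
  var keys = {};
  keys[field] = "text";   // build { <field>: "text" } from the field name
  return collection.ensureIndex(keys, { default_language: language });
}
```

For instance, createTextIndexWithLanguage(db.collection, "content", "spanish") would index a hypothetical content field using Spanish stop words and stemming.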
sparse Property
text indexes are sparse by default and ignore the sparse: true option.
If a document lacks a text index field (or the field is null or an
empty array), MongoDB does not add an entry for the document to the
text index. For inserts, MongoDB inserts the document but does not
add an entry to the text index.
For a compound index that includes a text index key along with keys
of other types, only the text index field determines whether the
index references a document. The other keys do not determine whether
the index references the document.
Restrictions
Text Index and Sort
Sort operations cannot obtain sort order from a text index, even
from a compound text index; i.e. sort operations cannot use the
ordering in the text index.
Compound Index
A compound index can include a text index key in combination with
ascending/descending index keys. However, these compound indexes have
the following restrictions:
- A compound text index cannot include any other special index types, such as multi-key or geospatial index fields.
- If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
See also Text Index and Sort for additional limitations.
For an example of a compound text index, see Limit the Number of Entries Scanned.
Drop a Text Index
To drop a text index, pass the name of the index to the
db.collection.dropIndex() method. To get the name of the index, run
the getIndexes() method.
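The two steps can be sketched as a pair of helpers. This assumes that getIndexes() reports a text index with a key document containing _fts: "text", which is how the shell displays text index keys; both helper names are hypothetical:

```javascript
// Returns the name of the collection's text index, or null if none exists.
// Assumption: a text index appears in getIndexes() with key._fts === "text".
function findTextIndexName(collection) {
  var indexes = collection.getIndexes();
  for (var i = 0; i < indexes.length; i++) {
    if (indexes[i].key && indexes[i].key._fts === "text") {
      return indexes[i].name;
    }
  }
  return null;
}

// Looks up the text index by name and drops it.
// Returns the result of dropIndex(), or null if there was nothing to drop.
function dropTextIndex(collection) {
  var name = findTextIndexName(collection);
  if (name === null) {
    return null;
  }
  return collection.dropIndex(name);
}
```

Because a collection can have at most one text index, the first match is the only one. If you already know the index name, calling dropIndex() with that name directly is simpler.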
For information on the default naming scheme for text indexes as
well as overriding the default name, see
Specify Name for text Index.
Storage Requirements and Performance Costs
text indexes have the following storage requirements and
performance costs:
- text indexes change the space allocation method for all future record allocations in a collection to usePowerOf2Sizes.
- text indexes can be large. They contain one index entry for each unique post-stemmed word in each indexed field for each document inserted.
- Building a text index is very similar to building a large multi-key index and will take longer than building a simple ordered (scalar) index on the same data.
- When building a large text index on an existing collection, ensure that you have a sufficiently high limit on open file descriptors. See the recommended settings.
- text indexes will impact insertion throughput because MongoDB must add an index entry for each unique post-stemmed word in each indexed field of each new source document.
- Additionally, text indexes do not store phrases or information about the proximity of words in the documents. As a result, phrase queries will run much more effectively when the entire collection fits in RAM.
Text Search
Text search supports the search of string content in documents of a
collection. MongoDB provides the $text operator to perform
text search in queries and in aggregation pipelines.
The text search process:
- tokenizes and stems the search term(s) during both the index creation and the text command execution.
- assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
The $text operator can search for words and phrases. The query
matches on the complete stemmed words. For example, if a document field
contains the word blueberry, a search on the term blue will not
match the document. However, a search on either blueberry or
blueberries will match.
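A minimal sketch of a $text query that also returns and sorts by the relevance score, using the $meta: "textScore" expression. The helper name and the score field name are hypothetical, and the collection is assumed to already have a text index:

```javascript
// Runs a text search and orders results from most to least relevant.
// "score" is an arbitrary name for the projected relevance value.
function searchByRelevance(collection, searchString) {
  return collection
    .find(
      { $text: { $search: searchString } },    // match on stemmed terms
      { score: { $meta: "textScore" } }        // project the relevance score
    )
    .sort({ score: { $meta: "textScore" } });  // sort by that score
}
```

In the mongo shell, searchByRelevance(db.collection, "blueberries") would be expected to match a document containing blueberry, while a search string of "blue" would not. To search for a phrase, wrap it in escaped double quotes inside the search string, e.g. "\"blueberry pie\"".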
For information and examples on various text search patterns, see the
$text query operator. For examples of text search in the
aggregation pipeline, see Text Search in the Aggregation Pipeline.
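As a sketch of text search in an aggregation pipeline, the helper below builds a pipeline whose first stage is a $match containing $text. The helper name and the projected title field are hypothetical:

```javascript
// Builds an aggregation pipeline for a text search.
// The $match stage with $text must be the first stage in the pipeline.
function textSearchPipeline(searchString) {
  return [
    { $match: { $text: { $search: searchString } } },
    { $sort: { score: { $meta: "textScore" } } },   // most relevant first
    { $project: { title: 1, _id: 0 } }              // "title" is a hypothetical field
  ];
}
```

In the mongo shell the result can be passed to aggregate(), for example db.collection.aggregate(textSearchPipeline("blueberry")).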