Use Any Field to Specify Text Index Language on Self-Managed Deployments

A text index's language determines the rules used to stem words and to ignore stop words when you run text search queries.

By default, if a text index does not have a default language, the index uses the language field of each document to determine the language for that document. As a result, a text index is not limited to a single language, because the value of the language field can differ between documents.

You can change the field that the index uses to determine its language. This is useful if your field names are not in English and your documents do not have a field called language.

To specify the text index language in a field other than language, include the language_override option when you create the index:

db.<collection>.createIndex(
   { <textField> : "text" },
   { language_override: "<languageField>" }
)

The text index uses the field specified in the language_override option to determine the language to use for the corresponding document.

For documents that don't contain the field specified in language_override, the index uses its default language, which is English unless you set the default_language option.
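
The fallback rule described above can be modeled in a few lines of JavaScript. This is only a sketch of the selection logic, not MongoDB's implementation. effectiveLanguage is a hypothetical helper, and it assumes the defaults of language for language_override and english for default_language. It does not check that the value is a supported language.

```javascript
// Hypothetical helper (not a MongoDB API): models which language a text index
// applies to a single document, given the options used to create the index.
function effectiveLanguage(doc, indexOptions = {}) {
  // Assumed defaults: language_override is "language", default_language is "english".
  const overrideField = indexOptions.language_override || "language";
  const defaultLanguage = indexOptions.default_language || "english";

  // Use the document's own value when the override field holds a string.
  const value = doc[overrideField];
  return typeof value === "string" ? value : defaultLanguage;
}

const options = { language_override: "idioma" };

console.log(effectiveLanguage({ idioma: "spanish", quote: "Hola" }, options)); // spanish
console.log(effectiveLanguage({ quote: "no idioma field" }, options));         // english
```

The second call shows the fallback: the document has no idioma field, so the model returns the index default.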

Create the quotes collection:

db.quotes.insertMany(
   [
      {
         _id: 1,
         idioma: "portuguese",
         quote: "A sorte protege os audazes"
      },
      {
         _id: 2,
         idioma: "spanish",
         quote: "Nada hay más surrealista que la realidad."
      },
      {
         _id: 3,
         idioma: "english",
         quote: "is this a dagger which I see before me"
      }
   ]
)

The language for each quote is specified in the idioma field.

Create a text index on the quote field. Specify the language_override option to cause the text index to use the idioma field for the language:

db.quotes.createIndex(
   { quote : "text" },
   { language_override: "idioma" }
)

The index supports text search queries on the quote field and uses language rules based on the language specified in the idioma field. Each document specifies a different value in the idioma field, which means that each document is searched with different language rules.
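
To confirm that the index picked up the option, you can inspect the index definitions. The following is a minimal sketch: textIndexesUsing is a hypothetical helper, and sampleSpecs is an assumed shape for what db.quotes.getIndexes() reports, so the exact fields may differ between server versions.

```javascript
// Hypothetical helper (not part of mongosh): from the array returned by
// getIndexes(), keep the text indexes that read their language from `field`.
function textIndexesUsing(indexSpecs, field) {
  return indexSpecs.filter(
    (spec) =>
      Object.values(spec.key).includes("text") &&
      spec.language_override === field
  );
}

// Assumed shape of the getIndexes() output for the quotes collection.
const sampleSpecs = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  {
    v: 2,
    key: { _fts: "text", _ftsx: 1 },
    name: "quote_text",
    weights: { quote: 1 },
    default_language: "english",
    language_override: "idioma",
    textIndexVersion: 3
  }
];

console.log(textIndexesUsing(sampleSpecs, "idioma").map((spec) => spec.name)); // [ 'quote_text' ]

// In mongosh, where `db` is defined, run the same check against the live collection.
if (typeof db !== "undefined") {
  printjson(textIndexesUsing(db.quotes.getIndexes(), "idioma"));
}
```

An empty result would suggest that the index was created without language_override or with a different field name.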

Consider the following examples:

The following query searches for the string audazes:

db.quotes.find(
   {
      $text: { $search: "audazes" }
   }
)

Output:

[
   { _id: 1, idioma: 'portuguese', quote: 'A sorte protege os audazes' }
]

The match is document _id: 1, whose quote field is indexed with Portuguese language rules because its idioma value is portuguese.

The following query searches for the string hay:

db.quotes.find(
   {
      $text: { $search: "hay" }
   }
)

The preceding query returns no results, even though the string hay appears in the quote field of document _id: 2.

Document _id: 2 specifies a language of Spanish. In Spanish, hay is considered a stop word, so it is not included in the text index.
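
The effect of stop word removal can be illustrated with a toy model. This is a sketch under stated assumptions, not MongoDB's tokenizer. SPANISH_STOP_WORDS is a small assumed list for illustration (only hay is confirmed above), indexedTerms is a hypothetical helper, and stemming is not modeled at all.

```javascript
// Assumed, partial stop word list for illustration only. The list MongoDB
// actually uses for Spanish is longer.
const SPANISH_STOP_WORDS = new Set(["hay", "que", "la"]);

// Hypothetical helper: lowercases the text, splits it on characters that are
// not letters or combining marks, and drops stop words. No stemming is applied.
function indexedTerms(text, stopWords) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{M}]+/u)
    .filter((term) => term.length > 0 && !stopWords.has(term));
}

const terms = indexedTerms(
  "Nada hay más surrealista que la realidad.",
  SPANISH_STOP_WORDS
);

console.log(terms.includes("hay"));      // false
console.log(terms.includes("realidad")); // true
```

In this model, hay never reaches the set of indexed terms, so a search for it has nothing to match. That mirrors why the query above returns no results.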
