
Create a Text Index for a Collection Containing Multiple Languages on Self-Managed Deployments


You can create a text index to improve the performance of text search queries run on a collection containing documents or embedded documents with text in multiple languages.

If a collection contains documents or embedded documents in multiple languages, include a field named language in each of them and set its value to the language of that document's text. To see the languages available for text indexing, see Text Search Languages on Self-Managed Deployments.

Your insert operation should resemble this example to support text indexing for multiple languages:

db.<collection>.insertOne(
   {
      <field>: <value>,
      language: <language>
   }
)

Create a quotes collection that contains multi-language documents that include the language field:

db.quotes.insertMany( [
   {
      _id: 1,
      language: "portuguese",
      original: "A sorte protege os audazes.",
      translation:
         [
            {
               language: "english",
               quote: "Fortune favors the bold."
            },
            {
               language: "spanish",
               quote: "La suerte protege a los audaces."
            }
         ]
   },
   {
      _id: 2,
      language: "spanish",
      original: "Nada hay más surrealista que la realidad.",
      translation:
         [
            {
               language: "english",
               quote: "There is nothing more surreal than reality."
            },
            {
               language: "french",
               quote: "Il n'y a rien de plus surréaliste que la réalité."
            }
         ]
   },
   {
      _id: 3,
      original: "Is this a dagger which I see before me?",
      translation:
         {
            language: "spanish",
            quote: "Es este un puñal que veo delante de mí."
         }
   }
] )

The following operation creates a text index on the original and translation.quote fields:

db.quotes.createIndex(
   { original: "text", "translation.quote": "text" },
   { default_language: "fr" }
)

Note

English is the default language for text indexes. If you do not specify default_language when you create the index, queries for text in another language must specify that language with the $language parameter. For more information, refer to Specify the Default Language for a Text Index on Self-Managed Deployments.

The resulting index supports text search queries on the original and translation.quote fields of the documents and embedded documents. For each document or embedded document, the text index applies the suffix-stemming rules and ignores the stop words of the language named in its language field.

For example, the following query searches for the French word réalité:

db.quotes.find(
   { $text:
      { $search: "réalité" }
   }
)

Output:

[
   {
      _id: 2,
      language: 'spanish',
      original: 'Nada hay más surrealista que la realidad.',
      translation: [
         {
            language: 'english',
            quote: 'There is nothing more surreal than reality.'
         },
         {
            language: 'french',
            quote: "Il n'y a rien de plus surréaliste que la réalité."
         }
      ]
   }
]
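
As the note earlier on this page mentions, a query can also name its language explicitly with the $language parameter. The following is a minimal sketch of how such a filter might be built. The textFilter helper is hypothetical (it is not part of MongoDB or mongosh) and only assembles the filter document. The $search and $language fields are the $text operator's own.

```javascript
// Hypothetical helper, not part of MongoDB: builds a $text filter document.
function textFilter(search, language) {
  const text = { $search: search };
  // Add $language only when the caller provides one. Without it, the
  // query falls back to the default language of the text index.
  if (language !== undefined) {
    text.$language = language;
  }
  return { $text: text };
}

// Filter for the French word used in the example above.
const frenchFilter = textFilter("réalité", "french");
console.log(JSON.stringify(frenchFilter));

// In mongosh, the filter could then be passed to find():
//   db.quotes.find(frenchFilter)
```

Keeping the language optional mirrors the behavior of $text itself, where an omitted $language means the index's default language applies.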

For embedded documents that do not contain the language field:

  • If the enclosing document contains the language field, then the index uses the document's language for the embedded documents.

  • Otherwise, the index uses the default language for the embedded documents.

For documents that do not contain the language field, the index uses its default language: the default_language value given when the index was created, or English if none was specified.
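
The fallback rules above can be modeled as a small lookup. This is an illustrative sketch only: MongoDB performs this resolution internally when it builds the index, and the effectiveLanguage function and its parameter names are assumptions made for this example.

```javascript
// Illustrative model of the fallback rules; not a MongoDB API.
// embedded:        the embedded document being indexed
// enclosing:       the document that contains it
// defaultLanguage: the default language of the text index
function effectiveLanguage(embedded, enclosing, defaultLanguage = "english") {
  // 1. An embedded document's own language field takes precedence.
  if (embedded && typeof embedded.language === "string") {
    return embedded.language;
  }
  // 2. Otherwise, use the enclosing document's language field.
  if (enclosing && typeof enclosing.language === "string") {
    return enclosing.language;
  }
  // 3. Otherwise, use the index's default language.
  return defaultLanguage;
}

// Sample data shaped like the quotes collection.
const enclosingDoc = { _id: 2, language: "spanish" };
const taggedQuote = { language: "french", quote: "Il n'y a rien de plus surréaliste que la réalité." };
const untaggedQuote = { quote: "Nada hay más surrealista que la realidad." };

console.log(effectiveLanguage(taggedQuote, enclosingDoc, "french"));   // own field wins
console.log(effectiveLanguage(untaggedQuote, enclosingDoc, "french")); // inherits from enclosing document
console.log(effectiveLanguage(untaggedQuote, { _id: 3 }, "french"));   // falls back to the index default
```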
