Create a Text Index for a Collection Containing Multiple Languages on Self-Managed Deployments
You can create a text index to improve the performance of text search queries run on a collection containing documents or embedded documents with text in multiple languages.
If a collection contains documents or embedded documents in multiple languages, include a field named language in those documents and set its value to the document's language. To see the languages available for text indexing, see Text Search Languages on Self-Managed Deployments.
Your insert operation should resemble this example to support text indexing for multiple languages:
db.<collection>.insertOne( { <field>: <value>, language: <language> } )
Before You Begin
Create a quotes collection that contains multi-language documents that include the language field:
db.quotes.insertMany( [ { _id: 1, language: "portuguese", original: "A sorte protege os audazes.", translation: [ { language: "english", quote: "Fortune favors the bold." }, { language: "spanish", quote: "La suerte protege a los audaces." } ] }, { _id: 2, language: "spanish", original: "Nada hay más surrealista que la realidad.", translation: [ { language: "english", quote: "There is nothing more surreal than reality." }, { language: "french", quote: "Il n'y a rien de plus surréaliste que la réalité." } ] }, { _id: 3, original: "Is this a dagger which I see before me?", translation: { language: "spanish", quote: "Es este un puñal que veo delante de mí." } } ] )
Procedure
The following operation creates a text index on the original and translation.quote fields:
db.quotes.createIndex( { original: "text", "translation.quote": "text" } )
Note
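English is the default language for text indexes. To use a different default, set the default_language option in the second (options) argument of createIndex(), for example { default_language: "french" }; it is not a key in the index specification. A $text query can also use the $language parameter to choose the language used to parse its search string. For more information, refer to Specify the Default Language for a Text Index on Self-Managed Deployments.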
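A minimal sketch of the same index built from the MongoDB Node.js driver, assuming an already-connected Collection object is passed in. It shows that default_language and language_override are index options that go in the second argument of createIndex(), not keys in the index specification. The helper name createQuotesTextIndex is hypothetical, and the option values simply restate the defaults (English, and the language field), so the call should behave like the mongosh command above.

```javascript
// Hypothetical helper: `collection` is assumed to be a MongoDB Node.js driver
// Collection, or any object that exposes createIndex(indexSpec, options).
function createQuotesTextIndex(collection) {
  // Index specification: only the indexed fields and their index type.
  const indexSpec = { original: "text", "translation.quote": "text" };

  // Index options: language settings go here, not in indexSpec.
  const options = {
    default_language: "english",   // language used when no language field applies
    language_override: "language", // field that names each document's language
  };

  // With the Node.js driver this returns a Promise for the index name.
  return collection.createIndex(indexSpec, options);
}
```

To make French the default instead, change default_language to "french"; documents without a language field would then be indexed with French stemming and stop words.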
Results
The resulting index supports text search queries for the documents and embedded documents containing the original and translation.quote fields.
Based on the value in the language field, the text index applies the suffix stemming rules and ignores the stop words specific to that language.
For example, the following query searches for the French word réalité:
db.quotes.find( { $text: { $search: "réalité" } } )
Output:
[ { _id: 2, language: 'spanish', original: 'Nada hay más surrealista que la realidad.', translation: [ { language: 'english', quote: 'There is nothing more surreal than reality.' }, { language: 'french', quote: "Il n'y a rien de plus surréaliste que la réalité." } ] } ]
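As a sketch, the query filter above can be built with an optional $language override. When $language is omitted, MongoDB parses the search string using the index's default language; passing a value such as "french" asks it to apply that language's stemming and stop words to the search term. The helper name buildTextFilter is hypothetical, and this sketch makes no claim about how the results would change for this collection.

```javascript
// Hypothetical helper that builds a $text filter for find().
// `language` is optional; when omitted, the index's default language applies.
function buildTextFilter(search, language) {
  const text = { $search: search };
  if (language !== undefined) {
    // Language name such as "french", or an ISO 639-1 code such as "fr".
    text.$language = language;
  }
  return { $text: text };
}

// Usage with the Node.js driver, assuming `quotes` is a connected Collection:
//   const docs = await quotes.find(buildTextFilter("réalité", "french")).toArray();
```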
For embedded documents that do not contain the language field: if the enclosing document contains the language field, the index uses that document's language for the embedded documents; otherwise, the index uses the default language for the embedded documents.
For documents that do not contain the language field, the index uses the default language, which is English.
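The language-selection rules above can be modeled in plain JavaScript. This is a sketch of the documented behavior, not MongoDB's implementation; the function effectiveLanguage and its parameters are hypothetical, and defaultLanguage stands in for the index's default language (English unless default_language is set).

```javascript
// Hypothetical model of how a text index chooses the language for a
// document or for one of its embedded documents.
// - doc: the top-level document
// - embedded: an embedded document inside doc, or null for doc itself
// - defaultLanguage: the index's default language
function effectiveLanguage(doc, embedded, defaultLanguage = "english") {
  // An embedded document's own language field takes precedence.
  if (embedded && embedded.language !== undefined) {
    return embedded.language;
  }
  // Otherwise use the enclosing document's language field, if present.
  if (doc.language !== undefined) {
    return doc.language;
  }
  // Neither level names a language: fall back to the index default.
  return defaultLanguage;
}
```

Applied to the sample data, document 3 has no language field, so its original text resolves to "english", while its translation embedded document resolves to "spanish" from its own language field.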
Learn More
To specify the text index language in a field other than language, see Use Any Field to Specify Text Index Language on Self-Managed Deployments. To learn how to specify the default language for a text index, see Specify the Default Language for a Text Index on Self-Managed Deployments.
To learn about other text index properties, see Text Index Properties on Self-Managed Deployments.