Search in documents with multiple texts

Hi,

I am a little bit confused about the following topic:

I have a scenario where my documents have fields in multiple languages and the search language is now known. If I remember correctly, this can be implemented in Lucene by using the correct analyzer, and ideally you use the same analyzer for search and indexing. For example, I have used an analyzer in the past that selects the proper language-specific analyzer based on the field name.

In the docs I see that I have to define the search analyzer: https://docs.atlas.mongodb.com/reference/atlas-search/index-definitions/#static-and-dynamic-mappings

But how would this work with my scenario? It would always use the standard analyzer and therefore not do the same stemming and stop word elimination that is used for filtering.

In Azure Cognitive Search, on the other hand (also using Lucene), I do not define the analyzer when searching. It just uses the same analyzer that is used for filtering.

Hey there, it’s Elle from the Atlas Search product team. You can define a language analyzer per field or use multi within the index definition if a single field has multiple languages. Let me know if this answers your question!


Yes, I understand that. But I do not understand how to use the search analyzer in these cases.

It is a stupid example, because it does not make sense, but something like this could happen:

Let's say you have a word that exists in multiple languages but has different stems:

  1. House (English) => stem: House
  2. House (German) => stem: Hous (it is actually Haus, but I have no real example right now).

Then we have two documents:

// DOC 1
{
  "id": 1,
  "de": "House" <-- Indexes as Hous
}

and

// DOC 2
{ 
  "id": 2,
  "en": "House"
}

If I search for “House” with the German search analyzer, it would only return document 1, and with the English analyzer it would only find document 2.

Using the same multi example linked above, you could also define the search analyzer like below:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "text": {
        "type": "string",
        "multi": {
          "english": {
            "type": "string",
            "analyzer": "lucene.english",
            "searchAnalyzer": "lucene.english"
          },
          "french": {
            "type": "string",
            "analyzer": "lucene.french",
            "searchAnalyzer": "lucene.french"
          }
        }
      }
    }
  }
}
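
As a rough sketch of the query side, assuming the index is named `default` (an assumption, the thread does not name it), a `$search` stage can target one of the `multi` sub-fields through the `path` option, so that the query text is analyzed with that sub-field's search analyzer:

```json
[
  {
    "$search": {
      "index": "default",
      "text": {
        "query": "House",
        "path": { "value": "text", "multi": "english" }
      }
    }
  }
]
```

Swapping `"multi": "english"` for `"multi": "french"` would run the same query against the French-analyzed variant of the field.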

Oh, stupid me. I have not seen that you can define the search analyzer on the field level as well.
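
For the two example documents earlier in the thread, a field-level definition might look roughly like the following. This is only a sketch: it assumes the `de` and `en` fields from that example and picks `lucene.german` and `lucene.english` for them.

```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "de": {
        "type": "string",
        "analyzer": "lucene.german",
        "searchAnalyzer": "lucene.german"
      },
      "en": {
        "type": "string",
        "analyzer": "lucene.english",
        "searchAnalyzer": "lucene.english"
      }
    }
  }
}
```

A query against `"path": "de"` would then analyze the search term with the German analyzer, and a query against `"path": "en"` with the English one.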
