Search in documents with multiple texts

Sebastian_Stehle · November 8, 2021, 3:21pm

Hi,

I am a little bit confused about the following topic:

I have a scenario where my documents have fields in multiple languages and the search language is now known. If I remember correctly, this can be implemented in Lucene by using the correct analyzer, and ideally you use the same analyzer for searching and indexing. For example, in the past I used an analyzer that selects the proper language-specific analyzer based on the field name.

In the docs I see that I have to define the search analyzer: https://docs.atlas.mongodb.com/reference/atlas-search/index-definitions/#static-and-dynamic-mappings

But how would this work in my scenario? The search would always use the standard analyzer and therefore would not apply the same stemming and stop word elimination that is used for indexing.

In Azure Cognitive Search, on the other hand (which also uses Lucene), I do not define the analyzer when searching. It just uses the same analyzer that is used for indexing.

Elle_Shwer · November 8, 2021, 3:33pm

Hey there, it’s Elle from the Atlas Search product team. You can define a language analyzer per field or use multi within the index definition if a single field has multiple languages. Let me know if this answers your question!
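
As a rough illustration of the per-field option, here is a minimal Python sketch that builds an index definition with one language analyzer per field. The field names `en` and `de` and the helper `per_field_index_definition` are hypothetical, chosen only for this example; `lucene.english` and `lucene.german` are built-in Atlas Search language analyzers.

```python
import json


def per_field_index_definition(field_analyzers):
    """Build an index definition that assigns one analyzer per field.

    field_analyzers maps a (hypothetical) field name to an analyzer name.
    The same analyzer is used for indexing and for searching.
    """
    fields = {
        name: {
            "type": "string",
            "analyzer": analyzer,
            "searchAnalyzer": analyzer,
        }
        for name, analyzer in field_analyzers.items()
    }
    return {"mappings": {"dynamic": False, "fields": fields}}


# Assumed layout: one field per language.
definition = per_field_index_definition(
    {"en": "lucene.english", "de": "lucene.german"}
)
print(json.dumps(definition, indent=2))
```

The printed JSON could then be pasted into the index editor. Whether one field per language or a single field with `multi` fits better depends on how the documents are structured.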

Sebastian_Stehle · November 8, 2021, 3:51pm

Yes, I understand that. But I do not understand how to use the search analyzer in these cases.

Sebastian_Stehle · November 8, 2021, 3:56pm

It is a stupid example, because it does not make sense, but something like this could happen:

Let's say you have a word that exists in multiple languages but has different stems:

House (English) => stem: House
House (German) => stem: Hous (it is actually Haus, but I have no real example right now).

Then we have two documents:

// DOC 1
{
  "id": 1,
  "de": "House" // indexed as "Hous"
}

and

// DOC 2
{ 
  "id": 2,
  "en": "House"
}

If I search for “House” with the German search analyzer, it would only return document 1; with the English search analyzer, it would only return document 2.

Elle_Shwer · November 8, 2021, 4:45pm

Using the same multi example linked above, you could also define the search analyzer like below:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "text": {
        "type": "string",
        "multi": {
          "english": {
            "type": "string",
            "analyzer": "lucene.english",
            "searchAnalyzer": "lucene.english"
          },
          "french": {
            "type": "string",
            "analyzer": "lucene.french",
            "searchAnalyzer": "lucene.french"
          }
        }
      }
    }
  }
}
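
To query one of the `multi` variants defined above, the `path` of the `text` operator can name both the field and the variant. The following is only a sketch: the helper `multi_text_search` is hypothetical, and the index name `default` is an assumption that should be replaced with the actual index name.

```python
def multi_text_search(query, field, language, index="default"):
    """Build a $search pipeline that targets one `multi` variant of a field.

    `language` must match a key under "multi" in the index definition,
    for example "english" or "french". The index name is an assumption.
    """
    return [
        {
            "$search": {
                "index": index,
                "text": {
                    "query": query,
                    # Selects the variant analyzed with the given language analyzer.
                    "path": {"value": field, "multi": language},
                },
            }
        }
    ]


pipeline = multi_text_search("House", "text", "english")
# With pymongo, the pipeline could be run as: collection.aggregate(pipeline)
```

Because the variant is chosen per query, the application decides which language analyzer applies, and the `searchAnalyzer` configured for that variant is used on the query text.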

Sebastian_Stehle · November 8, 2021, 4:59pm

Oh, stupid me. I had not seen that you can define the search analyzer at the field level as well.
