Creating an Atlas Search configuration to search Latin accented text?

I am trying to set up an Atlas Search index that can search accented text.

This means that for the following text variations:

  • Téléphone
  • téléphone
  • telephone

The following search terms should return matching documents:

  • Télé
  • telephone
  • téléphone
  • tele

I tried the following search index configuration, but it doesn’t return the expected results when I test it in the Mongo web console:

{
  "analyzer": "diacriticFolder",
  "mappings": {
    "dynamic": true
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "diacriticFolder",
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "normalizationForm": "nfkc",
          "type": "icuNormalizer"
        }
      ],
      "tokenizer": {
        "type": "whitespace"
      }
    }
  ]
}

I will be using this from Node.js, but the problem already shows up when testing in the web console.

After some more experimenting, this configuration seems to work:

{
  "analyzer": "diacriticFolder",
  "mappings": {
    "dynamic": true
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "diacriticFolder",
      "tokenFilters": [
        {
          "type": "icuFolding"
        }
      ],
      "tokenizer": {
        "type": "standard"
      }
    }
  ]
}
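
A rough way to see the difference between the two configurations: lowercasing plus NFKC normalization keeps the accent on "é", while folding strips it, so all three variations end up as the same token. The sketch below approximates this in plain JavaScript. It is only an illustration and not what Atlas Search runs internally; `lowercaseNfkc` and `fold` are hypothetical helper names, and `fold` is an assumed stand-in for the icuFolding filter.

```javascript
// Approximation only: Atlas Search applies its own ICU-based filters,
// not this code. Both helpers are hypothetical names for illustration.

// Mimics the first configuration: lowercase, then NFKC normalization.
// NFKC composes characters but does not remove accents.
function lowercaseNfkc(text) {
  return text.toLowerCase().normalize("NFKC");
}

// Rough stand-in for icuFolding: decompose, drop combining marks, lowercase.
function fold(text) {
  return text
    .normalize("NFKD")
    .replace(/\p{M}/gu, "")
    .toLowerCase();
}

const variants = ["Téléphone", "téléphone", "telephone"];

// With lowercase + NFKC, the accented and unaccented forms stay different.
const nfkcTokens = variants.map(lowercaseNfkc);

// With folding, every variation collapses to the same unaccented token.
const foldedTokens = variants.map(fold);

console.log(nfkcTokens);
console.log(foldedTokens);
```

Since the same analyzer is applied to the search term by default, an accented query such as "téléphone" would be folded the same way before it is compared against the indexed tokens.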