Atlas Search synonyms collection language support?

I’m trying to define a synonyms mapping on my MongoDB Cloud free-tier cluster using Atlas Search and ran into a weird error. I was wondering whether Atlas Search only supports English when defining synonyms in the collection.
I have some English and Korean words as “equivalent” synonyms. I thought I had messed up the JSON configuration, but whenever I enter a Korean word as an element of synonyms, the build fails.

{ "mappingType" : "equivalent" , "synonyms": ["car","차"] } ← built w/o error

But the weird part is that if I use a Korean word that has only 1 letter, it passes. So, basically, this fails:

{ "mappingType" : "equivalent" , "synonyms": ["car","차차"] } ← failed

“car” means “차” in Korean (car === “차”). I just copied and pasted the same word twice to demonstrate my situation. It looks like Atlas doesn’t support Korean in the synonyms collection, but on the other hand it works with a 1-letter word. Is this some sort of bug?

Hi @Polar_swimming - Welcome to the community :slight_smile:

But the weird part is that if I use a Korean word that has only 1 letter, it passes. So, basically, this fails:
{ "mappingType" : "equivalent" , "synonyms": ["car","차차"] } ← failed

Can you provide the index details in JSON format and the full error message when you attempted to add the additional character?

I wasn’t able to reproduce any particular error when adding the dual character text "차차":

db.synonyms.find()
[
  {
    _id: ObjectId("634de9f26343f96ab5838209"),
    mappingType: 'equivalent',
    synonyms: [ 'car', 'vehicle', 'automobile', '차차' ]
  }
]

Running a search query using synonyms for the text query value "차차":

db.cars.aggregate([{$search:{text:{query:'차차',path:"name",synonyms:"mySynonyms"}}}])
[
  { _id: ObjectId("634de7486343f96ab5838200"), name: 'vehicle' },
  { _id: ObjectId("634de7416343f96ab58381fe"), name: 'car' },
  { _id: ObjectId("634de7446343f96ab58381ff"), name: 'car 2' }
]

I believe you should be able to get the error message on the index build failure from the UI by clicking the "View status details" message on the Search Index page.

Regards,
Jason


@Jason_Tran I saw you were replying to my question. Here’s what you asked for.
The “name” field is a document type. It has “fullName” and “tag” as its properties (fields).

{
  "name": {
    "fullName": "some full name",
    "tag": "some tag"
  }
}

Index configuration

{
  "analyzer": "lucene.nori",
  "searchAnalyzer": "lucene.nori",
  "mappings": {
    "fields": {
      "name": {
        "dynamic": true,
        "type": "document"
      }
    }
  },
  "synonyms": [
    {
      "analyzer": "lucene.nori",
      "name": "vendorSynonyms",
      "source": {
        "collection": "vendorSynonyms"
      }
    }
  ]
}

You know what, I was trying to show you some more examples, but it seems some of them work just fine.
Here’s a list of documents from my synonyms collection.

#1

{
   "mappingType":"equivalent",
   "synonyms":[
      "samsung",
      "삼성",
      "삼성전자"
   ]
}

#2

{
   "mappingType":"equivalent",
   "synonyms":[
      "micron",
      "마이크론"
   ]
}

#3 (this causes error)

{
   "mappingType":"equivalent",
   "synonyms":[
      "asus",
      "아수스",
      "에이수스",
      "에이서스"
   ]
}

Let me know if you need more info.

Thank you for providing those details. I’ll do some testing on my system and update here accordingly.

Regards,
Jason

Hi @Polar_swimming,

#3 (this causes error)

{
   "mappingType":"equivalent",
   "synonyms":[
      "asus",
      "아수스",
      "에이수스",
      "에이서스"
   ]
}

I believe the specific entry in the synonym mapping you provided that is causing the index failure is "아수스". More specifically, after some testing, it appears to be due to the character "아". This character functions as a stop word, and as per the synonyms options documentation:

To use synonyms with stop words, you must either index the field using the Standard Analyzer or add the synonym entry without the stop word.
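
As a rough illustration of both options, here is a minimal sketch in plain JavaScript (it should run in mongosh or Node.js without touching a cluster). The `withoutTerms` helper and the variable names are made up for this example and are not part of any Atlas API. The index definition reuses the names from the index posted above and only swaps `lucene.nori` for `lucene.standard`, which assumes the standard analyzer's tokenization of Korean text is acceptable for your search.

```javascript
// Option 1 (sketch): the index from this thread with lucene.standard in place
// of lucene.nori. Assumption: giving up nori's Korean-specific analysis is
// acceptable for this collection.
const standardIndexDefinition = {
  analyzer: "lucene.standard",
  searchAnalyzer: "lucene.standard",
  mappings: {
    fields: {
      name: { dynamic: true, type: "document" },
    },
  },
  synonyms: [
    {
      analyzer: "lucene.standard",
      name: "vendorSynonyms",
      source: { collection: "vendorSynonyms" },
    },
  ],
};

// Option 2 (sketch): keep lucene.nori and drop the entry that contains the
// stop word. withoutTerms is a hypothetical helper written for this example.
function withoutTerms(mapping, blockedTerms) {
  const blocked = new Set(blockedTerms);
  return {
    ...mapping,
    synonyms: mapping.synonyms.filter((term) => !blocked.has(term)),
  };
}

// Document #3 from the synonyms collection, as posted above.
const failingMapping = {
  mappingType: "equivalent",
  synonyms: ["asus", "아수스", "에이수스", "에이서스"],
};

// "아수스" is the entry identified in testing as the likely cause.
const trimmedMapping = withoutTerms(failingMapping, ["아수스"]);
console.log(trimmedMapping.synonyms);
```

If you go with the second option, something like `db.vendorSynonyms.updateOne({ synonyms: "아수스" }, { $pull: { synonyms: "아수스" } })` in mongosh should apply the same change to the stored document. Keep in mind that a search for "아수스" would then no longer expand to the other spellings.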

Regards,
Jason

