
Language Analyzers

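Use language-specific analyzers to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word divisions based on that language's usage patterns.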

MongoDB Search provides the following language analyzers:

`lucene.arabic`	`lucene.armenian`	`lucene.basque`	`lucene.bengali`
`lucene.brazilian`	`lucene.bulgarian`	`lucene.catalan`	`lucene.chinese`
`lucene.cjk` ¹	`lucene.czech`	`lucene.danish`	`lucene.dutch`
`lucene.english`	`lucene.finnish`	`lucene.french`	`lucene.galician`
`lucene.german`	`lucene.greek`	`lucene.hindi`	`lucene.hungarian`
`lucene.indonesian`	`lucene.irish`	`lucene.italian`	`lucene.japanese`
`lucene.korean`	`lucene.kuromoji` ²	`lucene.latvian`	`lucene.lithuanian`
`lucene.morfologik` ³	`lucene.nori` ⁴	`lucene.norwegian`	`lucene.persian`
`lucene.polish`	`lucene.portuguese`	`lucene.romanian`	`lucene.russian`
`lucene.smartcn` ⁵	`lucene.sorani`	`lucene.spanish`	`lucene.swedish`
`lucene.thai`	`lucene.turkish`	`lucene.ukrainian`

¹ cjk is a Chinese, Japanese, and Korean analyzer.

² kuromoji is a Japanese analyzer.

³ morfologik is a Polish analyzer.

⁴ nori is a Korean analyzer.

⁵ smartcn is a Chinese analyzer.

Example

Consider a collection named cars that contains the following documents:

{
  "_id": 1,
  "subject": {
    "en": "It is better to equip our cars to understand the causes of the accident.",
    "fr": "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
    "he": "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
  }
}

{
  "_id": 2,
  "subject": {
    "en": "The best time to do this is immediately after you've filled up with fuel",
    "fr": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
    "he": "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק."
  }
}

Built-In Language Analyzer Example

The following example index definition specifies an index on the subject.fr field using the lucene.french analyzer.

{
  "mappings": {
    "fields": {
      "subject": {
        "fields": {
          "fr": {
            "analyzer": "lucene.french",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  }
}
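As a rough sketch, a definition of this shape can also be generated from a dotted field path. The helper name buildLanguageIndex is hypothetical and not part of any MongoDB API; it only assembles a plain object that mirrors the definition above.

```javascript
// Hypothetical helper (not a MongoDB API): builds a static mapping
// for one nested string field, given a dotted path such as "subject.fr".
function buildLanguageIndex(fieldPath, analyzer) {
  const parts = fieldPath.split(".");
  // Innermost definition: the string field with its language analyzer.
  let node = { analyzer, type: "string" };
  // Wrap each parent segment, from the inside out.
  for (let i = parts.length - 1; i >= 0; i--) {
    const wrapped = { fields: { [parts[i]]: node } };
    // Every level except the top-level "mappings" is an embedded document.
    if (i > 0) wrapped.type = "document";
    node = wrapped;
  }
  return { mappings: node };
}

const frenchIndex = buildLanguageIndex("subject.fr", "lucene.french");
console.log(JSON.stringify(frenchIndex, null, 2));
```

In mongosh, an object like frenchIndex could then be passed to db.cars.createSearchIndex(), assuming your deployment and mongosh version support that method.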

The following MongoDB Search query searches the subject.fr field for the string pour. To run this query, connect to your cluster with mongosh and switch to the database that contains the cars collection.

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

With the lucene.french analyzer, the previous query returns no results because pour is a built-in stop word. With the lucene.standard analyzer, the same query returns both documents.
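The effect of a stop word on a query can be illustrated with a toy model. This is only a sketch, not the real lucene.french analyzer: the stop word set below is a small hand-picked subset assumed for illustration, and the tokenizer is a plain regular-expression split.

```javascript
// Toy illustration only: NOT the real lucene.french analyzer.
// Assumed subset of French stop words, chosen for this sketch.
const FRENCH_STOP_WORDS = new Set(["pour", "le", "les", "de", "un"]);

// Lowercase the text and split on anything that is not a letter or digit.
function splitFrenchWords(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// Toy "french" analysis: tokenize, then drop stop words.
function toyFrenchAnalyze(text) {
  return splitFrenchWords(text).filter((token) => !FRENCH_STOP_WORDS.has(token));
}

// Toy "standard" analysis: tokenize only, keep every word.
function toyStandardAnalyze(text) {
  return splitFrenchWords(text);
}

// The query term "pour" is removed entirely by the toy French analysis,
// so no token remains to match against the index.
console.log(toyFrenchAnalyze("pour"));
console.log(toyStandardAnalyze("pour"));
```

Because the same analysis runs on the query as on the indexed text, a query made up only of stop words leaves nothing to match.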

The following MongoDB Search query searches the subject.fr field for the string carburant. To run this query, connect to your cluster with mongosh and switch to the database that contains the cars collection.

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

{ subject: { fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." } }

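MongoDB Search returns the document with _id: 2 in the results because the query matches a token that the lucene.french analyzer created for that document. The lucene.french analyzer creates the following tokens for the subject.fr field of the document with _id: 2: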

`meileu`	`moment`	`fair`
`est`	`imediat`	`aprè`
`fait`	`plein`	`carburant`

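Custom Language Analyzer Example

You can also index languages that are not supported by a built-in analyzer by creating a custom analyzer that uses the icuFolding and stopword token filters.

The following example index definition specifies an index on the subject.he field using a custom analyzer named myHebrewAnalyzer to analyze Hebrew text and create tokens.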

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "subject": {
        "fields": {
          "he": {
            "analyzer": "myHebrewAnalyzer",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "myHebrewAnalyzer",
      "tokenFilters": [
        {
          "type": "icuFolding"
        },
        {
          "tokens": [
            "אן",
            "שלנו",
            "זה",
            "אל"
          ],
          "type": "stopword"
        }
      ],
      "tokenizer": {
        "type": "standard"
      }
    }
  ]
}

The following MongoDB Search query searches the subject.he field for the string המכוניות. To run this query, connect to your cluster with mongosh and switch to the database that contains the cars collection.

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "המכוניות",
        "path": "subject.he"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.he": 1
    }
  }
])

{ subject: { he: 'עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.' } }

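MongoDB Search returns the document with _id: 1 in the results because the query matches a token that the myHebrewAnalyzer analyzer created for that document. The myHebrewAnalyzer analyzer creates the following tokens for the subject.he field of the document with _id: 1: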

`עדיף`	`לצייד`	`את`
`המכוניות`	`כדי`	`להבין`
`את`	`הגורמים`	`לתאונה`
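The token list above can be approximated with a small toy pipeline. This is a sketch under stated assumptions, not the real analyzer: the regular-expression split stands in for the standard tokenizer, and Unicode normalization with mark stripping is only a rough stand-in for icuFolding. The stop tokens are the four listed in the index definition.

```javascript
// Toy approximation only: NOT the real myHebrewAnalyzer pipeline.
// Stop tokens copied from the "stopword" filter in the index definition.
const HEBREW_STOP_TOKENS = new Set(["אן", "שלנו", "זה", "אל"]);

// Rough stand-in for the standard tokenizer:
// split on anything that is not a letter, combining mark, or digit.
function splitHebrewWords(text) {
  return text
    .split(/[^\p{L}\p{M}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// Rough stand-in for icuFolding:
// decompose characters, strip combining marks, and lowercase.
function toyFold(token) {
  return token.normalize("NFKD").replace(/\p{M}/gu, "").toLowerCase();
}

// Apply the filters in the same order as the definition:
// folding first, then stop token removal.
function toyHebrewAnalyze(text) {
  return splitHebrewWords(text)
    .map(toyFold)
    .filter((token) => !HEBREW_STOP_TOKENS.has(token));
}

const hebrewTokens = toyHebrewAnalyze(
  "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
);
console.log(hebrewTokens);
```

The stop token שלנו is dropped, while את remains because it is not in the configured list.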

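Multilingual Search Example

You can also create an index that uses multiple language analyzers to perform a multilingual search.

The following example index definition specifies an index with dynamic mappings on the sample_mflix.movies collection. The definition applies the lucene.italian language analyzer to the fullplot field and uses the multi option to specify lucene.english as an alternate language analyzer. For all other fields that it dynamically indexes in the movies collection, MongoDB Search uses the default lucene.standard analyzer set at the top level of the definition.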

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true,
    "fields": {
      "fullplot": {
        "type": "string",
        "analyzer": "lucene.italian",
        "multi": {
          "fullplot_english": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      }
    }
  }
}
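As a sketch, the multi mapping and the matching query path can be assembled with two small helpers. Both helper names (withAlternateAnalyzers and multiPath) are hypothetical and exist only for this illustration; they produce plain objects shaped like the definition above and like the path object used by the text operator.

```javascript
// Hypothetical helper: a string field with a primary analyzer plus
// named alternates under "multi".
function withAlternateAnalyzers(primaryAnalyzer, alternates) {
  const multi = {};
  for (const [name, analyzer] of Object.entries(alternates)) {
    multi[name] = { type: "string", analyzer };
  }
  return { type: "string", analyzer: primaryAnalyzer, multi };
}

// Hypothetical helper: the path object that targets one alternate analyzer.
function multiPath(field, alternateName) {
  return { value: field, multi: alternateName };
}

const fullplot = withAlternateAnalyzers("lucene.italian", {
  fullplot_english: "lucene.english",
});

const multilingualIndex = {
  analyzer: "lucene.standard",
  mappings: { dynamic: true, fields: { fullplot } },
};

console.log(JSON.stringify(multilingualIndex, null, 2));
console.log(multiPath("fullplot", "fullplot_english"));
```

Keeping the alternate name in one place helps avoid a mismatch between the name declared under multi and the name referenced in the query path.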

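The following MongoDB Search query uses the compound operator to query the collection in multiple languages. To run this query, connect to your cluster with mongosh and switch to the sample_mflix database.

The compound operator contains the following clauses:

The must clause uses the text operator to search English and Italian movie plots that contain the term Bella.
The mustNot clause uses the range operator to exclude movies released between 1984 and 2016.
The should clause uses the text operator to specify a preference for the Comedy genre.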

db.movies.aggregate([
  {
    $search: {
      "index": "multilingual-tutorial",
      "compound": {
        "must": [{
          "text": {
            "query": "Bella",
            "path": { "value": "fullplot", "multi": "fullplot_english" }
          }
        }],
        "mustNot": [{
          "range": {
            "path": "released",
            "gt": ISODate("1984-01-01T00:00:00.000Z"),
            "lt": ISODate("2016-01-01T00:00:00.000Z")
          }
        }],
        "should": [{
          "text": {
            "query": "Comedy",
            "path": "genres"
          }
        }]
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "plot": 1,
      "genres": 1,
      "runtime": 1,
      "fullplot": 1,
      "released": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])

[
  {
    plot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come ...",
    genres: [ 'Comedy' ],
    runtime: 100,
    title: 'Policewoman',
    fullplot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come Giovanna d'Arco, il suo idolo, non guardera' in faccia a nessuno e con l'aiuto del pretore Patane', innamorato di lei, smascherera' una serie di intrallazzi e corruzione denunciando perfino il suo capo, Marcellini. I due paladini della giustizia coroneranno il loro sogno d'amore, trasferiti in una lontana isoletta a sud della Sicilia, ma i corrotti resteranno al loro posto.",
    released: ISODate('1974-11-15T00:00:00.000Z'),
    score: 3.4109344482421875
  },
  {
    plot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la ...`,
    genres: [ 'Comedy' ],
    runtime: 95,
    title: 'Love and Larceny',
    fullplot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la prigione, dove incontra Chinotto e Gloria Patri. Uscito inizia, con l'opposizione di Annalisa che lo vuole sposare, una carriera come truffatore, dapprima in societè con Chinotto e quindi con la bella Elena. Tutto sembra filare a gonfie vele, e le truffe divengono sempre piè grosse e di successo. Ma a volte è destino che il ragno resti preso dalla stessa tela che tesse.`,
    released: ISODate('1960-02-10T00:00:00.000Z'),
    score: 3.3489856719970703
  },
  {
    plot: 'He is a revenge-obssessed stevedore... She is a wealthy, elusive woman. They try hard to get together... or do they?',
    genres: [ 'Drama' ],
    runtime: 137,
    title: 'The Moon in the Gutter',
    fullplot: "Nightly, Gerard broods in an alley hoping to catch his sister's attacker. He lives with his lover Bella whom he neglects, an alcoholic brother who lurks about, and his father who's stayed drunk since the daughter's death, ignoring work and his own companion. At a seedy bar, Gerard meets a wealthy, nihilistic hedonist and his beautiful sister. Gerard flips for her and thinks she's his ticket out of the slum...",
    released: ISODate('1983-05-18T00:00:00.000Z'),
    score: 3.2985665798187256
  },
  {
    plot: 'Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories...',
    genres: [ 'Horror' ],
    runtime: 90,
    title: 'Tales That Witness Madness',
    fullplot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories as to why each of the four patients went mad... cue four distinct tales each with a different set of characters: 'Mr Tiger' tells of Paul, the sensitive and troubled young son of prosperous but constantly bickering and unlovely parents, and the boy's 'imaginary' friend, a tiger. 'Penny Farthing' tells of Timothy, an antique store owner propelled backwards in time by a penny-farthing bicycle in his shop, all the while being watched over by the constantly changing photograph of Uncle Albert, which endangers the lives of both Timothy and his beautiful wife, Ann. 'Mel' tells of Brian, a man who brings home an old dead tree and prominently displays it in his living room as a work of art. His fiery wife Bella soon becomes jealous of the tree, which the husband has lovingly named Mel, and it seems to be developing a will of its own. 'Luau' tells of Auriol, a flamboyant and ambitious literary agent who will do anything to impress her sinister new client, though he seems more interested in Auriol's beautiful and precocious young daughter Ginny. Ginny sneaks off on holiday while Auriol plans a sumptuous feast for her client.",
    released: ISODate('1973-10-31T00:00:00.000Z'),
    score: 1.9504895210266113
  }
]
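As a quick sanity check, the released dates in the results above can be compared against the mustNot range from the query. This is only an illustrative sketch in plain JavaScript: the isExcluded function is hypothetical and mirrors the exclusive gt and lt bounds, and the dates are copied from the output above.

```javascript
// Bounds copied from the mustNot range clause in the query (gt / lt are exclusive).
const excludedAfter = new Date("1984-01-01T00:00:00.000Z");
const excludedBefore = new Date("2016-01-01T00:00:00.000Z");

// Hypothetical check: true when a release date falls inside the excluded window.
function isExcluded(released) {
  return released > excludedAfter && released < excludedBefore;
}

// Release dates copied from the four results shown above.
const releasedByTitle = {
  "Policewoman": new Date("1974-11-15T00:00:00.000Z"),
  "Love and Larceny": new Date("1960-02-10T00:00:00.000Z"),
  "The Moon in the Gutter": new Date("1983-05-18T00:00:00.000Z"),
  "Tales That Witness Madness": new Date("1973-10-31T00:00:00.000Z"),
};

for (const [title, released] of Object.entries(releasedByTitle)) {
  console.log(title, isExcluded(released) ? "excluded" : "allowed");
}
```

Every listed movie was released before 1984, so none of them falls inside the excluded window.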
