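Use language-specific analyzers to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word-division rules based on that language's usage patterns.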
MongoDB Search provides built-in language analyzers, including the following:
cjk: a general-purpose analyzer for Chinese, Japanese, and Korean
kuromoji: a Japanese analyzer
morfologik: a Polish analyzer
nori: a Korean analyzer
smartcn: a Chinese analyzer
Example
Consider a collection named cars that contains the following documents:
{
  "_id": 1,
  "subject": {
    "en": "It is better to equip our cars to understand the causes of the accident.",
    "fr": "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
    "he": "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
  }
}
{
  "_id": 2,
  "subject": {
    "en": "The best time to do this is immediately after you've filled up with fuel",
    "fr": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
    "he": "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק."
  }
}
Built-in Language Analyzer Example
The following example index definition indexes the subject.fr field with the lucene.french analyzer:
{
  "mappings": {
    "fields": {
      "subject": {
        "fields": {
          "fr": {
            "analyzer": "lucene.french",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  }
}
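One way to create this index from mongosh is sketched below. It assumes a deployment and mongosh version that support db.collection.createSearchIndex(). The variable name frenchIndexDefinition and the index name "default" are choices made for this illustration. The name "default" is used because the queries in this example omit the index option, and $search falls back to an index with that name.

```javascript
// Index definition from the example above, written as a JavaScript object.
const frenchIndexDefinition = {
  mappings: {
    fields: {
      subject: {
        type: "document",
        fields: {
          fr: { type: "string", analyzer: "lucene.french" },
        },
      },
    },
  },
};

// `db` is predefined inside mongosh. The guard only keeps this file
// loadable in a plain JavaScript runtime where `db` does not exist.
if (typeof db !== "undefined") {
  // Assumption: the index is named "default" so that $search stages
  // without an explicit "index" option pick it up.
  db.cars.createSearchIndex("default", frenchIndexDefinition);
}
```

Building the index takes some time after the call returns, so a query issued immediately afterwards may not see it yet.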
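The following MongoDB Search query searches the subject.fr field for the string pour. To run this query, connect to your cluster with mongosh and switch to the database that contains the cars collection.
db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])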
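With the french analyzer, the preceding query returns no results because pour is a built-in stop word. With the standard analyzer, the same query returns both documents.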
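The difference between the two analyzers can be illustrated with a small plain-JavaScript model. This is a sketch only, not how Lucene implements analysis: the analyze and matches functions are hypothetical helpers, the stop-word set is a tiny assumed subset (only pour is confirmed by the example above), and stemming and elision handling are left out.

```javascript
// Tiny illustrative subset of French stop words. Only "pour" comes from
// the example above; the other entries are assumptions.
const FRENCH_STOP_WORDS = new Set(["pour", "le", "de", "les", "un"]);

// Crude stand-in for an analyzer: lowercase, split into runs of letters,
// then drop any stop words. No stemming, no elision handling.
function analyze(text, stopWords) {
  const words = text.toLowerCase().match(/\p{L}+/gu) ?? [];
  return words.filter((word) => !stopWords.has(word));
}

// A document matches when it shares at least one token with the query.
// The same analysis is applied to the document text and the query text.
function matches(docText, queryText, stopWords) {
  const docTokens = new Set(analyze(docText, stopWords));
  return analyze(queryText, stopWords).some((token) => docTokens.has(token));
}

const docs = [
  "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
  "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
];

// Stop words removed: the query "pour" yields no tokens, so nothing matches.
const frenchHits = docs.filter((d) => matches(d, "pour", FRENCH_STOP_WORDS));
// No stop-word removal: both documents contain the token "pour".
const standardHits = docs.filter((d) => matches(d, "pour", new Set()));

console.log(frenchHits.length, standardHits.length); // prints: 0 2
```

The key point is that the stop word disappears on both sides: it is never indexed, and it is also stripped from the query, so there is nothing left to match.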
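The following MongoDB Search query searches the subject.fr field for the string carburant. To run this query, connect to your cluster with mongosh and switch to the database that contains the cars collection.
db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])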
{ subject: { fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." } }
MongoDB Search returns the document with _id: 2 because the query matches one of the tokens that the lucene.french analyzer created for the subject.fr field of that document.
Custom Language Analyzer Example
You can also create a custom analyzer with the icuFolding and stopword token filters to index languages that have no built-in analyzer.
The following example index definition indexes the subject.he field with a custom analyzer named myHebrewAnalyzer, which analyzes Hebrew text and creates tokens for it:
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "subject": {
        "fields": {
          "he": {
            "analyzer": "myHebrewAnalyzer",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "myHebrewAnalyzer",
      "tokenFilters": [
        { "type": "icuFolding" },
        {
          "tokens": [ "אן", "שלנו", "זה", "אל" ],
          "type": "stopword"
        }
      ],
      "tokenizer": { "type": "standard" }
    }
  ]
}
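The following MongoDB Search query searches the subject.he field for the string המכוניות. To run this query, connect to your cluster with mongosh and switch to the database that contains the cars collection.
db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "המכוניות",
        "path": "subject.he"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.he": 1
    }
  }
])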
{ subject: { he: 'עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.' } }
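MongoDB Search returns the document with _id: 1 because the query matches a token that the myHebrewAnalyzer analyzer created for that document. The myHebrewAnalyzer analyzer creates the following tokens for the subject.he field in the document with _id: 1: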
עדיף
לצייד
את
המכוניות
כדי
להבין
את
הגורמים
לתאונה
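The filter chain in myHebrewAnalyzer can be approximated in plain JavaScript to see which tokens survive. This is a rough sketch under stated assumptions: a Unicode regular expression stands in for the standard tokenizer, and NFKD normalization with combining-mark removal stands in for icuFolding, which does considerably more. The function names are hypothetical and exist only for this illustration.

```javascript
// Stop words copied from the myHebrewAnalyzer definition.
const stopwordTokens = new Set(["אן", "שלנו", "זה", "אל"]);

// Stand-in for the standard tokenizer: runs of letters, marks, and digits.
// Assumption: this is close enough for plain space-separated Hebrew text.
function tokenize(text) {
  return text.match(/[\p{L}\p{M}\p{N}]+/gu) ?? [];
}

// Stand-in for icuFolding: decompose, strip combining marks, lowercase.
// For unpointed Hebrew such as the sample text, this changes nothing.
function fold(token) {
  return token.normalize("NFKD").replace(/\p{M}/gu, "").toLowerCase();
}

// Apply the steps in the order the index definition lists them:
// tokenizer first, then icuFolding, then the stopword filter.
function myHebrewAnalyzerSketch(text) {
  return tokenize(text)
    .map(fold)
    .filter((token) => token.length > 0 && !stopwordTokens.has(token));
}

const subjectHe = "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.";
const tokens = myHebrewAnalyzerSketch(subjectHe);

console.log(tokens.length); // prints: 9
console.log(tokens.includes("המכוניות"), tokens.includes("שלנו")); // prints: true false
```

The sentence has ten words. The stop word שלנו is dropped, which leaves nine tokens, and the query term המכוניות is among them.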
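Multilingual Search Example
You can also create an index that uses more than one language analyzer to run multilingual searches.
The following example index definition specifies an index with dynamic mappings on the sample_mflix.movies collection. The definition applies the lucene.italian language analyzer to the fullplot field and uses the multi option to specify lucene.english as an alternate analyzer. For all other fields that it indexes dynamically in the movies collection, MongoDB Search uses the default analyzer set at the top level of the definition, lucene.standard.
{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true,
    "fields": {
      "fullplot": {
        "type": "string",
        "analyzer": "lucene.italian",
        "multi": {
          "fullplot_english": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      }
    }
  }
}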
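A field indexed with multi keeps one set of tokens per analyzer, and the path in a query decides which set is searched. The fragment below sketches the path forms in mongosh syntax. The variable names are illustrative, and the combined array form is an assumption based on the general path syntax, so verify it against your deployment before relying on it.

```javascript
// Searches the tokens that lucene.italian produced for fullplot.
const italianPath = "fullplot";

// Searches the tokens that lucene.english produced, addressed through the
// multi name declared in the index definition.
const englishPath = { value: "fullplot", multi: "fullplot_english" };

// Assumption: listing both entries in an array searches both token sets.
const bothPaths = [italianPath, englishPath];

// Example text clause that could be placed inside a $search stage.
const textClause = { text: { query: "Bella", path: bothPaths } };

console.log(JSON.stringify(textClause));
```

A path that names only the multi entry is analyzed with the alternate analyzer alone, so stemming and stop words follow English rules for that clause.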
The following MongoDB Search query uses the compound operator to query the collection in multiple languages. To run this query, connect to your cluster with mongosh and switch to the sample_mflix database.
The must clause uses the text operator to search English and Italian movie plots for the term Bella.
The mustNot clause uses the range operator to exclude movies released between 1984 and 2016.
The should clause uses the text operator to express a preference for the Comedy genre.
db.movies.aggregate([
  {
    $search: {
      "index": "multilingual-tutorial",
      "compound": {
        "must": [{
          "text": {
            "query": "Bella",
            "path": { "value": "fullplot", "multi": "fullplot_english" }
          }
        }],
        "mustNot": [{
          "range": {
            "path": "released",
            "gt": ISODate("1984-01-01T00:00:00.000Z"),
            "lt": ISODate("2016-01-01T00:00:00.000Z")
          }
        }],
        "should": [{
          "text": {
            "query": "Comedy",
            "path": "genres"
          }
        }]
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "title": 1,
      "plot": 1,
      "genres": 1,
      "runtime": 1,
      "fullplot": 1,
      "released": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])
[ { plot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come ...", genres: [ 'Comedy' ], runtime: 100, title: 'Policewoman', fullplot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: tutti la vogliono solo usare, anche il suo fidanzata Claudio. Trovera' una via d'uscita diventando vigile urbano. Come Giovanna d'Arco, il suo idolo, non guardera' in faccia a nessuno e con l'aiuto del pretore Patane', innamorato di lei, smascherera' una serie di intrallazzi e corruzione denunciando perfino il suo capo, Marcellini. I due paladini della giustizia coroneranno il loro sogno d'amore, trasferiti in una lontana isoletta a sud della Sicilia, ma i corrotti resteranno al loro posto.", released: ISODate('1974-11-15T00:00:00.000Z'), score: 3.4109344482421875 }, { plot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la ...`, genres: [ 'Comedy' ], runtime: 95, title: 'Love and Larceny', fullplot: `Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è del suo parere. Cosè, per arrotondare gli introiti, aiuta l'amico Lallo in un suo "lavoretto". Questo gli costa perè la prigione, dove incontra Chinotto e Gloria Patri. Uscito inizia, con l'opposizione di Annalisa che lo vuole sposare, una carriera come truffatore, dapprima in societè con Chinotto e quindi con la bella Elena. Tutto sembra filare a gonfie vele, e le truffe divengono sempre piè grosse e di successo. Ma a volte è destino che il ragno resti preso dalla stessa tela che tesse.`, released: ISODate('1960-02-10T00:00:00.000Z'), score: 3.3489856719970703 }, { plot: 'He is a revenge-obssessed stevedore... She is a wealthy, elusive woman. They try hard to get together... 
or do they?', genres: [ 'Drama' ], runtime: 137, title: 'The Moon in the Gutter', fullplot: "Nightly, Gerard broods in an alley hoping to catch his sister's attacker. He lives with his lover Bella whom he neglects, an alcoholic brother who lurks about, and his father who's stayed drunk since the daughter's death, ignoring work and his own companion. At a seedy bar, Gerard meets a wealthy, nihilistic hedonist and his beautiful sister. Gerard flips for her and thinks she's his ticket out of the slum...", released: ISODate('1983-05-18T00:00:00.000Z'), score: 3.2985665798187256 }, { plot: 'Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories...', genres: [ 'Horror' ], runtime: 90, title: 'Tales That Witness Madness', fullplot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum housing four very special cases. Visited by colleague Nicholas, Tremayne explains his amazing and controversial theories as to why each of the four patients went mad... cue four distinct tales each with a different set of characters: 'Mr Tiger' tells of Paul, the sensitive and troubled young son of prosperous but constantly bickering and unlovely parents, and the boy's 'imaginary' friend, a tiger. 'Penny Farthing' tells of Timothy, an antique store owner propelled backwards in time by a penny-farthing bicycle in his shop, all the while being watched over by the constantly changing photograph of Uncle Albert, which endangers the lives of both Timothy and his beautiful wife, Ann. 'Mel' tells of Brian, a man who brings home an old dead tree and prominently displays it in his living room as a work of art. His fiery wife Bella soon becomes jealous of the tree, which the husband has lovingly named Mel, and it seems to be developing a will of its own. 
'Luau' tells of Auriol, a flamboyant and ambitious literary agent who will do anything to impress her sinister new client, though he seems more interested in Auriol's beautiful and precocious young daughter Ginny. Ginny sneaks off on holiday while Auriol plans a sumptuous feast for her client.", released: ISODate('1973-10-31T00:00:00.000Z'), score: 1.9504895210266113 } ]
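The mustNot clause in the query above uses gt and lt, which are exclusive bounds. The plain-JavaScript check below is a sketch that replays that condition against the release dates in the results. The helper name matchesExcludedRange is hypothetical, and Date stands in for mongosh's ISODate.

```javascript
// Bounds copied from the mustNot range clause.
const lowerBound = new Date("1984-01-01T00:00:00.000Z");
const upperBound = new Date("2016-01-01T00:00:00.000Z");

// True when a release date falls strictly inside the excluded window,
// mirroring gt (greater than) and lt (less than).
function matchesExcludedRange(released) {
  return released > lowerBound && released < upperBound;
}

// Release dates taken from the result documents shown above.
const releaseDates = {
  "Policewoman": new Date("1974-11-15T00:00:00.000Z"),
  "Love and Larceny": new Date("1960-02-10T00:00:00.000Z"),
  "The Moon in the Gutter": new Date("1983-05-18T00:00:00.000Z"),
  "Tales That Witness Madness": new Date("1973-10-31T00:00:00.000Z"),
};

for (const [title, released] of Object.entries(releaseDates)) {
  // Every returned movie falls outside the window, so each line prints false.
  console.log(title, matchesExcludedRange(released));
}

// A date inside the window would have been excluded.
console.log(matchesExcludedRange(new Date("2000-06-01T00:00:00.000Z"))); // prints: true
// The bounds themselves are not excluded because gt and lt are strict.
console.log(matchesExcludedRange(lowerBound)); // prints: false
```

To exclude the boundary instants as well, the range operator also accepts gte and lte.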