
How to Index Fields for Autocompletion


You can use the MongoDB Search autocomplete type to index text values in string fields for autocompletion. You can query fields indexed as the autocomplete type by using the autocomplete operator.

You can also use the autocomplete type to index:

Fields whose value is an array of strings. To learn more, see How to Index the Elements of an Array.
String fields inside an array of documents indexed as the embeddedDocuments type. For index build time considerations, see Index Build Time.

For dynamic mapping considerations, see Dynamic Mappings.
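
As a rough illustration of querying a field indexed as the autocomplete type, the following sketch builds a `$search` stage that uses the autocomplete operator. The field name `title`, the index name `default`, and the helper `autocomplete_stage` are assumptions made for this example, not names taken from this page.

```python
def autocomplete_stage(query, path, index="default"):
    """Build a $search stage that uses the autocomplete operator.

    `index` and `path` are hypothetical: they must name an existing
    search index and a field indexed as the autocomplete type.
    """
    return {
        "$search": {
            "index": index,
            "autocomplete": {"query": query, "path": path},
        }
    }


# Aggregation pipeline: search on a partial word, then trim the output.
pipeline = [
    autocomplete_stage("qui", "title"),
    {"$limit": 5},
    {"$project": {"_id": 0, "title": 1}},
]
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`.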

Define the Index for the `autocomplete` Type

Configure `autocomplete` Field Properties

The MongoDB Search autocomplete type takes the following parameters:

Option

Type

Necessity

Description

Default

type

string

required

Human-readable label that identifies this field type. Value must be autocomplete.

analyzer

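string

optional

Name of the analyzer to use with this autocomplete mapping. You can use any MongoDB Search analyzer except the lucene.kuromoji language analyzer and the following custom analyzer tokenizers and token filters:

nGram Tokenizer
edgeGram Tokenizer
daitchMokotoffSoundex Token Filter
nGram Token Filter
edgeGram Token Filter
shingle Token Filter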

lucene.standard

maxGrams

int

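optional

Maximum number of characters per indexed sequence. The value limits the character length of indexed tokens. When you search for terms longer than the maxGrams value, MongoDB Search truncates the tokens to the maxGrams length.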

For maxGrams best practices, see maxGrams Configuration.

15

minGrams

int

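optional

Minimum number of characters per indexed sequence. We recommend 4 as the minimum value. A value less than 4 can affect performance because the index can become very large. We recommend the default value of 2 for edgeGram only.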

2

tokenization

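enum

optional

Tokenization strategy to use when indexing the field for autocompletion. Value can be one of the following:

edgeGram: creates indexable tokens, referred to as grams, from variable-length character sequences that start at the left side of each word, as delimited by the analyzer used with this autocomplete mapping.
rightEdgeGram: creates indexable tokens, referred to as grams, from variable-length character sequences that start at the right side of each word, as delimited by the analyzer used with this autocomplete mapping.
nGram: creates indexable tokens, referred to as grams, by sliding a variable-length character window over a word. MongoDB Search creates more tokens for nGram than for edgeGram or rightEdgeGram, so nGram takes more space and time to index the field. nGram is better suited to querying languages with long compound words or languages that don't use spaces.

edgeGram, rightEdgeGram, and nGram are applied at the letter level. For example, consider the following sentence: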

The quick brown fox jumps over the lazy dog.

When tokenized with a minGrams value of 2 and a maxGrams value of 5, MongoDB Search indexes the following character sequences, depending on the tokenization value you choose.

edgeGram

th
the
the{SPACE}
the q
qu
qui
quic
quick
...

rightEdgeGram

og
dog
{SPACE}dog
y dog
zy
azy
lazy
{SPACE}lazy
he
the
{SPACE}the
r the
er
ver
over
{SPACE}over
...

nGram

th
the
the{SPACE}
the q
he
he{SPACE}
he q
he qu
e{SPACE}
e q
e qu
e qui
{SPACE}q
{SPACE}qu
{SPACE}qui
{SPACE}quic
qu
qui
quic
quick
...

For performance considerations, see Tokenization Performance.

edgeGram

foldDiacritics

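boolean

optional

Flag that indicates whether to perform normalizations such as including or removing diacritics from the indexed text. Value can be one of the following:

true: perform normalizations such as ignoring diacritic marks in the index and query text. For example, a search for cafè returns results that contain cafè and results that contain cafe, because MongoDB Search returns results both with and without diacritics.
false: do not perform normalizations such as ignoring diacritic marks in the index and query text. MongoDB Search therefore returns only results that match the query string exactly as written, with or without diacritics. For example, a search for cafè returns only results that contain cafè, and a search for cafe returns only results that contain cafe.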

true

similarity.type

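string

optional

Name of the similarity algorithm to use with the string mapping when scoring with the autocomplete operator. Value can be one of the following: bm25, boolean, or stableTfl.

To learn more about the available similarity algorithms, see Score Details.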

bm25
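
The options above can be combined into an index definition. The following is a minimal sketch, not an official template: it builds the definition as a Python dictionary and applies a few sanity checks based on the table. The helper `autocomplete_field`, the field name `title`, and the nesting of `similarity.type` as `{"similarity": {"type": ...}}` are assumptions for illustration.

```python
ALLOWED_TOKENIZATION = {"edgeGram", "rightEdgeGram", "nGram"}
ALLOWED_SIMILARITY = {"bm25", "boolean", "stableTfl"}


def autocomplete_field(analyzer="lucene.standard", tokenization="edgeGram",
                       min_grams=2, max_grams=15, fold_diacritics=True,
                       similarity="bm25"):
    """Return an autocomplete field mapping (hypothetical helper).

    Keyword defaults mirror the defaults listed in the options table.
    """
    if tokenization not in ALLOWED_TOKENIZATION:
        raise ValueError(f"unknown tokenization: {tokenization}")
    if similarity not in ALLOWED_SIMILARITY:
        raise ValueError(f"unknown similarity type: {similarity}")
    if not 1 <= min_grams <= max_grams:
        raise ValueError("expected 1 <= minGrams <= maxGrams")
    return {
        "type": "autocomplete",
        "analyzer": analyzer,
        "tokenization": tokenization,
        "minGrams": min_grams,
        "maxGrams": max_grams,
        "foldDiacritics": fold_diacritics,
        # Assumed nesting for the similarity.type option.
        "similarity": {"type": similarity},
    }


# Static mapping that indexes only the hypothetical `title` field.
index_definition = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "title": autocomplete_field(tokenization="nGram",
                                        min_grams=4, max_grams=10),
        },
    }
}
```

A definition like this could then be submitted through whichever interface you use to create search indexes; in PyMongo, for example, that is typically `Collection.create_search_index`.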

Try an Example for the `autocomplete` Type

Considerations

`maxGrams` Configuration

The maxGrams option specifies the maximum length of substrings generated during indexing. Increasing maxGrams improves matching for longer queries by generating more substrings. Setting it beyond what you need can increase index size and affect indexing performance.

Consider the following best practices when you configure maxGrams:

Default to no more than 15. Set maxGrams to no more than 15 when possible to avoid unnecessary index growth.
Align with query length. Set maxGrams based on the typical length of user queries, rather than indexing for worst-case scenarios.
Avoid over-indexing. If your queries are shorter than your current maxGrams value, you may be indexing more data than necessary.
Use an alternative for longer queries. If your queries regularly exceed 15 characters, use a custom analyzer for prefix, contains, and suffix patterns.
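
To get a feel for how maxGrams affects index size, the following back-of-the-envelope sketch counts the grams that a single word of a given length would produce. It is a simplified model that ignores the joining of adjacent tokens and any analyzer behavior, and the function names are made up for this example.

```python
def edge_gram_count(word_length, min_grams=2, max_grams=15):
    """Grams per word for edgeGram or rightEdgeGram: one per allowed length."""
    longest = min(word_length, max_grams)
    return max(0, longest - min_grams + 1)


def n_gram_count(word_length, min_grams=2, max_grams=15):
    """Grams per word for nGram: every window of every allowed length."""
    longest = min(word_length, max_grams)
    return sum(word_length - n + 1 for n in range(min_grams, longest + 1))


# Compare several maxGrams settings for a 20-character word.
for max_grams in (5, 10, 15, 20):
    print(max_grams,
          edge_gram_count(20, 2, max_grams),
          n_gram_count(20, 2, max_grams))
```

In this model the edge-gram count grows by one for each extra character allowed, while the nGram count grows much faster, which is one reason to size maxGrams to typical query lengths rather than to the longest possible input.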

Tokenization Performance

Indexing a field for autocomplete with an edgeGram, rightEdgeGram, or nGram tokenization strategy requires more computation and index storage than indexing a string field.

For the specified tokenization strategy, MongoDB Search concatenates sequential tokens before emitting them ("shingling"). MongoDB Search emits tokens between minGrams and maxGrams characters in length:

Keeps tokens less than minGrams.
Joins tokens greater than minGrams but less than maxGrams to the next tokens to create tokens up to the specified maximum number of characters in length.
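
The effect of joining adjacent tokens can be approximated with a small model. The sketch below is an illustration only and is not how MongoDB Search is implemented: `normalize` is a crude stand-in for an analyzer (lowercasing and splitting on non-word characters), and all function names are hypothetical. Grams are taken from the analyzed words joined by single spaces, so a gram can run past a word boundary into the next token.

```python
import re


def normalize(text):
    """Crude analyzer stand-in: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def word_starts(words):
    """Offset of each word in the space-joined string."""
    offsets, pos = [], 0
    for word in words:
        offsets.append(pos)
        pos += len(word) + 1
    return offsets


def edge_grams(text, min_grams=2, max_grams=15):
    """Grams that start at the left edge of each word."""
    words = normalize(text)
    joined = " ".join(words)
    return [joined[start:start + n]
            for start in word_starts(words)
            for n in range(min_grams, max_grams + 1)
            if start + n <= len(joined)]


def right_edge_grams(text, min_grams=2, max_grams=15):
    """Grams that end at the right edge of each word, last word first."""
    words = normalize(text)
    joined = " ".join(words)
    ends = [start + len(w) for start, w in zip(word_starts(words), words)]
    return [joined[end - n:end]
            for end in reversed(ends)
            for n in range(min_grams, max_grams + 1)
            if end - n >= 0]


def n_grams(text, min_grams=2, max_grams=15):
    """Grams from a sliding window over every character position."""
    joined = " ".join(normalize(text))
    return [joined[start:start + n]
            for start in range(len(joined))
            for n in range(min_grams, max_grams + 1)
            if start + n <= len(joined)]


sentence = "The quick brown fox jumps over the lazy dog."
print(edge_grams(sentence, 2, 5)[:8])
print(right_edge_grams(sentence, 2, 5)[:8])
print(n_grams(sentence, 2, 5)[:8])
```

Because nGram slides over every position rather than only word edges, the model yields far more grams for the same sentence, which matches the higher storage and computation cost described above.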

Dynamic Mappings

The default field types that MongoDB Search uses for dynamic mappings do not include the autocomplete type. Using the autocomplete type in dynamic mappings can increase index size and resource usage, and produce unexpected scoring results. Use autocomplete in static mappings.

However, if you need to include autocomplete in dynamic mappings, you can add it to a custom typeSet definition. To learn more about autocomplete and custom typeSet configurations, see Index Size and Configuration.

Index Build Time

If your dataset has many documents or a wide data range, building this index for the autocomplete operator can take some time. To reduce the impact on other indexes and queries while the new index builds, create a separate index with only the autocomplete type.

For index performance considerations, see Index Performance Considerations.

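Learn More

To learn more about the autocomplete operator and see example queries, see autocomplete.

For examples of case-insensitive, prefix, starts-with, and contains queries, see Use MongoDB Search Instead of Regex Queries.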
