
$scoreFusion (aggregation)

Important

$scoreFusion is only available for deployments that use MongoDB 8.2+.

$scoreFusion

$scoreFusion first executes all input pipelines independently and then de-duplicates and combines the input pipeline results into a final scored results set.

$scoreFusion outputs a ranked set of documents based on the scores of the documents and weights from their input pipelines. You can specify an arithmetic expression to compute the score based on the input scores from the pipeline stage. By default, it uses the average of the scores for the documents from the different input pipeline stages.

Use $scoreFusion to search for documents in a single collection based on multiple criteria and retrieve a final scored result set that factors in all specified criteria.

The stage has the following syntax:

{ $scoreFusion: {
    input: {
      pipelines: {
        <input-pipeline-name>: <expression>,
        <input-pipeline-name>: <expression>,
        ...
      },
      normalization: "none|sigmoid|minMaxScaler"
    },
    combination: {
      weights: {
        <input-pipeline-name>: <numeric expression>,
        <input-pipeline-name>: <numeric expression>,
        ...
      },
      method: "avg|expression",
      expression: <expression>
    }
} }
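As an illustration of the syntax, the following sketch fills in the template with two input pipelines and per-pipeline weights. The pipeline names, index names, field names, query term, and query vector are hypothetical placeholders rather than values from this page, so substitute your own.

```javascript
// Hypothetical values: replace with your own index names, fields, and query data.
const queryVector = [0.12, -0.04, 0.33]; // assumed embedding; real vectors are much longer

const scoreFusionStage = {
  $scoreFusion: {
    input: {
      pipelines: {
        // Each key is a pipeline name; each value is an array of stages.
        vectorPipeline: [
          {
            $vectorSearch: {
              index: "vector_index",   // hypothetical index name
              path: "plot_embedding",  // hypothetical field name
              queryVector: queryVector,
              numCandidates: 500,
              limit: 20
            }
          }
        ],
        textPipeline: [
          {
            $search: {
              index: "search_index",   // hypothetical index name
              text: { query: "space adventure", path: "plot" }
            }
          }
        ]
      },
      normalization: "sigmoid"
    },
    combination: {
      // Weights are optional; a pipeline without a weight defaults to 1.
      weights: { vectorPipeline: 2, textPipeline: 1 },
      method: "avg"
    }
  }
};

console.log(JSON.stringify(scoreFusionStage, null, 2));
```

In mongosh, you could pass this object as the first stage of an aggregation on the collection that owns both indexes.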

$scoreFusion takes the following fields:

Field
Type
Description

input

Object

Defines the input that $scoreFusion combines.

input.pipelines

Object

Contains a map of pipeline names to the aggregation stages that define each pipeline. input.pipelines must contain at least one pipeline. If an input pipeline doesn't return a score, you must add a $score stage to it. All pipelines must operate on the same collection, and each must have a unique name.

For more information on input pipeline restrictions, see Input Pipelines and Input Pipeline Names.

input.normalization

String

Normalizes the score to the range 0 to 1 before combining the results. Value can be:

  • none - to not normalize.

  • sigmoid - to apply $sigmoid expression.

  • minMaxScaler - to apply the $minMaxScaler window operator.

combination

Object

Optional. Defines how to combine the input pipeline results.

combination.weights

Object

Optional. Weights to apply to the normalized input pipeline scores when combining the results. Each weight corresponds to one input pipeline. If you don't specify a weight for a pipeline, its weight defaults to 1. Each weight must be a non-negative number (whole or decimal), and a weight of 0 is allowed.

combination.method

String

Optional. Specifies method for combining scores. Value can be:

  • avg - to calculate the average of the input scores.

  • expression - to apply a custom aggregation expression that you specify in the combination.expression field.

If omitted, defaults to avg.

combination.expression

Arithmetic Expression

Optional. Specifies the logic for combining the input scores. This is the custom expression that is used when combination.method is set to expression. Within the expression, use the name of the input pipeline to represent the corresponding input score for a document.

Mutually exclusive with combination.weights.

scoreDetails

Boolean

Optional. Specifies whether to include detailed scoring information from each input pipeline in the output document's metadata. If omitted, defaults to false.
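To make the two input.normalization options more concrete, here is a minimal sketch of both transformations in plain JavaScript. The helper names sigmoid and minMaxScale belong to this sketch and are not MongoDB APIs, and the server's $sigmoid and $minMaxScaler operators may treat edge cases differently than shown here.

```javascript
// Sigmoid maps any real-valued score into the open interval (0, 1).
function sigmoid(score) {
  return 1 / (1 + Math.exp(-score));
}

// Min-max scaling maps the lowest score to 0 and the highest score to 1.
// Assumption of this sketch: if all scores are equal, every score maps to 0.
function minMaxScale(scores) {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0);
  return scores.map((s) => (s - min) / (max - min));
}

const rawScores = [2, 4, 6];
console.log(rawScores.map(sigmoid)); // three values that increase toward 1
console.log(minMaxScale(rawScores)); // [ 0, 0.5, 1 ]
```

Sigmoid normalizes each score on its own, while min-max scaling depends on the other scores in the same pipeline's result set.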

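You can only use $scoreFusion with a single collection. You can't use this aggregation stage at the database scope.

$scoreFusion de-duplicates the results of multiple input pipelines in the final output. Each unique input document appears at most once in the $scoreFusion output, regardless of how many times that document appears in the input pipeline outputs.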
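The de-duplication and default avg combination can be pictured with a small client-side sketch. This is an illustration only, not the server's implementation: the function name fuseByAverage is hypothetical, and the sketch assumes that a pipeline which did not return a document contributes a score of 0 and that the average is taken over all input pipelines.

```javascript
// pipelineResults: { <pipelineName>: [ { _id, score }, ... ] }, scores already normalized.
// weights: { <pipelineName>: <non-negative number> }; a missing weight defaults to 1.
function fuseByAverage(pipelineResults, weights = {}) {
  const names = Object.keys(pipelineResults);
  const byId = new Map();

  for (const name of names) {
    for (const doc of pipelineResults[name]) {
      // De-duplicate: keep one entry per _id, however many pipelines return it.
      if (!byId.has(doc._id)) byId.set(doc._id, {});
      byId.get(doc._id)[name] = doc.score;
    }
  }

  const fused = [];
  for (const [id, scores] of byId) {
    let total = 0;
    for (const name of names) {
      const weight = weights[name] ?? 1;
      // Assumption: a pipeline that did not return the document contributes 0.
      total += weight * (scores[name] ?? 0);
    }
    fused.push({ _id: id, score: total / names.length });
  }
  // Highest combined score first.
  return fused.sort((a, b) => b.score - a.score);
}

const fused = fuseByAverage(
  {
    searchOne: [ { _id: "a", score: 0.5 }, { _id: "b", score: 0.25 } ],
    searchTwo: [ { _id: "a", score: 1 } ]
  },
  { searchOne: 2 }
);
console.log(fused); // document "a" appears once, ranked above "b"
```

Document "a" is returned by both pipelines but appears once in the fused output, with a score that reflects both inputs.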

Each input pipeline must be both a Selection Pipeline and a Scoring Pipeline.

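A selection pipeline retrieves a set of documents from a collection and doesn't modify them after retrieval. $scoreFusion compares documents across the different input pipelines, which requires all input pipelines to output the same unmodified documents.

A selection pipeline must contain only the following stages:

Type
Stages

Search Stages

  • $match, including $match with legacy text search

  • $search

  • $vectorSearch

  • $geoNear

    Note

    If you use $geoNear in a selection pipeline, you can't specify includeLocs or distanceField because these fields modify the documents.

Ordering Stages

Pagination Stages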

A scoring pipeline sorts or orders documents based on the score of the documents. $scoreFusion uses the order of scored pipeline results to influence the output scores. Scoring pipelines must meet one of the following criteria:

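Pipeline names in input must meet the following restrictions:

  • Must not be an empty string

  • Must not start with $

  • Must not contain the ASCII null character delimiter \0 anywhere in the string

  • Must not contain .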
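The naming restrictions above can be checked before building a stage. The following is a minimal sketch; isValidPipelineName is a hypothetical helper name, not part of any MongoDB API, and the server remains the authority on which names it accepts.

```javascript
// Returns true when a name satisfies the four restrictions listed above.
function isValidPipelineName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&         // must not be an empty string
    !name.startsWith("$") &&   // must not start with $
    !name.includes("\0") &&    // must not contain the null character
    !name.includes(".")        // must not contain .
  );
}

console.log(isValidPipelineName("searchOne"));  // true
console.log(isValidPipelineName("$searchOne")); // false
console.log(isValidPipelineName("search.one")); // false
```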

If you set scoreDetails to true, $scoreFusion creates a scoreDetails metadata field for each document. The scoreDetails field contains information about the final ranking.

Note

When you set scoreDetails to true, $scoreFusion sets the scoreDetails metadata field for each document. By default, it doesn't automatically output the scoreDetails metafield.

To view the scoreDetails metadata field, you must explicitly set it through the $meta expression in a stage like $project, $addFields, or $set.

The scoreDetails field contains the following subfields:

Field
Description

value

The numerical value of the score for this document.

description

A description of how $scoreFusion computed the final score.

normalization

The normalization method used to normalize the score.

combination

The combination method and expression used to combine the pipeline results.

details

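An array in which each entry contains information about an input pipeline that output this document.

Each array entry in the details field contains the following subfields:

Field
Description

inputPipelineName

The name of the input pipeline that output this document.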

inputPipelineRawScore

The score of the document from the pipeline before normalization.

weight

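The weight of the input pipeline.

value

Optional. If the input pipeline outputs { $meta: 'score' } for this document, value contains { $meta: 'score' }.

details

The scoreDetails field of the input pipeline. If the input pipeline doesn't output a scoreDetails field, this field is an empty array.

Warning

MongoDB doesn't guarantee any specific output format for scoreDetails.

Example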

The following code blocks show the scoreDetails field for a $scoreFusion operation with $search, $vectorSearch, and $match input pipelines:

scoreDetails: {
  value: 7.847857250621068,
  description: 'the value calculated by combining the scores (either normalized or raw) across input pipelines from which this document is output from:',
  normalization: 'sigmoid',
  combination: {
    method: 'custom expression',
    expression: "{ string: { $sum: [ { $multiply: [ '$$searchOne', 10 ] }, '$$searchTwo' ] } }"
  },
  details: [
    {
      inputPipelineName: 'searchOne',
      inputPipelineRawScore: 0.7987099885940552,
      weight: 1,
      value: 0.6896984675751023,
      details: []
    },
    {
      inputPipelineName: 'searchTwo',
      inputPipelineRawScore: 2.9629626274108887,
      weight: 1,
      value: 0.950872574870045,
      details: []
    }
  ]
}
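The numbers in this output can be reproduced by hand, which helps when reading scoreDetails. The sketch below assumes that sigmoid normalization is the standard logistic function 1 / (1 + e^-x). It applies that function to each inputPipelineRawScore and then evaluates the custom expression from the output. The results should agree with the reported value fields up to floating-point rounding.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Raw scores copied from the scoreDetails output above.
const raw = { searchOne: 0.7987099885940552, searchTwo: 2.9629626274108887 };

// Normalize each raw score, as normalization: 'sigmoid' requests.
const searchOne = sigmoid(raw.searchOne);
const searchTwo = sigmoid(raw.searchTwo);

// Apply the custom expression: $sum of ($multiply: [searchOne, 10]) and searchTwo.
const combined = searchOne * 10 + searchTwo;

console.log({ searchOne, searchTwo, combined });
```

Compare the three printed numbers with the two per-pipeline value fields and the top-level value field.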

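MongoDB converts $scoreFusion operations into a set of existing aggregation stages that are combined before query execution to compute the output results. The explain results for a $scoreFusion operation show the full execution of the underlying aggregation stages that $scoreFusion uses to compose the final result.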
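One way to inspect the underlying stages is to request explain output for the aggregation. The following is a sketch for mongosh. The helper name explainScoreFusion is hypothetical, and the shape of the returned plan depends on the server version and deployment.

```javascript
// collection: a mongosh collection handle, such as db.embedded_movies.
// pipeline: an aggregation pipeline array whose first stage is $scoreFusion.
function explainScoreFusion(collection, pipeline) {
  // explain() returns an explainable handle; calling aggregate() on it
  // returns the query plan instead of the result documents.
  return collection.explain("queryPlanner").aggregate(pipeline);
}
```

In mongosh, you might call explainScoreFusion(db.embedded_movies, pipeline) with the same pipeline array that you would pass to aggregate().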

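This example uses a collection with embeddings and text fields. Create search and vectorSearch type indexes on the collection.

The following index definition automatically indexes all dynamically indexable fields in the collection so that you can run $search queries against the indexed fields.

Search index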
db.embedded_movies.createSearchIndex(
  "<INDEX_NAME>",
  {
    mappings: { dynamic: true }
  }
)

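The following index definition indexes the field that contains the embeddings in the collection so that you can run $vectorSearch queries against that field.

vectorSearch index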
db.embedded_movies.createSearchIndex(
  "<INDEX_NAME>",
  "vectorSearch",
  {
    "fields": [
      {
        "type": "vector",
        "path": "<FIELD_NAME>",
        "numDimensions": <NUMBER_OF_DIMENSIONS>,
        "similarity": "dotProduct"
      }
    ]
  }
);

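The following aggregation pipeline uses $scoreFusion with the following input pipelines:

Pipeline
Number of Documents Returned
Description

searchOne

20

Runs a vector search on the field indexed as vector type for the term specified as embeddings. The query considers up to 500 nearest neighbors, but limits the results to 20 documents.

searchTwo

20

Runs a full-text search for the same term and limits the results to 20 documents.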

db.embedded_movies.aggregate( [
  {
    $scoreFusion: {
      input: {
        pipelines: {
          searchOne: [
            {
              "$vectorSearch": {
                "index": "<INDEX_NAME>",
                "path": "<FIELD_NAME>",
                "queryVector": <QUERY_EMBEDDINGS>,
                "numCandidates": <NUMBER_OF_NEAREST_NEIGHBORS_TO_CONSIDER>,
                "limit": <NUMBER_OF_DOCUMENTS_TO_RETURN>
              }
            }
          ],
          searchTwo: [
            {
              "$search": {
                "index": "<INDEX_NAME>",
                "text": {
                  "query": "<QUERY_TERM>",
                  "path": "<FIELD_NAME>"
                }
              }
            },
          ]
        },
        normalization: "sigmoid"
      },
      combination: {
        method: "expression",
        expression: {
          $sum: [
            {$multiply: [ "$$searchOne", 10]}, "$$searchTwo"
          ]
        }
      },
      "scoreDetails": true
    }
  },
  {
    "$project": {
      _id: 1,
      title: 1,
      plot: 1,
      scoreDetails: {"$meta": "scoreDetails"}
    }
  },
  { $limit: 20 }
] )

This pipeline performs the following actions:

  • Executes the input pipelines

  • Combines the returned results

  • Outputs the top 20 documents, which are the 20 highest-ranked results of the $scoreFusion pipeline
