Important
$scoreFusion is only available for deployments that use MongoDB 8.2+.
Definition
$scoreFusion
$scoreFusion first executes all input pipelines independently and then de-duplicates and combines the input pipeline results into a final scored result set.
$scoreFusion outputs a ranked set of documents based on the documents' scores and the weights of their input pipelines. You can specify an arithmetic expression to compute the score from the input pipeline scores. By default, it uses the average of a document's scores from the different input pipelines.
Use $scoreFusion to search for documents in a single collection based on multiple criteria and retrieve a final scored result set that factors in all specified criteria.
Syntax
The stage has the following syntax:
{
  $scoreFusion: {
    input: {
      pipelines: {
        <input-pipeline-name>: <expression>,
        <input-pipeline-name>: <expression>,
        ...
      },
      normalization: "none|sigmoid|minMaxScaler"
    },
    combination: {
      weights: {
        <input-pipeline-name>: <numeric expression>,
        <input-pipeline-name>: <numeric expression>,
        ...
      },
      method: "avg|expression",
      expression: <expression>
    }
  }
}
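Fields
$scoreFusion takes the following fields:
Field | Type | Description |
---|---|---|
input | Object | Defines the input that $scoreFusion normalizes and combines. |
input.pipelines | Object | Contains a map of pipeline names to the aggregation stages that define that pipeline. |
input.normalization | String | Normalizes the input pipeline scores before they are combined. Value can be none, sigmoid, or minMaxScaler. |
combination | Object | Optional. Defines how to combine the input pipeline results. |
combination.weights | Object | Optional. Weights to apply to the normalized input pipeline scores when combining the results. Corresponds to the input pipelines, one per pipeline. The default weight is 1. |
combination.method | String | Optional. Specifies the method for combining scores. Value can be avg or expression. If omitted, defaults to avg. |
combination.expression | Arithmetic Expression | Optional. Specifies the logic for combining the input scores. This is the custom expression that is used when combination.method is set to expression. Mutually exclusive with combination.weights. |
scoreDetails | Boolean | Optional. Specifies whether to include detailed scoring information from each input pipeline in the output document's metadata. If omitted, defaults to false. |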
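As a minimal sketch of how these fields fit together, the following stage definition uses the avg method with per-pipeline weights. The pipeline names titleSearch and plotSearch, the field paths, and the weight values are hypothetical and chosen only for illustration; the field names come from the syntax above.

```javascript
// Hypothetical stage definition: pipeline names, paths, and weights are illustrative.
const scoreFusionStage = {
  $scoreFusion: {
    input: {
      pipelines: {
        titleSearch: [
          { $search: { index: "<INDEX_NAME>", text: { query: "<QUERY_TERM>", path: "title" } } }
        ],
        plotSearch: [
          { $search: { index: "<INDEX_NAME>", text: { query: "<QUERY_TERM>", path: "plot" } } }
        ]
      },
      normalization: "sigmoid"
    },
    combination: {
      // One weight per input pipeline, keyed by pipeline name.
      weights: { titleSearch: 2, plotSearch: 1 },
      method: "avg"
    },
    scoreDetails: true
  }
};

// In mongosh, the stage could then be passed to aggregate(), for example:
// db.embedded_movies.aggregate([ scoreFusionStage, { $limit: 10 } ])
```

Keeping the stage in a variable makes it easy to check that every key in weights matches a key in input.pipelines before running the query.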
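Behavior
Collections
You can only use $scoreFusion with a single collection. You can't use this aggregation stage at database scope.
De-Duplication
$scoreFusion de-duplicates the results from multiple input pipelines in its final output. Each unique input document appears at most once in the $scoreFusion output, regardless of how many times the document appears in the input pipeline outputs.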
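The de-duplication behavior can be pictured with a small sketch in plain JavaScript. This is a conceptual illustration only, not MongoDB's implementation: the function name dedupeByPipeline and the sample data are hypothetical, and the sketch assumes that documents are identified by _id.

```javascript
// Conceptual sketch: merge the outputs of several input pipelines so that
// each document (identified here by _id, an assumption) appears only once.
function dedupeByPipeline(pipelineResults) {
  const merged = new Map();
  for (const [pipelineName, docs] of Object.entries(pipelineResults)) {
    for (const doc of docs) {
      if (!merged.has(doc._id)) {
        merged.set(doc._id, { doc, pipelines: [] });
      }
      const entry = merged.get(doc._id);
      // Record each contributing pipeline once, even if it repeats the document.
      if (!entry.pipelines.includes(pipelineName)) {
        entry.pipelines.push(pipelineName);
      }
    }
  }
  return [...merged.values()];
}

// Hypothetical sample data: document 2 is returned by both pipelines.
const fused = dedupeByPipeline({
  searchOne: [{ _id: 1, title: "A" }, { _id: 2, title: "B" }],
  searchTwo: [{ _id: 2, title: "B" }, { _id: 3, title: "C" }]
});
console.log(fused.length); // 3
```

Document 2 appears in both inputs but only once in the merged result, along with the names of both pipelines that returned it.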
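Input Pipelines
Each input pipeline must be both a Selection Pipeline and a Scoring Pipeline.
Selection Pipeline
A selection pipeline retrieves a set of documents from a collection and doesn't modify them after retrieval. $scoreFusion compares documents across different input pipelines, which requires that all input pipelines output the same unmodified documents.
A selection pipeline must contain only the following stages:
Type | Stages |
---|---|
Search stages | |
Ordering stages | |
Pagination stages | |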
Scoring Pipeline
A scoring pipeline sorts or orders documents based on their scores. $scoreFusion uses the order of scored pipeline results to influence the output scores. Scoring pipelines must meet one of the following criteria:
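Input Pipeline Names
Pipeline names in input must meet the following restrictions:
Must not be an empty string
Must not start with $
Must not contain the ASCII null character delimiter \0 anywhere in the string
Must not contain .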
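These restrictions can be checked on the client before a query is sent. The following is a minimal sketch; isValidPipelineName is a hypothetical helper written for this illustration and is not part of MongoDB or its drivers.

```javascript
// Sketch of a client-side check that mirrors the naming restrictions above.
// isValidPipelineName is a hypothetical helper, not a MongoDB API.
function isValidPipelineName(name) {
  return (
    typeof name === "string" &&
    name.length > 0 &&        // must not be an empty string
    !name.startsWith("$") &&  // must not start with $
    !name.includes("\0") &&   // must not contain the null character
    !name.includes(".")       // must not contain .
  );
}

console.log(isValidPipelineName("searchOne"));   // true
console.log(isValidPipelineName("$searchOne"));  // false
console.log(isValidPipelineName("search.one"));  // false
```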
scoreDetails
If you set scoreDetails to true, $scoreFusion creates a scoreDetails metadata field for each document. The scoreDetails field contains information about the final ranking.
Note
When you set scoreDetails to true, $scoreFusion sets the scoreDetails metadata field for each document. By default, it doesn't output the scoreDetails metadata field.
To view the scoreDetails metadata field, you must explicitly set it through the $meta expression in a stage like $project, $addFields, or $set.
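The scoreDetails field contains the following subfields:
Field | Description |
---|---|
value | The numerical value of the score for this document. |
description | A description of how the final score was calculated. |
normalization | The normalization method used to normalize the score. |
combination | The combination method and expression used to combine the pipeline results. |
details | An array in which each entry contains information about an input pipeline that output this document. |
Each array entry in the details field contains the following subfields:
Field | Description |
---|---|
inputPipelineName | The name of the input pipeline that output this document. |
inputPipelineRawScore | The score of the document from the pipeline before normalization. |
weight | The weight of the input pipeline. |
value | Optional. If the input pipeline outputs this document's |
details | The input pipeline's |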
Warning
MongoDB doesn't guarantee any specific output format for scoreDetails.
Example
The following code blocks show the scoreDetails field for a $scoreFusion operation with $search, $vectorSearch, and $match input pipelines:
scoreDetails: {
  value: 7.847857250621068,
  description: 'the value calculated by combining the scores (either normalized or raw) across input pipelines from which this document is output from:',
  normalization: 'sigmoid',
  combination: {
    method: 'custom expression',
    expression: "{ string: { $sum: [ { $multiply: [ '$$searchOne', 10 ] }, '$$searchTwo' ] } }"
  },
  details: [
    {
      inputPipelineName: 'searchOne',
      inputPipelineRawScore: 0.7987099885940552,
      weight: 1,
      value: 0.6896984675751023,
      details: []
    },
    {
      inputPipelineName: 'searchTwo',
      inputPipelineRawScore: 2.9629626274108887,
      weight: 1,
      value: 0.950872574870045,
      details: []
    }
  ]
}
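The numbers in this output can be checked by hand. The sketch below assumes that sigmoid normalization is the standard logistic function 1 / (1 + e^(-x)), an assumption that the values above are consistent with, and then applies the custom expression shown in the output: ten times the searchOne score plus the searchTwo score. The helper name sigmoid is illustrative.

```javascript
// Assumption: "sigmoid" normalization is the standard logistic function.
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Raw scores copied from inputPipelineRawScore in the output above.
const searchOne = sigmoid(0.7987099885940552);
const searchTwo = sigmoid(2.9629626274108887);

// Same arithmetic as the custom expression:
// $sum of ($multiply of searchOne by 10) and searchTwo.
const combined = searchOne * 10 + searchTwo;

console.log(searchOne.toFixed(4)); // 0.6897
console.log(searchTwo.toFixed(4)); // 0.9509
console.log(combined.toFixed(4));  // 7.8479
```

The per-pipeline results correspond to the value subfields in details, and the combined result corresponds to the top-level value.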
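Explain Results
MongoDB converts $scoreFusion operations into a set of existing aggregation stages that, in combination, compute the output result before query execution. The explain results for a $scoreFusion operation show the full execution of the underlying aggregation stages that $scoreFusion uses to compose the final result.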
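Examples
This example uses a collection with embeddings and text fields. Create search and vectorSearch type indexes on the collection.
The following index definition automatically indexes all dynamically indexable fields in the collection so that you can run $search queries against the indexed fields.
db.embedded_movies.createSearchIndex(
  "<INDEX_NAME>",
  {
    mappings: { dynamic: true }
  }
)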
The following index definition indexes the field that contains the embeddings in the collection so that you can run $vectorSearch queries against that field.
db.embedded_movies.createSearchIndex(
  "<INDEX_NAME>",
  "vectorSearch",
  {
    "fields": [
      {
        "type": "vector",
        "path": "<FIELD_NAME>",
        "numDimensions": <NUMBER_OF_DIMENSIONS>,
        "similarity": "dotProduct"
      }
    ]
  }
);
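The following aggregation pipeline uses $scoreFusion with the following input pipelines:
Pipeline | Number of Documents Returned | Description |
---|---|---|
searchOne | 20 | Runs a vector search on the field indexed as the vector type. |
searchTwo | 20 | Runs a full-text search for the same term and limits the results to 20 documents. |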
db.embedded_movies.aggregate( [
  {
    $scoreFusion: {
      input: {
        pipelines: {
          searchOne: [
            {
              "$vectorSearch": {
                "index": "<INDEX_NAME>",
                "path": "<FIELD_NAME>",
                "queryVector": <QUERY_EMBEDDINGS>,
                "numCandidates": <NUMBER_OF_NEAREST_NEIGHBORS_TO_CONSIDER>,
                "limit": <NUMBER_OF_DOCUMENTS_TO_RETURN>
              }
            }
          ],
          searchTwo: [
            {
              "$search": {
                "index": "<INDEX_NAME>",
                "text": {
                  "query": "<QUERY_TERM>",
                  "path": "<FIELD_NAME>"
                }
              }
            },
          ]
        },
        normalization: "sigmoid"
      },
      combination: {
        method: "expression",
        expression: {
          $sum: [
            {$multiply: [ "$$searchOne", 10]}, "$$searchTwo"
          ]
        }
      },
      "scoreDetails": true
    }
  },
  {
    "$project": {
      _id: 1,
      title: 1,
      plot: 1,
      scoreDetails: {"$meta": "scoreDetails"}
    }
  },
  { $limit: 20 }
] )
This pipeline performs the following actions:
Executes the input pipelines
Combines the returned results
Outputs the first 20 documents, which are the top 20 ranked results of the $scoreFusion pipeline