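Hybrid search is an aggregation of different search methods or search queries for the same or similar query criteria. The technique uses an algorithm to rank results and returns a unified result set from the different search methods. You can use $rankFusion to perform a hybrid search.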
What is Reciprocal Rank Fusion?
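Reciprocal rank fusion is a technique that combines the results of different search methods into a single result set by performing the following actions:
Calculate the reciprocal rank of the documents in the results.
For each ranked document in each search result, first add the document's rank (r) to a constant, 60, which smooths the score (rank_constant). Then divide 1 by the sum of r and rank_constant to get the document's reciprocal rank in that search result. You can't set the value of rank_constant; it defaults to 60.
reciprocal_rank = 1 / ( r + rank_constant )
For each search method, apply a different weight (w) to give that search method more importance. For each document, the weighted reciprocal rank is the weight multiplied by the document's reciprocal rank.
weighted_reciprocal_rank = w × reciprocal_rank
Combine the rank-derived and weighted scores of the documents in the results.
For each document across all search results, add up the calculated reciprocal ranks to get a single score for the document.
Sort the results by the combined score of the documents.
Sort the documents by their combined score to get a single, combined ranked list of the documents in the results.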
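The calculation above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not MongoDB's implementation: the function name reciprocal_rank_fusion, the pipeline names, and the sample document IDs are hypothetical. Ranks are assumed to start at 1, and rank_constant is fixed at 60 as described above.

```python
from collections import defaultdict

RANK_CONSTANT = 60  # fixed smoothing constant described in the text


def reciprocal_rank_fusion(ranked_lists, weights=None):
    """Fuse ranked lists of document IDs into one list sorted by combined score.

    ranked_lists: dict mapping a (hypothetical) pipeline name to a list of
                  document IDs, best match first.
    weights:      optional dict mapping pipeline name to its weight w
                  (assumed to default to 1 when omitted).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for name, doc_ids in ranked_lists.items():
        w = weights.get(name, 1.0)
        for r, doc_id in enumerate(doc_ids, start=1):
            # weighted_reciprocal_rank = w * 1 / (r + rank_constant)
            scores[doc_id] += w * (1.0 / (r + RANK_CONSTANT))
    # Sort by combined score, highest first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


results = reciprocal_rank_fusion(
    {"vector": ["a", "b", "c"], "full_text": ["b", "d", "a"]},
    weights={"vector": 1.0, "full_text": 1.0},
)
for doc_id, score in results:
    print(doc_id, round(score, 5))
```

In this sample, document "b" ends up first because it sits near the top of both lists, while "c" and "d" each appear in only one list and therefore collect a single reciprocal rank.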
About the Different Hybrid Search Use Cases
You can leverage MongoDB Vector Search to perform several types of hybrid search. Specifically, MongoDB Vector Search supports the following use cases:
Full-text and vector search in a single query: You can combine results from different search methods, such as a semantic search and a full-text search. You can use $vectorSearch for the semantic search and $search for the full-text search, and combine the results by using the reciprocal rank fusion technique. To learn more, see the Perform Hybrid Search with MongoDB Vector Search and MongoDB Search tutorial, which demonstrates how to perform a semantic search and a full-text search against the sample_mflix.embedded_movies namespace and retrieve combined ranked results by using reciprocal rank fusion.
Alternatively, for a more granular hybrid search where the score matters in addition to the relative ordering of results, you can use the $scoreFusion pipeline stage. To learn more, see the Perform Hybrid Search with MongoDB Vector Search and MongoDB Search tutorial, which demonstrates how to perform a semantic search and a full-text search against the sample_mflix.embedded_movies namespace and combine the input pipeline results into a final scored result set.
While $rankFusion ranks documents based on their positions (relative ranks) in the input pipelines using the Reciprocal Rank Fusion algorithm, $scoreFusion ranks documents based on the scores assigned by the input pipelines, using mathematical expressions to combine the results.
In $rankFusion, rankings are influenced by pipeline weights. In $scoreFusion, weights control the contribution of each pipeline's scores to the final result.
Multiple vector search queries in a single MongoDB $rankFusion pipeline: The $rankFusion pipeline supports multiple sub-pipelines that contain vector search queries executed against the same collection, and combines their results by using the reciprocal rank fusion technique. The How to Combine Multiple $vectorSearch Queries tutorial demonstrates the following types of vector search:
Search your dataset comprehensively for semantically similar terms in the same query.
Search multiple fields in your dataset to determine which fields return the best results for the query.
Search using embeddings generated by different embedding models to determine how the models differ in their semantic interpretation.
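As a rough sketch of the first use case, the following Python function builds a $rankFusion aggregation pipeline as plain dictionaries, with one $vectorSearch sub-pipeline and one $search sub-pipeline. The index names (vector_index, search_index), the field names (plot_embedding, title), the sub-pipeline names, and the function name are assumptions for illustration; replace them with the indexes and fields defined on your own collection.

```python
def build_rank_fusion_pipeline(query_vector, query_text, limit=20):
    """Return an aggregation pipeline (a list of stages) for a hybrid search."""
    vector_pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",    # hypothetical index name
                "path": "plot_embedding",   # hypothetical embedding field
                "queryVector": query_vector,
                "numCandidates": limit * 10,
                "limit": limit,
            }
        }
    ]
    full_text_pipeline = [
        {
            "$search": {
                "index": "search_index",    # hypothetical index name
                "text": {"query": query_text, "path": "title"},
            }
        },
        {"$limit": limit},  # cap the number of results from this sub-pipeline
    ]
    return [
        {
            "$rankFusion": {
                "input": {
                    "pipelines": {
                        "vectorPipeline": vector_pipeline,
                        "fullTextPipeline": full_text_pipeline,
                    }
                },
                # Equal weights here; adjust per query as needed.
                "combination": {
                    "weights": {"vectorPipeline": 1, "fullTextPipeline": 1}
                },
                "scoreDetails": True,
            }
        },
        {"$limit": limit},
    ]


# Placeholder three-dimensional vector; a real query vector must match
# the dimensions of the vector index.
pipeline = build_rank_fusion_pipeline([0.1, 0.2, 0.3], "star wars", limit=10)
```

With a driver such as PyMongo, you would pass the returned list to the collection's aggregate() method.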
Considerations
When using the $rankFusion or $scoreFusion pipeline stage for hybrid search, consider the following.
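Disjoint Result Sets
If you want to catch false negatives that one search method misses, disjoint results from the individual sub-pipelines might be acceptable. When the results are disjoint, most or all of the results might appear to come from one pipeline and not from the other. However, if you want all sub-pipelines to return similar results, try increasing the number of results from each sub-pipeline.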
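One simple way to see how disjoint two sub-pipelines are is to compare the document IDs that each one returns. The sketch below uses Jaccard overlap for this. The helper name overlap_ratio and the sample IDs are hypothetical, and this check is only an illustration, not something $rankFusion or $scoreFusion does for you.

```python
def overlap_ratio(ids_a, ids_b):
    """Jaccard overlap of two result lists: 0.0 means disjoint, 1.0 means same set."""
    set_a, set_b = set(ids_a), set(ids_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


vector_ids = ["a", "b", "c", "d"]      # IDs from a vector sub-pipeline
full_text_ids = ["c", "e", "f", "g"]   # IDs from a full-text sub-pipeline
print(overlap_ratio(vector_ids, full_text_ids))
```

A ratio close to 0 means the sub-pipelines share few results. In that case, raising each sub-pipeline's result limit is one way to try to increase the overlap.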
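Weights
We recommend weighting lexical and vector queries dynamically on a per-query basis rather than applying fixed weights to all queries. This improves the relevance of the results for each query. It also improves the utilization of computational resources by allocating them to the queries that need them most.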
Multiple Pipelines
You can combine an arbitrary number of sub-pipelines in the $rankFusion or $scoreFusion stage, but they must all execute against the same collection. You can't use the $rankFusion or $scoreFusion stage to search across collections. Use the $unionWith stage with $vectorSearch for cross-collection search.
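A minimal sketch of the cross-collection alternative follows: it runs $vectorSearch on the starting collection, then uses $unionWith to append the results of a second $vectorSearch on another collection. The index names, the embedding field, the collection name reviews, and the function name are hypothetical. The final sort assumes that both indexes use the same embedding model and similarity function, so that their scores are comparable.

```python
def build_cross_collection_pipeline(query_vector, other_collection, limit=10):
    """Return a pipeline that unions vector search results from two collections."""

    def vector_stages(index_name):
        return [
            {
                "$vectorSearch": {
                    "index": index_name,     # hypothetical index name
                    "path": "embedding",     # hypothetical embedding field
                    "queryVector": query_vector,
                    "numCandidates": limit * 10,
                    "limit": limit,
                }
            },
            # Expose the vector search score as a regular field.
            {"$addFields": {"score": {"$meta": "vectorSearchScore"}}},
        ]

    return [
        *vector_stages("vector_index_a"),
        {
            "$unionWith": {
                "coll": other_collection,
                "pipeline": vector_stages("vector_index_b"),
            }
        },
        # Assumes scores from both indexes are on a comparable scale.
        {"$sort": {"score": -1}},
        {"$limit": limit},
    ]


pipeline = build_cross_collection_pipeline([0.1, 0.2, 0.3], "reviews", limit=5)
```

Unlike $rankFusion, this approach doesn't fuse ranks. It only merges the two result sets and orders them by their raw vector search scores.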
Non-Search Pipelines
Geospatial Relevance
You can use $geoNear and the near operator inside $search for a geographic location search within the $rankFusion or $scoreFusion stage. However, $geoNear and the near operator use different coordinate reference frames. Therefore, the result ordinals and scores might not be identical.
Limit Results
We recommend setting a limit on the number of results that each sub-pipeline returns.
Limitations
The following limitations apply to hybrid search using $rankFusion and $scoreFusion:
$rankFusion is only supported on MongoDB 8.0.14 (including latest Rapid Release).
$rankFusion and $scoreFusion sub-pipelines can contain only the following stages:
$rankFusion and $scoreFusion preserve a traceable link back to the original input document for each sub-pipeline. Therefore, they don't support the following:
$project stage
storedSource field
$rankFusion and $scoreFusion sub-pipelines run serially, not in parallel.
$rankFusion and $scoreFusion don't support pagination.
$rankFusion can be run on Views only on clusters running MongoDB 8.0 or higher. You can't run $rankFusion within a view definition or on a time series collection.
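Prerequisites
To try these tutorials, you must have the following:
An Atlas cluster with MongoDB version v8.0 or later.
The sample_mflix database loaded into your Atlas cluster.
mongosh to try these queries on your Atlas cluster.
Note
You can also try these hybrid search use cases with a local Atlas deployment that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.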