
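Perform Hybrid Search with MongoDB and LangChain

You can integrate MongoDB with LangChain to perform hybrid search. In this tutorial, you complete the following steps:

  1. Set up the environment.

  2. Use MongoDB as a vector store.

  3. Create a MongoDB Vector Search index and a MongoDB Search index on your data.

  4. Run hybrid search queries.

  5. Pass the query results into a RAG pipeline.

Work with a runnable version of this tutorial as a Python notebook.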

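To complete this tutorial, you must have the following:

  • One of the following MongoDB cluster types:

  • A Voyage AI API key. To create an account and API key, see the Voyage AI website.

  • An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

  • An environment to run interactive Python notebooks, such as Colab.

Note

Check the requirements of the langchain-voyageai package to ensure that you're using a compatible Python version.

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.

To set up your notebook environment: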

1

Run the following command in your notebook:

pip install --quiet --upgrade langchain langchain-community langchain-core langchain-mongodb langchain-voyageai langchain-openai pymongo pypdf
2

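Run the following code to set the environment variables for this tutorial. Provide your API keys and the connection string for your MongoDB cluster.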

import os
os.environ["VOYAGE_API_KEY"] = "<voyage-api-key>"
os.environ["OPENAI_API_KEY"] = "<openai-api-key>"
MONGODB_URI = "<connection-string>"

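Note

Replace <connection-string> with the connection string for your Atlas cluster or local deployment.

For an Atlas cluster, the connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

To learn more, see Connect to a Cluster via Drivers.

For a local deployment, the connection string should use the following format:

mongodb://localhost:<port-number>/?directConnection=true

To learn more, see Connection Strings.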
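A leftover placeholder is an easy mistake to make, so a quick check before connecting may save a confusing error later. The following is a minimal sketch that uses only the Python standard library. check_connection_string is a hypothetical helper, not part of PyMongo, and it only inspects the URI scheme and looks for unreplaced <...> placeholders; it does not contact the cluster.

```python
import re

# Hypothetical helper (not part of PyMongo): a lightweight sanity check that
# looks only at the URI scheme and at leftover <placeholder> tokens.
def check_connection_string(uri: str) -> str:
    """Return 'srv' or 'standard' based on the scheme, or raise ValueError."""
    if re.search(r"<[^<>]+>", uri):
        raise ValueError("Connection string still contains a <placeholder>.")
    if uri.startswith("mongodb+srv://"):
        return "srv"        # the format shown for Atlas clusters
    if uri.startswith("mongodb://"):
        return "standard"   # the format shown for local deployments
    raise ValueError("Expected a mongodb:// or mongodb+srv:// connection string.")

# Example with a made-up local URI (the port number is an assumption).
print(check_connection_string("mongodb://localhost:27017/?directConnection=true"))
```

In the notebook, you could call check_connection_string(MONGODB_URI) right after setting the variable.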

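You must use MongoDB as the vector store for your data. You can instantiate a vector store from an existing collection in MongoDB.

1

If you haven't already, complete the steps to load sample data into your cluster.

Note

If you want to use your own data, see the LangChain Get Started tutorial or How to Create Vector Embeddings to learn how to ingest vector embeddings into Atlas.

2

Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:

  • The connection string for your MongoDB cluster.

  • The voyage-3-large embedding model from Voyage AI, which converts text into vector embeddings.

  • sample_mflix.embedded_movies as the namespace to use.

  • plot as the field that contains the text.

  • plot_embedding_voyage_3_large as the field that contains the embeddings.

  • dotProduct as the relevance score function.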

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_voyageai import VoyageAIEmbeddings

# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string = MONGODB_URI,
    embedding = VoyageAIEmbeddings(model = "voyage-3-large", output_dimension = 2048),
    namespace = "sample_mflix.embedded_movies",
    text_key = "plot",
    embedding_key = "plot_embedding_voyage_3_large",
    relevance_score_fn = "dotProduct"
)
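The dotProduct relevance function is typically paired with embeddings that are normalized to unit length, because on such vectors the dot product equals cosine similarity. Whether a given model returns normalized vectors is an assumption you should verify for your own data. The sketch below uses made-up two-dimensional vectors and hypothetical helper names to illustrate the equivalence.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity, which divides out both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors for illustration only; real embeddings here have 2048 dimensions.
a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])

# On unit-length vectors the two measures agree (up to floating-point error).
print(math.isclose(dot(a, b), cosine(a, b)))
```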

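Tip

To enable hybrid search queries on your vector store, create a MongoDB Vector Search index and a MongoDB Search index on the collection. You can create the indexes by using either the LangChain helper methods or the PyMongo driver methods:

1

To use the LangChain helper method, run the following code to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.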

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
    dimensions = 2048 # The dimensions of the vector embeddings to be indexed
)
2

Run the following code in your notebook to create a search index that indexes the plot field in the collection.

from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient

# Connect to your cluster
client = MongoClient(MONGODB_URI)

# Use helper method to create the search index
create_fulltext_search_index(
    collection = client["sample_mflix"]["embedded_movies"],
    field = "plot",
    index_name = "search_index"
)
1

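Alternatively, to use the PyMongo driver methods, run the following code to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.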

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Connect to your cluster
client = MongoClient(MONGODB_URI)
collection = client["sample_mflix"]["embedded_movies"]

# Create your vector search index model, then create the index
vector_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding_voyage_3_large",
                "numDimensions": 2048,
                "similarity": "dotProduct"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch"
)
collection.create_search_index(model=vector_index_model)
2

Run the following code to create a search index that indexes the plot field in the collection.

# Create your search index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "mappings": {
            "dynamic": False,
            "fields": {
                "plot": {
                    "type": "string"
                }
            }
        }
    },
    name="search_index"
)
collection.create_search_index(model=search_index_model)

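The indexes take about one minute to build. While they build, the indexes are in an initial sync state. After the build finishes, you can start querying the data in your collection.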
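Rather than waiting a fixed amount of time, you could poll until the indexes report that they are queryable. The following is a minimal sketch, not an official helper: wait_until_queryable is a hypothetical function, and it assumes that each index description is a mapping with name and queryable fields, which is the shape PyMongo's list_search_indexes returns on Atlas. The timeout and interval values are arbitrary.

```python
import time
from typing import Callable, Iterable, Mapping, Set

# Hypothetical helper: polls until every named index reports queryable=True.
def wait_until_queryable(
    list_indexes: Callable[[], Iterable[Mapping]],
    names: Set[str],
    timeout: float = 120.0,
    interval: float = 5.0,
) -> bool:
    """Return True once all named indexes are queryable, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        ready = {ix.get("name") for ix in list_indexes() if ix.get("queryable")}
        if names <= ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# In the notebook, with a PyMongo `collection` for sample_mflix.embedded_movies
# (assumed to exist), usage might look like:
# wait_until_queryable(lambda: collection.list_search_indexes(),
#                      {"vector_index", "search_index"})
```

Passing the listing function as a parameter keeps the helper independent of a live cluster, which also makes it easy to try out with fake data.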

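After MongoDB builds your indexes, you can run hybrid search queries on your data. The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string "time travel". It also specifies the following parameters:

  • vectorstore: The name of the vector store instance.

  • search_index_name: The name of the MongoDB Search index.

  • top_k: The number of documents to return.

  • fulltext_penalty: The penalty for full-text search.

    The lower the penalty, the higher the full-text search score.

  • vector_penalty: The penalty for vector search.

    The lower the penalty, the higher the vector search score.

The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, the plot, and the individual scores for each document.

To learn more about hybrid search query results, see About the Query.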

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever

# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = vector_store,
    search_index_name = "search_index",
    top_k = 5,
    fulltext_penalty = 50,
    vector_penalty = 50,
    # Exclude the embedding fields from the returned documents
    post_filter=[
        {
            "$project": {
                "plot_embedding": 0,
                "plot_embedding_voyage_3_large": 0
            }
        }
    ])

# Define your query
query = "time travel"

# Print results
documents = retriever.invoke(query)
for doc in documents:
    print("Title: " + doc.metadata["title"])
    print("Plot: " + doc.page_content)
    print("Search score: {}".format(doc.metadata["fulltext_score"]))
    print("Vector Search score: {}".format(doc.metadata["vector_score"]))
    print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))
Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0.018518518518518517
Total score: 0.03774928774928775
Title: A.P.E.X.
Plot: A time-travel experiment in which a robot probe is sent from the year 2073 to the year 1973 goes terribly wrong thrusting one of the project scientists, a man named Nicholas Sinclair into a...
Search score: 0.018518518518518517
Vector Search score: 0.018867924528301886
Total score: 0.0373864430468204
Title: About Time
Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
Search score: 0
Vector Search score: 0.0196078431372549
Total score: 0.0196078431372549
Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549
Title: Retroactive
Plot: A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.
Search score: 0
Vector Search score: 0.019230769230769232
Total score: 0.019230769230769232
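
The scores in this output are consistent with reciprocal rank fusion, where each result scores 1 / (rank + penalty + 1) with a zero-based rank, and a document that is missing from one result list scores 0 for that list. This formula is inferred from the printed values rather than stated on this page, so treat the sketch below as an illustration under that assumption. reciprocal_rank_score is a hypothetical name.

```python
def reciprocal_rank_score(rank: int, penalty: float) -> float:
    """Score for a zero-based rank; a lower penalty yields a higher score."""
    return 1.0 / (rank + penalty + 1)

# Timecop appears to be the second full-text hit (rank 1) and the fourth
# vector hit (rank 3). Both penalties are 50, as in the retriever above.
# The ranks are inferred from the printed scores, not reported by the retriever.
fulltext = reciprocal_rank_score(1, 50)  # 1/52
vector = reciprocal_rank_score(3, 50)    # 1/54
print(fulltext, vector, fulltext + vector)
```

Under this assumption the three printed values match the Timecop scores above up to floating-point rounding, and it also shows why lowering one penalty gives that search method more weight in the total.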

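You can pass your hybrid search results into a RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:

  • Defines a LangChain prompt template that instructs the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.

  • Constructs a chain that specifies the following:

    • The hybrid search retriever that you defined to retrieve relevant documents.

    • The prompt template that you defined.

    • An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.

  • Prompts the chain with a sample query and returns the response. The generated response might vary.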

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Define a prompt template
template = """
Use the following pieces of context to answer the question at the end.
{context}
Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()

# Construct a chain to answer questions on your data
chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)
Certainly! Here are some movies about time travel from the context provided:
1. **Timecop (1994)**
Genre: Action, Crime, Sci-Fi
Plot: A law enforcement officer working for the Time Enforcement Commission battles a shady politician with a personal tie to his past.
IMDb Rating: 5.8
2. **A.P.E.X. (1994)**
Genre: Action, Sci-Fi
Plot: A time-travel experiment gone wrong thrusts a scientist into an alternate timeline plagued by killer robots.
IMDb Rating: 4.3
3. **About Time (2013)**
Genre: Drama, Fantasy, Romance
Plot: A young man discovers he can time travel and uses this ability to improve his life, especially his love life, but learns the limitations and challenges of his gift.
IMDb Rating: 7.8
4. **The Time Traveler's Wife (2009)**
Genre: Drama, Fantasy, Romance
Plot: A Chicago librarian with a gene causing him to involuntarily time travel struggles with its impact on his romantic relationship and marriage.
IMDb Rating: 7.1
5. **Retroactive (1997)**
Genre: Action, Crime, Drama
Plot: A woman accidentally time-travels to prevent a violent event, but her attempts to fix the situation lead to worsening consequences due to repeated time cycles.
IMDb Rating: 6.3
Each movie covers time travel with unique perspectives, from action-packed adventures to romantic dramas.
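
The chain above hands the retriever's documents to the {context} variable as they are. If you want more control over what the LLM sees, one option is to format the documents into plain text first. The sketch below is an illustration under stated assumptions: format_docs is a hypothetical helper, the sample data is made up, and SimpleNamespace objects stand in for LangChain Document objects, of which only the page_content and metadata attributes are used.

```python
from types import SimpleNamespace

# Hypothetical helper: reduces each document to a "title: plot" line.
def format_docs(docs) -> str:
    """Join each document's title and plot into one context string."""
    lines = []
    for doc in docs:
        title = doc.metadata.get("title", "Untitled")
        lines.append(f"{title}: {doc.page_content}")
    return "\n\n".join(lines)

# Stand-ins for LangChain Document objects (illustrative data only).
docs = [
    SimpleNamespace(page_content="A time cop fights a politician.", metadata={"title": "Timecop"}),
    SimpleNamespace(page_content="A man can travel in time.", metadata={"title": "About Time"}),
]
print(format_docs(docs))
```

In the chain, this could be wired in as {"context": retriever | format_docs, "query": RunnablePassthrough()}. Note that this formatter drops other metadata such as genres and ratings, so a response like the one above would no longer include them unless you add those fields to the formatted text.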
