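You can integrate MongoDB with LangChain to perform hybrid search. In this tutorial, you complete the following steps:
Set up the environment.
Use MongoDB as a vector store.
Create MongoDB Vector Search and MongoDB Search indexes on your data.
Run hybrid search queries.
Pass the query results into your RAG pipeline.
You can also work with a runnable version of this tutorial as a Python notebook.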
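Prerequisites
To complete this tutorial, you must have the following:
One of the following MongoDB cluster types:
An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later. Ensure that your IP address is included in your Atlas project's access list.
A local Atlas deployment created using the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
A MongoDB Community or Enterprise cluster with Search and Vector Search installed.
A Voyage AI API key. To create an account and API key, see the Voyage AI website.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks, such as Colab.
Note
Check the requirements of the langchain-voyageai package to ensure that you're using a compatible Python version.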
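Set Up the Environment
Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.
To set up your notebook environment:
Set environment variables.
Run the following code to set the environment variables for this tutorial. Provide your API keys and the connection string for your MongoDB cluster.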
    import os

    os.environ["VOYAGE_API_KEY"] = "<voyage-api-key>"
    os.environ["OPENAI_API_KEY"] = "<openai-api-key>"
    MONGODB_URI = "<connection-string>"
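Note
Replace <connection-string> with the connection string for your Atlas cluster or local deployment.
For an Atlas cluster, the connection string should use the following format:
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
To learn more, see Connect to a Cluster via Drivers.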
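If your database username or password contains reserved characters such as @, :, or /, the connection string must carry them percent-encoded. The following is a minimal sketch of one way to assemble the URI; the helper name build_srv_uri and the sample credentials and host are hypothetical and only illustrate the format shown above.

```python
from urllib.parse import quote_plus


def build_srv_uri(username: str, password: str, host: str) -> str:
    """Return a mongodb+srv URI with percent-encoded credentials.

    Hypothetical helper for illustration; `host` is everything after the
    "@" in the connection-string format, for example
    "<clusterName>.<hostname>.mongodb.net".
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"


# Made-up credentials containing characters that must be escaped
uri = build_srv_uri("app_user", "p@ss:word/1", "cluster0.example.mongodb.net")
print(uri)
```

You could then assign the result to MONGODB_URI instead of pasting a hand-edited string.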
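Use MongoDB as a Vector Store
You must use MongoDB as the vector store for your data. You can instantiate a vector store using an existing collection in MongoDB.
Load the sample data.
If you haven't already, complete the steps to load sample data into your cluster.
Note
If you want to use your own data, see Get Started with LangChain or How to Create Vector Embeddings to learn how to ingest vector embeddings into Atlas.
Instantiate the vector store.
Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:
Your MongoDB cluster's connection string.
The voyage-3-large embedding model from Voyage AI to convert text into vector embeddings.
sample_mflix.embedded_movies as the namespace to use.
plot as the field that contains the text.
plot_embedding_voyage_3_large as the field that contains the embeddings.
dotProduct as the relevance score function.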
    from langchain_mongodb import MongoDBAtlasVectorSearch
    from langchain_voyageai import VoyageAIEmbeddings

    # Create the vector store
    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_URI,
        embedding=VoyageAIEmbeddings(model="voyage-3-large", output_dimension=2048),
        namespace="sample_mflix.embedded_movies",
        text_key="plot",
        embedding_key="plot_embedding_voyage_3_large",
        relevance_score_fn="dotProduct"
    )
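As a toy illustration of what a dotProduct relevance function measures, the sketch below compares two made-up three-dimensional vectors. Real embeddings in this tutorial have 2048 dimensions, and the helper names here are hypothetical. It also shows that for unit-length vectors the dot product equals cosine similarity, which is why dotProduct is typically paired with normalized embeddings (an assumption about your data that you should verify).

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity, computed without assuming unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up vectors standing in for a query embedding and a document embedding
query_vec = normalize([3.0, 4.0, 0.0])
doc_vec = normalize([4.0, 3.0, 0.0])

score = dot(query_vec, doc_vec)
print(round(score, 6), round(cosine(query_vec, doc_vec), 6))
```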
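Create the Indexes
To enable hybrid search queries on your vector store, create MongoDB Vector Search and MongoDB Search indexes on the collection. You can create the indexes using either the LangChain helper methods or the PyMongo driver methods:
Create the MongoDB Search index with the LangChain helper method.
Run the following code in your notebook to create a search index that indexes the plot field in the collection.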
    from langchain_mongodb.index import create_fulltext_search_index
    from pymongo import MongoClient

    # Connect to your cluster
    client = MongoClient(MONGODB_URI)

    # Use helper method to create the search index
    create_fulltext_search_index(
        collection=client["sample_mflix"]["embedded_movies"],
        field="plot",
        index_name="search_index"
    )
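Create the MongoDB Vector Search index with the PyMongo driver.
Run the following code to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.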
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    # Connect to your cluster
    client = MongoClient(MONGODB_URI)
    collection = client["sample_mflix"]["embedded_movies"]

    # Create your vector search index model, then create the index
    vector_index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "plot_embedding_voyage_3_large",
                    "numDimensions": 2048,
                    "similarity": "dotProduct"
                }
            ]
        },
        name="vector_index",
        type="vectorSearch"
    )
    collection.create_search_index(model=vector_index_model)
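The numDimensions value in the index definition has to match the output_dimension passed to the embedding model (2048 in this tutorial), and the similarity value should agree with the relevance score function of the vector store. One way to keep these in sync is to derive the definition from shared constants. The sketch below is only an illustration; the constant and helper names are hypothetical.

```python
# Shared settings (hypothetical names); reuse the same values when you
# construct the embedding model and the vector store.
EMBEDDING_DIMENSIONS = 2048
SIMILARITY = "dotProduct"

# Similarity functions accepted in a vector index definition
ALLOWED_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def vector_index_definition(path, dimensions=EMBEDDING_DIMENSIONS, similarity=SIMILARITY):
    """Build a vector search index definition for a single embedding field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition("plot_embedding_voyage_3_large")
print(definition)
```

The resulting dictionary could be passed as the definition argument of SearchIndexModel in place of the inline literal.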
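Create the MongoDB Search index with the PyMongo driver.
Run the following code to create a search index that indexes the plot field in the collection. This is the PyMongo alternative to the helper method; skip this step if you already created search_index with the helper method, because both define an index with the same name.
    # Create your search index model, then create the search index
    # (reuses SearchIndexModel and collection from the previous step)
    search_index_model = SearchIndexModel(
        definition={
            "mappings": {
                "dynamic": False,
                "fields": {
                    "plot": {
                        "type": "string"
                    }
                }
            }
        },
        name="search_index"
    )
    collection.create_search_index(model=search_index_model)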
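The indexes take about one minute to build. While they build, the indexes are in an initial sync state. When they finish building, you can start querying the data in your collection.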
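Rather than waiting a fixed amount of time, you can poll until an index reports that it is queryable. The following is a minimal sketch, assuming a PyMongo-style collection whose list_search_indexes method returns index documents with a boolean queryable field; the function name and the timeout defaults are hypothetical.

```python
import time


def wait_until_queryable(collection, index_name, timeout_s=300.0, poll_s=5.0):
    """Poll until the named search index reports queryable=True.

    `collection` is expected to behave like a PyMongo Collection, that is,
    to expose list_search_indexes(name). Returns True once the index is
    queryable, or False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, you might call wait_until_queryable(collection, "vector_index") and wait_until_queryable(collection, "search_index") before running the queries in the next section.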
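Run a Hybrid Search Query
After MongoDB builds the indexes, you can run hybrid search queries on your data. The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string "time travel". It also specifies the following parameters:
vectorstore: The vector store instance to query.
search_index_name: The name of the MongoDB Search index.
top_k: The number of documents to return.
fulltext_penalty: The penalty for full-text search. The lower the penalty, the higher the full-text search score.
vector_penalty: The penalty for vector search. The lower the penalty, the higher the vector search score.
post_filter: Aggregation stages to apply to the results. In this example, a $project stage excludes the embedding fields from the returned documents.
The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, the plot, and the different scores for each document.
To learn more about hybrid search query results, see About the Query.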
    from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever

    # Initialize the retriever
    retriever = MongoDBAtlasHybridSearchRetriever(
        vectorstore=vector_store,
        search_index_name="search_index",
        top_k=5,
        fulltext_penalty=50,
        vector_penalty=50,
        post_filter=[
            {
                "$project": {
                    "plot_embedding": 0,
                    "plot_embedding_voyage_3_large": 0
                }
            }
        ]
    )

    # Define your query
    query = "time travel"

    # Print results
    documents = retriever.invoke(query)
    for doc in documents:
        print("Title: " + doc.metadata["title"])
        print("Plot: " + doc.page_content)
        print("Search score: {}".format(doc.metadata["fulltext_score"]))
        print("Vector Search score: {}".format(doc.metadata["vector_score"]))
        print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))
    Title: Timecop
    Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
    Search score: 0.019230769230769232
    Vector Search score: 0.018518518518518517
    Total score: 0.03774928774928775

    Title: A.P.E.X.
    Plot: A time-travel experiment in which a robot probe is sent from the year 2073 to the year 1973 goes terribly wrong thrusting one of the project scientists, a man named Nicholas Sinclair into a...
    Search score: 0.018518518518518517
    Vector Search score: 0.018867924528301886
    Total score: 0.0373864430468204

    Title: About Time
    Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
    Search score: 0
    Vector Search score: 0.0196078431372549
    Total score: 0.0196078431372549

    Title: The Time Traveler's Wife
    Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
    Search score: 0.0196078431372549
    Vector Search score: 0
    Total score: 0.0196078431372549

    Title: Retroactive
    Plot: A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.
    Search score: 0
    Vector Search score: 0.019230769230769232
    Total score: 0.019230769230769232
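The scores printed above are consistent with reciprocal rank fusion, where each result list contributes 1 / (rank + penalty) for a 1-based rank. For example, 0.0196078... equals 1/51, which is rank 1 with a penalty of 50. The sketch below reproduces the totals under that assumption. The formula and the per-document ranks are inferred from the printed numbers, not taken from the library source, and the helper name rrf_score is hypothetical.

```python
def rrf_score(rank, penalty):
    """Reciprocal-rank score for a 1-based rank; 0.0 if the document was not ranked."""
    if rank is None:
        return 0.0
    return 1.0 / (rank + penalty)


PENALTY = 50  # same value as fulltext_penalty and vector_penalty in the example

# (full-text rank, vector rank) back-calculated from the printed scores;
# None means the document did not appear in that result list.
inferred_ranks = {
    "Timecop": (2, 4),
    "A.P.E.X.": (4, 3),
    "About Time": (None, 1),
    "The Time Traveler's Wife": (1, None),
    "Retroactive": (None, 2),
}

totals = {
    title: rrf_score(fulltext_rank, PENALTY) + rrf_score(vector_rank, PENALTY)
    for title, (fulltext_rank, vector_rank) in inferred_ranks.items()
}

# Sort by combined score, highest first, as the retriever does
for title, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{title}: {total:.6f}")
```

Under this reading, a lower penalty raises every score from that result list and widens the gap between adjacent ranks, so lowering only fulltext_penalty, for instance, would let keyword matches weigh more heavily in the combined ranking.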
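Pass Results to a RAG Pipeline
You can pass your hybrid search results into your RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:
Defines a LangChain prompt template that instructs the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.
Constructs a chain that specifies the following:
The hybrid search retriever that you defined to retrieve relevant documents.
The prompt template that you defined.
An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.
Prompts the chain with a sample query and returns the response. The generated response might vary.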
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import PromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_openai import ChatOpenAI

    # Define a prompt template
    template = """
    Use the following pieces of context to answer the question at the end.
    {context}
    Question: Can you recommend some movies about {query}?
    """
    prompt = PromptTemplate.from_template(template)
    model = ChatOpenAI()

    # Construct a chain to answer questions on your data
    chain = (
        {"context": retriever, "query": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )

    # Prompt the chain
    query = "time travel"
    answer = chain.invoke(query)
    print(answer)
    Certainly! Here are some movies about time travel from the context provided:

    1. **Timecop (1994)**
       Genre: Action, Crime, Sci-Fi
       Plot: A law enforcement officer working for the Time Enforcement Commission battles a shady politician with a personal tie to his past.
       IMDb Rating: 5.8

    2. **A.P.E.X. (1994)**
       Genre: Action, Sci-Fi
       Plot: A time-travel experiment gone wrong thrusts a scientist into an alternate timeline plagued by killer robots.
       IMDb Rating: 4.3

    3. **About Time (2013)**
       Genre: Drama, Fantasy, Romance
       Plot: A young man discovers he can time travel and uses this ability to improve his life, especially his love life, but learns the limitations and challenges of his gift.
       IMDb Rating: 7.8

    4. **The Time Traveler's Wife (2009)**
       Genre: Drama, Fantasy, Romance
       Plot: A Chicago librarian with a gene causing him to involuntarily time travel struggles with its impact on his romantic relationship and marriage.
       IMDb Rating: 7.1

    5. **Retroactive (1997)**
       Genre: Action, Crime, Drama
       Plot: A woman accidentally time-travels to prevent a violent event, but her attempts to fix the situation lead to worsening consequences due to repeated time cycles.
       IMDb Rating: 6.3

    Each movie covers time travel with unique perspectives, from action-packed adventures to romantic dramas.
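In the chain above, the retriever's documents are passed to {context} as they are. If you prefer to control exactly which fields the LLM sees, one option is to flatten the documents into plain text first. The sketch below is an illustration only: format_docs is a hypothetical helper, and the Doc class is a stand-in for LangChain's Document so that the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved documents into one plain-text context string."""
    return "\n\n".join(
        f"Title: {doc.metadata.get('title', 'Unknown')}\nPlot: {doc.page_content}"
        for doc in docs
    )


# Two plots taken from the hybrid search output earlier in this tutorial
sample_docs = [
    Doc(
        "An officer for a security agency that regulates time travel, must fend "
        "for his life against a shady politician who has a tie to his past.",
        {"title": "Timecop"},
    ),
    Doc(
        "A psychiatrist makes multiple trips through time to save a woman that "
        "was murdered by her brutal husband.",
        {"title": "Retroactive"},
    ),
]

context = format_docs(sample_docs)
print(context)
```

Because LangChain's pipe operator accepts plain callables, such a function could be placed after the retriever in the chain, for example {"context": retriever | format_docs, "query": RunnablePassthrough()}.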