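You can integrate MongoDB with LangChain to perform hybrid search. In this tutorial, you complete the following steps:
Set up the environment.
Use MongoDB as a vector store.
Create a MongoDB Vector Search index and a MongoDB Search index on your data.
Run hybrid search queries.
Pass the query results to your RAG pipeline.
Work with a runnable version of this tutorial as a Python notebook.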
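Prerequisites
To complete this tutorial, you must have the following:
One of the following MongoDB cluster types:
An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later. Ensure that your IP address is included in your Atlas project's access list.
A local Atlas deployment created using the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
A MongoDB Community or Enterprise cluster with Search and Vector Search installed.
A Voyage AI API key. To create an API key, see Model API Keys.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks, such as Colab.
Note
Check the requirements of the langchain-voyageai package to ensure that you're using a compatible Python version.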
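Set Up the Environment
Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.
To set up your notebook environment:
Set environment variables.
Run the following code to set the environment variables for this tutorial. Provide your API keys and your MongoDB cluster's connection string.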
import os

os.environ["VOYAGE_API_KEY"] = "<voyage-api-key>"
os.environ["OPENAI_API_KEY"] = "<openai-api-key>"
MONGODB_URI = "<connection-string>"
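Note
Replace <connection-string> with the connection string for your Atlas cluster or local deployment.
Your connection string should use the following format:
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
To learn more, see Connect to a Cluster via Client Libraries.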
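If your database username or password contains characters such as @, :, or /, they need to be percent-encoded before they go into the connection string. The following is a minimal sketch of one way to assemble a string in the format shown above. The build_mongodb_uri helper and the example host name are hypothetical and are not part of any MongoDB or LangChain package.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username: str, password: str, host: str) -> str:
    """Assemble an SRV connection string with percent-encoded credentials.

    Hypothetical helper for illustration only. `host` is the
    <clusterName>.<hostname>.mongodb.net part of the connection string.
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"


# Example with placeholder values (not real credentials or a real cluster)
example_uri = build_mongodb_uri("app_user", "p@ss:w/rd", "cluster0.example.mongodb.net")
print(example_uri)
```

You could then assign the result to MONGODB_URI instead of pasting the string by hand.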
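Use MongoDB as a Vector Store
You must use MongoDB as a vector store for your data. You can instantiate a vector store by using an existing collection in MongoDB.
Load the sample data.
If you haven't already, complete the steps to load sample data into your cluster.
Note
If you want to use your own data, see LangChain Get Started or How to Create Vector Embeddings to learn how to ingest vector embeddings into Atlas.
Instantiate the vector store.
Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:
Your MongoDB cluster's connection string.
The voyage-3-large embedding model from Voyage AI to convert text into vector embeddings.
sample_mflix.embedded_movies as the namespace to use.
plot as the field that contains the text.
plot_embedding_voyage_3_large as the field that contains the embeddings.
dotProduct as the relevance score function.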
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_voyageai import VoyageAIEmbeddings

# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string = MONGODB_URI,
    embedding = VoyageAIEmbeddings(model = "voyage-3-large", output_dimension = 2048),
    namespace = "sample_mflix.embedded_movies",
    text_key = "plot",
    embedding_key = "plot_embedding_voyage_3_large",
    relevance_score_fn = "dotProduct"
)
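Create the Indexes
To enable hybrid search queries on your vector store, create a MongoDB Vector Search index and a MongoDB Search index on the collection. You can create the indexes by using either the LangChain helper methods or the PyMongo driver methods:
Create the MongoDB Vector Search index.
Run the following code to use the LangChain helper method to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.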
# Use helper method to create the vector search index
vector_store.create_vector_search_index(
    dimensions = 2048  # The dimensions of the vector embeddings to be indexed
)
Create the MongoDB Search index.
Run the following code in your notebook to create a search index that indexes the plot field in the collection.
from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient

# Connect to your cluster
client = MongoClient(MONGODB_URI)

# Use helper method to create the search index
create_fulltext_search_index(
    collection = client["sample_mflix"]["embedded_movies"],
    field = "plot",
    index_name = "search_index"
)
Create the MongoDB Vector Search index.
Alternatively, to use the PyMongo driver instead of the LangChain helper method, run the following code to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Connect to your cluster
client = MongoClient(MONGODB_URI)
collection = client["sample_mflix"]["embedded_movies"]

# Create your vector search index model, then create the index
vector_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding_voyage_3_large",
                "numDimensions": 2048,
                "similarity": "dotProduct"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch"
)
collection.create_search_index(model=vector_index_model)
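Create the MongoDB Search index.
Run the following code to use the PyMongo driver to create a search index that indexes the plot field in the collection. This code reuses the collection object and the SearchIndexModel import from the previous PyMongo step.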
# Create your search index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "mappings": {
            "dynamic": False,
            "fields": {
                "plot": {
                    "type": "string"
                }
            }
        }
    },
    name="search_index"
)
collection.create_search_index(model=search_index_model)
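The indexes take about one minute to build. While they build, the indexes are in an initial sync state. When they finish building, you can start querying the data in your collection.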
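Rather than waiting a fixed amount of time, you can poll the index status until the index is ready. The following is a minimal sketch that assumes the collection object supports PyMongo's list_search_indexes method and that each returned index document includes a boolean queryable field. The wait_until_queryable helper, its timeout, and its poll interval are hypothetical and are not part of PyMongo or LangChain.

```python
import time


def wait_until_queryable(collection, index_name, timeout=300.0, poll_interval=5.0):
    """Poll a collection until the named search index reports it is queryable.

    Hypothetical helper: `collection` is assumed to expose
    list_search_indexes(name), as a PyMongo Collection does.
    Returns the index document, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        # list_search_indexes returns an iterable of index description documents
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return indexes[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Index {index_name!r} was not queryable after {timeout} seconds")
        time.sleep(poll_interval)


# Possible usage in the notebook, once `collection` is defined:
# wait_until_queryable(collection, "search_index")
# wait_until_queryable(collection, "vector_index")
```

The index names in the commented usage match the names used when creating the indexes with PyMongo in this tutorial. If you created the vector search index with the LangChain helper method, check the index name that it used.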
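Run a Hybrid Search Query
After MongoDB builds your indexes, you can run hybrid search queries on your data. The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string "time travel". It also specifies the following parameters:
vectorstore: The name of the vector store instance.
search_index_name: The name of the MongoDB Search index.
top_k: The number of documents to return.
fulltext_penalty: The penalty for full-text search. The lower the penalty, the higher the full-text search score.
vector_penalty: The penalty for vector search. The lower the penalty, the higher the vector search score.
The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, the plot, and the different scores for each document.
To learn more about hybrid search query results, see About the Query.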
from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever

# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = vector_store,
    search_index_name = "search_index",
    top_k = 5,
    fulltext_penalty = 50,
    vector_penalty = 50,
    # Exclude the embedding fields from the returned documents
    post_filter=[
        {
            "$project": {
                "plot_embedding": 0,
                "plot_embedding_voyage_3_large": 0
            }
        }
    ])

# Define your query
query = "time travel"

# Print results
documents = retriever.invoke(query)
for doc in documents:
    print("Title: " + doc.metadata["title"])
    print("Plot: " + doc.page_content)
    print("Search score: {}".format(doc.metadata["fulltext_score"]))
    print("Vector Search score: {}".format(doc.metadata["vector_score"]))
    print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))
Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0.018518518518518517
Total score: 0.03774928774928775

Title: A.P.E.X.
Plot: A time-travel experiment in which a robot probe is sent from the year 2073 to the year 1973 goes terribly wrong thrusting one of the project scientists, a man named Nicholas Sinclair into a...
Search score: 0.018518518518518517
Vector Search score: 0.018867924528301886
Total score: 0.0373864430468204

Title: About Time
Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
Search score: 0
Vector Search score: 0.0196078431372549
Total score: 0.0196078431372549

Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549

Title: Retroactive
Plot: A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.
Search score: 0
Vector Search score: 0.019230769230769232
Total score: 0.019230769230769232
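The scores printed above are consistent with reciprocal rank fusion, where each result list contributes 1 / (rank + penalty + 1) for a document at a 0-based rank. For example, 0.0196078... equals 1/51 (rank 0 with a penalty of 50) and 0.0192307... equals 1/52 (rank 1). This formula is an inference from the printed output, not a statement of the retriever's internals. The following sketch reproduces the arithmetic in plain Python. The function names are hypothetical, the rank positions are inferred from the scores, and "(other result)" is a placeholder for a document that the output does not show.

```python
import math


def reciprocal_rank_score(rank: int, penalty: float) -> float:
    """Score contributed by one result list for a document at a 0-based rank."""
    return 1.0 / (rank + penalty + 1)


def fuse(fulltext_ids, vector_ids, fulltext_penalty=50, vector_penalty=50, top_k=5):
    """Combine two ranked lists of IDs and sort by the sum of both scores."""
    scores = {}
    for rank, doc_id in enumerate(fulltext_ids):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["fulltext_score"] = reciprocal_rank_score(rank, fulltext_penalty)
    for rank, doc_id in enumerate(vector_ids):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["vector_score"] = reciprocal_rank_score(rank, vector_penalty)
    ranked = sorted(
        scores.items(),
        key=lambda item: item[1]["fulltext_score"] + item[1]["vector_score"],
        reverse=True,
    )
    return ranked[:top_k]


# Rank orders inferred from the printed scores; "(other result)" is a placeholder
fulltext_ranking = ["The Time Traveler's Wife", "Timecop", "(other result)", "A.P.E.X."]
vector_ranking = ["About Time", "Retroactive", "A.P.E.X.", "Timecop"]

for title, s in fuse(fulltext_ranking, vector_ranking):
    print(title, s["fulltext_score"] + s["vector_score"])
```

A document that appears in only one list gets a score of 0 from the other, which is why documents found by both searches, such as Timecop and A.P.E.X., rank first.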
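Pass Results to a RAG Pipeline
You can pass your hybrid search results into your RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:
Defines a LangChain prompt template that instructs the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.
Constructs a chain that specifies the following:
The hybrid search retriever that you defined to retrieve relevant documents.
The prompt template that you defined.
An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.
Prompts the chain with a sample query and returns the response. The generated response might vary.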
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()

# Construct a chain to answer questions on your data
chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)
Certainly! Here are some movies about time travel from the context provided:

1. **Timecop (1994)**
   Genre: Action, Crime, Sci-Fi
   Plot: A law enforcement officer working for the Time Enforcement Commission battles a shady politician with a personal tie to his past.
   IMDb Rating: 5.8

2. **A.P.E.X. (1994)**
   Genre: Action, Sci-Fi
   Plot: A time-travel experiment gone wrong thrusts a scientist into an alternate timeline plagued by killer robots.
   IMDb Rating: 4.3

3. **About Time (2013)**
   Genre: Drama, Fantasy, Romance
   Plot: A young man discovers he can time travel and uses this ability to improve his life, especially his love life, but learns the limitations and challenges of his gift.
   IMDb Rating: 7.8

4. **The Time Traveler's Wife (2009)**
   Genre: Drama, Fantasy, Romance
   Plot: A Chicago librarian with a gene causing him to involuntarily time travel struggles with its impact on his romantic relationship and marriage.
   IMDb Rating: 7.1

5. **Retroactive (1997)**
   Genre: Action, Crime, Drama
   Plot: A woman accidentally time-travels to prevent a violent event, but her attempts to fix the situation lead to worsening consequences due to repeated time cycles.
   IMDb Rating: 6.3

Each movie covers time travel with unique perspectives, from action-packed adventures to romantic dramas.
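To see how retrieved documents and a query end up in the {context} and {query} variables, it can help to fill the template by hand. The following is a minimal sketch in plain Python that does not call LangChain or OpenAI. The Doc class is a hypothetical stand-in for a retrieved document with page_content and metadata attributes, and format_docs and build_prompt are illustrative helpers, not library functions.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


TEMPLATE = """
Use the following pieces of context to answer the question at the end.
{context}
Question: Can you recommend some movies about {query}?
"""


def format_docs(docs) -> str:
    """Join each document's title and plot into one context string."""
    return "\n\n".join(
        f"{doc.metadata.get('title', 'Untitled')}: {doc.page_content}" for doc in docs
    )


def build_prompt(docs, query: str) -> str:
    """Fill the template's {context} and {query} variables."""
    return TEMPLATE.format(context=format_docs(docs), query=query)


sample_docs = [
    Doc(
        page_content="A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.",
        metadata={"title": "Retroactive"},
    ),
]
print(build_prompt(sample_docs, "time travel"))
```

Formatting the documents explicitly like this gives you control over which fields the LLM sees. In a LangChain chain, a function such as format_docs is commonly piped after the retriever for the same purpose.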