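You can integrate Atlas Vector Search with LangChain to build generative AI and RAG applications. This page provides an overview of the MongoDB LangChain Python integration and the different components you can use in your applications.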



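For a full list of components and methods, see the API reference.

For the JavaScript integration, see Get Started with the LangChain JS/TS Integration.

To use Atlas Vector Search with LangChain, you must first install the langchain-mongodb package: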

pip install langchain-mongodb

Some components also require the following LangChain base packages:

pip install langchain langchain_community

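MongoDBAtlasVectorSearch is a vector store that allows you to store and retrieve vector embeddings from a collection in Atlas. You can use this component to store embeddings from your data and retrieve them by using Atlas Vector Search.

This component requires an Atlas Vector Search index.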

from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from pymongo import MongoClient
# Use some embedding model to generate embeddings
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["<database-name>"]["<collection-name>"]
# Instantiate the vector store
vector_store = MongoDBAtlasVectorSearch(
    collection = collection,        # Collection to store embeddings
    embedding = FakeEmbeddings(),   # Embedding model to use
    index_name = "vector_index",    # Name of the vector search index
    relevance_score_fn = "cosine"   # Similarity score function, can also be "euclidean" or "dotProduct"
)
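The relevance_score_fn option names the similarity function used to compare embeddings. As a rough illustration of what the three options measure, here is a minimal pure-Python sketch. The helper names (dot_product, cosine, euclidean) are hypothetical and are not part of langchain-mongodb; Atlas computes its own scores server-side, and this sketch does not reproduce them.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between two vectors; ignores magnitude."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

def euclidean(a, b):
    """Straight-line distance between two vectors; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy two-dimensional "embeddings" (illustrative values only)
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

print(cosine(query, same_direction))       # same direction, different length
print(cosine(query, orthogonal))           # unrelated direction
print(dot_product(query, same_direction))  # sensitive to vector length
print(euclidean(query, same_direction))    # distance, not similarity
```

Cosine similarity is often a reasonable default when embedding magnitudes are not meaningful, which may be why the example above uses it.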


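LangChain retrievers are components that you use to get relevant documents from your vector stores. You can use LangChain's built-in retrievers or the following MongoDB retrievers to query and retrieve data from Atlas.

MongoDBAtlasFullTextSearchRetriever is a retriever that performs full-text search by using Atlas Search. Specifically, it uses Lucene's standard BM25 algorithm.

This retriever requires an Atlas Search index.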

from langchain_mongodb.retrievers.full_text_search import MongoDBAtlasFullTextSearchRetriever
from pymongo import MongoClient
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["<database-name>"]["<collection-name>"]
# Initialize the retriever
retriever = MongoDBAtlasFullTextSearchRetriever(
    collection = collection,              # MongoDB Collection in Atlas
    search_field = "<field-name>",        # Name of the field to search
    search_index_name = "<index-name>"    # Name of the search index
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
    print(doc)
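To give a feel for how BM25 ranks documents, the following is a simplified textbook version in pure Python. It is a sketch only: the function name bm25_scores, the whitespace tokenizer, and the parameter defaults k1=1.2 and b=0.75 are assumptions for illustration, and Lucene's actual implementation differs in tokenization and in scoring details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in docs against query with textbook BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Longer-than-average documents are penalized through b
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

docs = [
    "atlas vector search stores embeddings",
    "full text search ranks documents by keyword relevance",
    "chat history is stored per session",
]
scores = bm25_scores("keyword search", docs)
for doc, score in zip(docs, scores):
    print(round(score, 3), doc)
```

The second document matches both query terms, including the rarer term "keyword", so it ranks first; the third matches nothing and scores zero.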

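MongoDBAtlasHybridSearchRetriever is a retriever that combines vector search and full-text search results by using the Reciprocal Rank Fusion (RRF) algorithm. To learn more, see How to Perform Hybrid Search.

This retriever requires an existing vector store, an Atlas Vector Search index, and an Atlas Search index.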

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever
# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = <vector-store>,          # Vector store instance
    search_index_name = "<index-name>",    # Name of the Atlas Search index
    top_k = 5,                             # Number of documents to return
    fulltext_penalty = 60.0,               # Penalty for full-text search
    vector_penalty = 60.0                  # Penalty for vector search
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
    print(doc)
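The idea behind Reciprocal Rank Fusion can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the retriever's implementation: the function name reciprocal_rank_fusion is hypothetical, ranks are taken as 1-based, and each result list contributes 1 / (rank + penalty) to a document's fused score. The parameter names mirror the retriever options shown above.

```python
def reciprocal_rank_fusion(vector_ids, fulltext_ids,
                           vector_penalty=60.0, fulltext_penalty=60.0, top_k=5):
    """Fuse two ranked lists of document IDs into one ranking."""
    scores = {}
    for ids, penalty in ((vector_ids, vector_penalty), (fulltext_ids, fulltext_penalty)):
        for rank, doc_id in enumerate(ids, start=1):
            # A document appearing in both lists accumulates both contributions
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + penalty)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

vector_hits = ["a", "b", "c"]     # ranking from vector search
fulltext_hits = ["b", "d", "a"]   # ranking from full-text search
fused = reciprocal_rank_fusion(vector_hits, fulltext_hits)
print(fused)
```

Documents that rank well in both lists rise to the top. Raising one penalty relative to the other reduces that search type's influence on the fused ranking.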

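MongoDBAtlasParentDocumentRetriever is a retriever that queries smaller chunks first and then returns the larger parent document to the LLM. This type of retrieval is called parent document retrieval. Parent document retrieval allows more granular searches on smaller chunks while giving the LLM the full context of the parent document, which can improve the responses of your RAG agents and applications.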



from langchain_mongodb.retrievers.parent_document import MongoDBAtlasParentDocumentRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
retriever = MongoDBAtlasParentDocumentRetriever.from_connection_string(
    connection_string = "<connection-string>",           # Atlas connection string
    embedding_model = OpenAIEmbeddings(),                # Embedding model to use
    child_splitter = RecursiveCharacterTextSplitter(),   # Text splitter to use
    database_name = "<database-name>",                   # Database to store the collection
    collection_name = "<collection-name>",               # Collection to store the documents
    # Additional vector store or parent class arguments...
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
    print(doc)
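The following pure-Python sketch illustrates the parent document pattern described above: search over small chunks, then return the full parent each matching chunk came from. All names here (split_into_chunks, overlap, retrieve_parents) are hypothetical, and word overlap stands in for vector similarity so the example runs without an embedding model or a database.

```python
def split_into_chunks(text, chunk_size=5):
    """Split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

parents = {
    "doc1": "MongoDB Atlas stores vector embeddings alongside operational data in one collection",
    "doc2": "LangChain retrievers fetch relevant documents for a language model prompt",
}

# Each small chunk remembers the ID of the parent it came from
chunk_index = [
    (chunk, parent_id)
    for parent_id, text in parents.items()
    for chunk in split_into_chunks(text)
]

def overlap(query, chunk):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_parents(query, top_k=1):
    """Rank chunks by relevance, then return the distinct parent documents."""
    scored = [(overlap(query, chunk), parent_id) for chunk, parent_id in chunk_index]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    parent_ids = []
    for _, parent_id in scored:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids[:top_k]]

results = retrieve_parents("vector embeddings")
print(results)
```

The query matches only a five-word chunk, but the caller receives the whole parent text, which is the context an LLM would be given.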

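Caches optimize LLM performance by storing responses to similar or repeated queries so that they do not have to be recomputed. MongoDB provides the following caches for your LangChain applications.

MongoDBCache allows you to store a basic cache in Atlas.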

from langchain_mongodb import MongoDBCache
from langchain_core.globals import set_llm_cache
# Register the cache globally for LLM calls
set_llm_cache(MongoDBCache(
    connection_string = "<connection-string>",   # Atlas connection string
    database_name = "<database-name>",           # Database to store the cache
    collection_name = "<collection-name>"        # Collection to store the cache
))
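Conceptually, a basic cache is an exact-match lookup keyed on the prompt and the model configuration. The sketch below shows that idea with an in-memory dictionary. The ExactMatchCache class, the fake_llm function, and the cached_call helper are hypothetical stand-ins for illustration and do not reflect how MongoDBCache stores its entries.

```python
class ExactMatchCache:
    """In-memory stand-in for a cache keyed on (prompt, model identifier)."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, response):
        self._store[(prompt, llm_string)] = response

calls = []  # records every time the (fake) model is actually invoked

def fake_llm(prompt):
    """Hypothetical model call; uppercases the prompt instead of generating text."""
    calls.append(prompt)
    return prompt.upper()

def cached_call(cache, prompt, llm_string="fake-model"):
    cached = cache.lookup(prompt, llm_string)
    if cached is not None:
        return cached                      # cache hit: skip the model call
    response = fake_llm(prompt)
    cache.update(prompt, llm_string, response)
    return response

cache = ExactMatchCache()
first = cached_call(cache, "hello")
second = cached_call(cache, "hello")       # served from the cache
print(first, second, len(calls))
```

Only identical prompts hit this kind of cache; a reworded prompt is treated as a new query.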


MongoDBAtlasSemanticCache is a semantic cache that uses Atlas Vector Search to retrieve cached prompts. This component requires an Atlas Vector Search index.

from langchain_mongodb import MongoDBAtlasSemanticCache
from langchain_core.globals import set_llm_cache
# Use some embedding model to generate embeddings
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
# Register the semantic cache globally for LLM calls
set_llm_cache(MongoDBAtlasSemanticCache(
    embedding = FakeEmbeddings(),                # Embedding model to use
    connection_string = "<connection-string>",   # Atlas connection string
    database_name = "<database-name>",           # Database to store the cache
    collection_name = "<collection-name>"        # Collection to store the cache
))
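A semantic cache differs from an exact-match cache in that it returns a stored response when a new prompt is close enough in embedding space to a cached one. The sketch below illustrates this with a toy embedding and a similarity threshold. Everything in it is an assumption for illustration: toy_embed, the SemanticCache class, and the 0.8 threshold are hypothetical and are not taken from MongoDBAtlasSemanticCache.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def toy_embed(text):
    """Hypothetical embedding: counts of a few fixed vocabulary words."""
    vocab = ["capital", "france", "paris", "weather", "today"]
    words = text.lower().replace("?", "").split()
    return [float(words.count(word)) for word in vocab]

class SemanticCache:
    def __init__(self, embed, threshold=0.8):
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def update(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def lookup(self, prompt):
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Return a hit only when the closest entry is similar enough
        return best_response if best_score >= self.threshold else None

cache = SemanticCache(toy_embed)
cache.update("What is the capital of France?", "Paris")
hit = cache.lookup("capital of France")     # reworded, but semantically close
miss = cache.lookup("weather today")        # unrelated prompt
print(hit, miss)
```

The threshold trades hit rate against the risk of returning a response to a prompt that only looks similar.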

Document loaders are tools that help you load data for your LangChain applications.

MongodbLoader is a document loader that returns a list of documents from a MongoDB database.

from langchain_community.document_loaders.mongodb import MongodbLoader
loader = MongodbLoader(
    connection_string = "<connection-string>",   # Atlas cluster or local MongoDB instance URI
    db_name = "<database-name>",                 # Database that contains the collection
    collection_name = "<collection-name>",       # Collection to load documents from
    filter_criteria = { <filter-document> },     # Optional document to specify a filter
    field_names = ["<field-name>", ... ]         # List of fields to return
)
# Load the matching documents
docs = loader.load()

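MongoDBChatMessageHistory is a component that allows you to store and manage chat message histories in a MongoDB database. It can save both user and AI-generated messages associated with a unique session identifier. This is useful for applications that need to track interactions over time, such as chatbots.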

from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
chat_message_history = MongoDBChatMessageHistory(
    session_id = "<session-id>",                 # Unique session identifier
    connection_string = "<connection-string>",   # Atlas cluster or local MongoDB instance URI
    database_name = "<database-name>",           # Database to store the chat history
    collection_name = "<collection-name>"        # Collection to store the chat history
)
# Add a user message and an AI message, then read the history back
chat_message_history.add_user_message("Hello")
chat_message_history.add_ai_message("Hi")
chat_message_history.messages
# Output: [HumanMessage(content='Hello'), AIMessage(content='Hi')]
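The role of the session identifier can be shown with a small in-memory sketch. The SessionChatHistory class below is hypothetical and uses a plain dictionary in place of a MongoDB collection; it only illustrates that messages are grouped by session and survive when the same session is reopened.

```python
class SessionChatHistory:
    """In-memory stand-in: messages are grouped under a session ID."""

    def __init__(self, session_id, store):
        self.session_id = session_id
        self.store = store  # shared dict playing the role of the collection

    def add_user_message(self, content):
        self.store.setdefault(self.session_id, []).append(("human", content))

    def add_ai_message(self, content):
        self.store.setdefault(self.session_id, []).append(("ai", content))

    @property
    def messages(self):
        return list(self.store.get(self.session_id, []))

store = {}
alice = SessionChatHistory("session-alice", store)
alice.add_user_message("Hello")
alice.add_ai_message("Hi")

bob = SessionChatHistory("session-bob", store)
bob.add_user_message("What is RAG?")

# Reopening a session with the same ID sees the earlier messages
alice_again = SessionChatHistory("session-alice", store)
print(alice_again.messages)
print(bob.messages)
```

Because each session's messages are stored under its own key, conversations from different users do not mix.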


MongoDBDocStore is a custom key-value store that uses MongoDB to store and manage documents. You can perform CRUD operations as you would on any other MongoDB collection.

from pymongo import MongoClient
from langchain_core.documents import Document
from langchain_mongodb.docstores import MongoDBDocStore
# Replace with your MongoDB connection string and namespace
connection_string = "<connection-string>"
namespace = "<database-name>.<collection-name>"
# Initialize the MongoDBDocStore
docstore = MongoDBDocStore.from_connection_string(connection_string, namespace)

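MongoDBByteStore is a custom datastore that uses MongoDB to store and manage binary data, specifically data represented in bytes. You can perform CRUD operations with key-value pairs where the keys are strings and the values are byte sequences.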

# Assumes the langchain_community package, which provides MongoDBByteStore
from langchain_community.storage import MongoDBByteStore
# Instantiate the MongoDBByteStore
mongodb_store = MongoDBByteStore(
    connection_string = "<connection-string>",   # Atlas cluster or local MongoDB instance URI
    db_name = "<database-name>",                 # Name of the database
    collection_name = "<collection-name>"        # Name of the collection
)
# Set values for keys
mongodb_store.mset([("key1", b"hello"), ("key2", b"world")])
# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
print(values)  # Output: [b'hello', b'world']
# Iterate over keys
for key in mongodb_store.yield_keys():
    print(key)  # Output: key1, key2
# Delete keys
mongodb_store.mdelete(["key1", "key2"])

