
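Build a Local RAG Implementation with MongoDB and LangChain

In addition to deploying MongoDB Atlas in the cloud, you can use the Atlas CLI to deploy self-contained MongoDB instances on your local machine. The LangChain MongoDB integration supports both Atlas clusters and local deployments. When you specify the connection string parameter, you can pass the connection string for your local deployment instead of a cluster connection string.

This tutorial demonstrates how to implement retrieval-augmented generation (RAG) by using a local Atlas deployment, local models, and the LangChain MongoDB integration. Specifically, you perform the following actions: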

Create a local Atlas deployment.
Use a local embedding model to generate vector embeddings.
Use the local Atlas deployment as a vector store.
Use a local LLM to answer questions about your data.

You can work with a runnable version of this tutorial as a Python notebook.

To learn how to implement RAG locally without using LangChain, see "Build a Local RAG Implementation with MongoDB Vector Search".

Prerequisites

To complete this tutorial, you need the following:

The Atlas CLI installed and running v1.14.3 or later.
An interactive Python notebook that you can run locally. You can run interactive Python notebooks in VS Code. Ensure that your environment runs Python v3.10 or later.
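
The Python requirement above can be checked from inside the notebook before you install anything. The following is a minimal sketch. The helper name meets_minimum_python is hypothetical and is not part of any MongoDB or LangChain tooling.

```python
import sys

# Minimum version stated in the prerequisites.
MINIMUM_PYTHON = (3, 10)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or current) interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= minimum


if meets_minimum_python():
    print("Python version is new enough for this tutorial.")
else:
    print("Upgrade to Python 3.10 or later before continuing.")
```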

Create a Local Atlas Deployment

To create the local deployment, run atlas deployments setup in your terminal and follow the prompts to create the deployment.

For detailed instructions, see "Create a Local Atlas Deployment".

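About Local Deployments

You create local Atlas deployments with the Atlas CLI. The Atlas CLI is the command-line interface for MongoDB Atlas. You can use it to interact with Atlas from the terminal and perform various tasks, including creating local Atlas deployments. These deployments are fully local and do not require a connection to the cloud.

Local Atlas deployments are intended for testing only. For production environments, deploy a cluster.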

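Set Up the Environment

In this section, you set up the environment for this tutorial.

Create a directory to store your project.

Run the following commands in your terminal to create a new directory named local-rag-langchain-mongodb: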

mkdir local-rag-langchain-mongodb
cd local-rag-langchain-mongodb

Create an interactive Python notebook.

The following command creates a notebook named langchain-local-rag.ipynb in the directory:

touch langchain-local-rag.ipynb

Install and import dependencies.

Run the following command in your notebook:

pip install --quiet --upgrade pymongo langchain langchain-community langchain-mongodb langchain-huggingface gpt4all pypdf

Define your connection string.

Run the following code in your notebook, replacing <port-number> with the port for your local deployment.

MONGODB_URI = ("mongodb://localhost:<port-number>/?directConnection=true")
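
If you prefer to build the connection string from a port number rather than editing the placeholder by hand, a small helper can do the formatting and catch obviously invalid ports. This is only a sketch: the function name build_local_uri is hypothetical, and port 27017 is an example value, not necessarily the port of your deployment.

```python
from urllib.parse import urlsplit, parse_qs


def build_local_uri(port: int, host: str = "localhost") -> str:
    """Build a connection string for a local deployment listening on `port`."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return f"mongodb://{host}:{port}/?directConnection=true"


# 27017 is only an example; use the port that the Atlas CLI reports.
uri = build_local_uri(27017)

# Parse the result back to confirm the host, port, and query option.
parts = urlsplit(uri)
print(parts.hostname, parts.port, parse_qs(parts.query))
```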

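Use Your Local Deployment as a Vector Store

You can use your local Atlas deployment as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.

Instantiate the vector store.

The following code uses the LangChain integration for MongoDB Vector Search to instantiate your local Atlas deployment as a vector database, also called a vector store, using the langchain_db.local_rag namespace.

This example uses the mixedbread-ai/mxbai-embed-large-v1 model from Hugging Face to generate embeddings.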

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_huggingface import HuggingFaceEmbeddings
# Load the embedding model (https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
embedding_model = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")
# Instantiate vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string = MONGODB_URI,
   namespace = "langchain_db.local_rag",
   embedding=embedding_model,
   index_name="vector_index"
)
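
The namespace argument follows the MongoDB convention of "<database>.<collection>". As a small illustration of how such a string breaks down, the following sketch splits it at the first dot. The helper split_namespace is hypothetical and is not part of the LangChain integration.

```python
def split_namespace(namespace: str) -> tuple[str, str]:
    """Split "<database>.<collection>" at the first dot.

    Database names cannot contain dots, but collection names can,
    so only the first dot separates the two parts.
    """
    database, sep, collection = namespace.partition(".")
    if not sep or not database or not collection:
        raise ValueError(f"expected '<database>.<collection>', got {namespace!r}")
    return database, collection


db_name, collection_name = split_namespace("langchain_db.local_rag")
print(db_name, collection_name)
```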

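Add documents to the vector store.

Paste and run the following code in your notebook to ingest a sample PDF that contains a recent MongoDB earnings report into the vector store.

This code uses a text splitter to chunk the PDF data into smaller parent documents. It specifies the chunk size (number of characters) and chunk overlap (number of overlapping characters between consecutive chunks) for each document.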

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/13176/pdf")
data = loader.load()
# Split PDF into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
docs = text_splitter.split_documents(data)
# Add data to the vector store
vector_store.add_documents(docs)
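
To make the chunk size and chunk overlap settings concrete, the following sketch splits a string into fixed-width windows. It is a deliberately simplified stand-in: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line breaks, so its actual chunks will differ. The function name naive_chunks is hypothetical.

```python
def naive_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in naive_chunks(sample, chunk_size=10, chunk_overlap=2):
    print(chunk)
```

With a chunk size of 10 and an overlap of 2, each window starts 8 characters after the previous one, so the last two characters of one chunk reappear at the start of the next.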

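Ingesting the PDF might take several minutes. If you're using Atlas, you can verify your vector embeddings by navigating to the langchain_db.local_rag namespace in the Atlas UI.

You can also view your vector embeddings by connecting to your local deployment from mongosh or from your application, using the deployment's connection string. Then, you can run read operations on the langchain_db.local_rag collection.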
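
As one way to run such a read operation from the notebook, the sketch below fetches a single stored document and reports the length of its embedding. It assumes the integration's default field names, text and embedding. If your vector store uses different keys, pass them explicitly. The helper name summarize_document is hypothetical.

```python
def summarize_document(collection, embedding_key="embedding", text_key="text"):
    """Return a text preview and the embedding length of one stored document.

    `collection` is expected to behave like a PyMongo collection.
    Returns None if no document with an embedding is found.
    """
    doc = collection.find_one({embedding_key: {"$exists": True}})
    if doc is None:
        return None
    return {
        "text_preview": str(doc.get(text_key, ""))[:60],
        "dimensions": len(doc[embedding_key]),
    }


# Usage in the notebook (requires the running local deployment):
#   from pymongo import MongoClient
#   client = MongoClient(MONGODB_URI)
#   print(summarize_document(client["langchain_db"]["local_rag"]))
```

For the embedding model used in this tutorial, the reported dimensions value should match the 1024 dimensions passed when the index is created.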

Create the MongoDB Vector Search index.

To enable vector search queries on your vector store, create a MongoDB Vector Search index on the langchain_db.local_rag collection. You can create the index by using the LangChain helper method:

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 1024 # The dimensions of the vector embeddings to be indexed
)

Tip

See the create_vector_search_index API reference.

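The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.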
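
Rather than waiting a fixed amount of time, you can poll until the index reports that it is queryable. The following is a sketch under two assumptions: that your PyMongo release provides Collection.list_search_indexes, and that the index is named vector_index as in the code above. The helper name wait_until_queryable and its timeout values are hypothetical.

```python
import time


def wait_until_queryable(collection, index_name="vector_index",
                         timeout_s=120.0, poll_s=2.0):
    """Poll until the named search index reports queryable, or time out.

    `collection` is expected to behave like a PyMongo collection.
    Returns True once the index is queryable, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Usage in the notebook (requires the running local deployment):
#   from pymongo import MongoClient
#   collection = MongoClient(MONGODB_URI)["langchain_db"]["local_rag"]
#   print(wait_until_queryable(collection))
```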

Use a Local LLM to Answer Questions

This section demonstrates a sample RAG implementation that you can run locally by using MongoDB Vector Search and GPT4All.

To learn about other ways to run LLMs locally with LangChain, see "Run models locally".

Load the local LLM.

Download the Mistral 7B model from GPT4All. To explore other models, refer to the GPT4All website.
Move this model into your local-rag-langchain-mongodb project directory.

Paste the following code into your notebook to configure the LLM. Before running it, replace <path-to-model> with the path where you saved the LLM locally.

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import GPT4All
# Configure the LLM
local_path = "<path-to-model>"
# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]
# Verbose is required to pass to the callback manager
llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True)
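
A wrong model path is an easy mistake to make at this step, so it can help to validate the path before passing it to GPT4All. The following is a minimal sketch using only the standard library. The helper name resolve_model_path is hypothetical.

```python
from pathlib import Path


def resolve_model_path(path_str: str) -> str:
    """Return the absolute path to the model file, or raise if it is missing."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return str(path)


# Usage in the notebook, before constructing the LLM:
#   local_path = resolve_model_path("<path-to-model>")
```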

Answer questions about your data.

Run the following code to complete your RAG implementation:

import pprint
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Instantiate MongoDB Vector Search as a retriever
retriever = vector_store.as_retriever()
# Define prompt template
template = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""
custom_rag_prompt = PromptTemplate.from_template(template)
def format_docs(docs):
   return "\n\n".join(doc.page_content for doc in docs)
# Create chain
rag_chain = (
   {"context": retriever | format_docs, "question": RunnablePassthrough()}
   | custom_rag_prompt
   | llm
   | StrOutputParser()
)
# Prompt the chain
question = "What was MongoDB's latest acquisition?"
answer = rag_chain.invoke(question)
# Return source documents
documents = retriever.invoke(question)
print("\nSource documents:")
pprint.pprint(documents)

Answer: MongoDB's latest acquisition was Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation
Source documents:
[Document(id='680a98187685ddb66d29ed88', metadata={'_id': '680a98187685ddb66d29ed88', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 1, 'page_label': '2'}, page_content='Measures."\nFourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB  acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation'),
 Document(id='680a98187685ddb66d29ed8c', metadata={'_id': '680a98187685ddb66d29ed8c', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 1, 'page_label': '2'}, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
 Document(id='680a98187685ddb66d29ee3f', metadata={'_id': '680a98187685ddb66d29ee3f', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 8, 'page_label': '9'}, page_content='View original content to download multimedia:https://www.prnewswire.com/news-releases/mongodb-inc-announces-fourth-quarter-and-full-\nyear-fiscal-2025-financial-results-302393702.html'),
 Document(id='680a98187685ddb66d29edde', metadata={'_id': '680a98187685ddb66d29edde', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 3, 'page_label': '4'}, page_content='distributed database on the market. With integrated capabilities for operational data, search, real-time analytics, and AI-powered retrieval, MongoDB')]
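
The chain in this step passes the question through four stages: retrieve documents, join their text into a context string, fill the prompt template, and call the LLM. The sketch below traces that data flow with plain functions and no LangChain dependency. Only format_docs and the template text mirror the tutorial code. Doc, fake_retrieve, fake_llm, and run_chain are hypothetical stand-ins that return fixed values.

```python
from dataclasses import dataclass

TEMPLATE = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document."""
    page_content: str


def fake_retrieve(question: str) -> list[Doc]:
    # Hypothetical stand-in for the retriever: returns fixed documents.
    return [Doc("First retrieved chunk."), Doc("Second retrieved chunk.")]


def format_docs(docs: list[Doc]) -> str:
    # Same joining logic as in the tutorial code.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the LLM: only reports the prompt size.
    return f"(model output for a prompt of {len(prompt)} characters)"


def run_chain(question: str) -> str:
    # Retrieve, format the context, fill the template, then call the model.
    context = format_docs(fake_retrieve(question))
    prompt = TEMPLATE.format(context=context, question=question)
    return fake_llm(prompt)


print(run_chain("What was MongoDB's latest acquisition?"))
```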
