
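Build a Local RAG Implementation with MongoDB and LangChain

In addition to deploying MongoDB Atlas in the cloud, you can use the Atlas CLI to deploy a self-contained MongoDB instance on your local machine. The LangChain MongoDB integration supports both Atlas clusters and local deployments. When you specify the connection string parameter, you can provide your local deployment's connection string instead of a cluster connection string.

This tutorial demonstrates how to implement retrieval-augmented generation (RAG) by using a local Atlas deployment, local models, and the LangChain MongoDB integration. Specifically, you perform the following actions: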

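Create a local Atlas deployment.
Use a local embedding model to generate vector embeddings.
Use the local Atlas deployment as a vector store.
Use a local LLM to answer questions about your data.

Work with a runnable version of this tutorial as a Python notebook.

To learn how to implement RAG locally without using LangChain, see Build a Local RAG Implementation with MongoDB Vector Search.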

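Prerequisites

To complete this tutorial, you must have the following:

The Atlas CLI installed and running v1.14.3 or later.
An interactive Python notebook that you can run locally. You can run interactive Python notebooks in VS Code. Ensure that your environment runs Python v3.10 or later.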
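
As a quick sanity check for the Python requirement above, the following minimal sketch compares the running interpreter against version 3.10. The helper name `meets_minimum` is hypothetical and not part of the tutorial.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the prerequisites


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old for this tutorial"
    print("Python", sys.version.split()[0], status)
```

Running this in the first notebook cell surfaces a version mismatch before any dependency is installed.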

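Create a Local Atlas Deployment

To create the local deployment, run atlas deployments setup in your terminal and follow the prompts to create the deployment.

For detailed instructions, see Create a Local Atlas Deployment.

About local deployments

You create a local Atlas deployment by using the Atlas CLI. The Atlas CLI is the command-line interface for MongoDB Atlas. You can use it to interact with Atlas from the terminal for various tasks, including creating local Atlas deployments. These are fully local deployments that don't require a connection to the cloud.

Local Atlas deployments are intended for testing only. For production environments, deploy a cluster.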

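Set Up the Environment

In this section, you set up the environment for this tutorial.

Create a directory to store your project.

In your terminal, run the following commands to create a new directory named local-rag-langchain-mongodb and change into it.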

mkdir local-rag-langchain-mongodb
cd local-rag-langchain-mongodb

Create an interactive Python notebook.

The following command creates a notebook named langchain-local-rag.ipynb in the directory.

touch langchain-local-rag.ipynb

Install the dependencies.

Run the following command in your notebook:

pip install --quiet --upgrade pymongo langchain langchain-mongodb langchain-community langchain-huggingface gpt4all pypdf

Define your connection string.

Run the following code in your notebook, replacing <port-number> with the port of your local deployment.

MONGODB_URI = ("mongodb://localhost:<port-number>/?directConnection=true")
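
To make the shape of the local connection string explicit, here is a small sketch that builds it from a port number and parses it back with the standard library. The helper `local_mongodb_uri` and the port 27017 are illustrative assumptions. Use the port that your own local deployment reports.

```python
from urllib.parse import parse_qs, urlsplit


def local_mongodb_uri(port, host="localhost"):
    """Build a direct-connection URI for a local deployment (hypothetical helper)."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"mongodb://{host}:{port}/?directConnection=true"


# 27017 is only an example value; substitute your deployment's port.
example_uri = local_mongodb_uri(27017)

# Parse the URI back to confirm each component is where we expect it.
parts = urlsplit(example_uri)
options = parse_qs(parts.query)
print(parts.hostname, parts.port, options)
```

The directConnection=true option tells the driver to connect to the specified host directly, which suits a single local instance.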

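Use Your Local Deployment as a Vector Store

You can use your local Atlas deployment as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.

Instantiate the vector store.

The following code uses the LangChain integration for MongoDB Vector Search to instantiate your local Atlas deployment as a vector database, also called a vector store, using the langchain_db.local_rag namespace.

This example specifies the mixedbread-ai/mxbai-embed-large-v1 model from Hugging Face.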

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_huggingface import HuggingFaceEmbeddings
# Load the embedding model (https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1)
embedding_model = HuggingFaceEmbeddings(model_name="mixedbread-ai/mxbai-embed-large-v1")
# Instantiate vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string = MONGODB_URI,
   namespace = "langchain_db.local_rag",
   embedding=embedding_model,
   index_name="vector_index"
)

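Add documents to the vector store.

Paste and run the following code in your notebook to ingest a sample PDF that contains a recent MongoDB earnings report into the vector store.

This code uses a text splitter to chunk the PDF data into smaller parent documents. It specifies the chunk size (number of characters) and chunk overlap (number of overlapping characters between consecutive chunks) for each document.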

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/13176/pdf")
data = loader.load()
# Split PDF into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
docs = text_splitter.split_documents(data)
# Add data to the vector store
vector_store.add_documents(docs)

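The ingestion code might take several minutes to run. After it finishes, if you're using Atlas, you can verify your vector embeddings by navigating to the langchain_db.local_rag namespace in the Atlas UI.

You can also view your vector embeddings by connecting to your local deployment from mongosh or from your application by using the deployment's connection string. You can then run read operations on the langchain_db.local_rag collection.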
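
To illustrate what the chunk size and chunk overlap settings in this step control, the following sketch splits a string into fixed-size windows that share characters with their neighbors. This is a deliberately simplified stand-in, not the algorithm that RecursiveCharacterTextSplitter uses: the LangChain splitter also tries to break on separators such as paragraphs and sentences and treats the chunk size as an upper bound. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=200, chunk_overlap=20):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks


# Synthetic 500-character input, used only to show the window arithmetic.
sample = "".join(chr(ord("a") + i % 26) for i in range(500))
chunks = chunk_text(sample, chunk_size=200, chunk_overlap=20)
print([len(c) for c in chunks])
```

With a chunk size of 200 and an overlap of 20, each window starts 180 characters after the previous one, so the last 20 characters of one chunk reappear at the start of the next. The overlap helps keep context that would otherwise be cut at a chunk boundary.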

Create the MongoDB Vector Search index.

To enable vector search queries on your vector store, create a MongoDB Vector Search index on the langchain_db.local_rag collection. You can create the index by using the LangChain helper method.

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 1024 # The dimensions of the vector embeddings to be indexed
)

Tip

create_vector_search_index API reference
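
For orientation, a vector search index is described by a definition document that lists the indexed vector field, its number of dimensions, and a similarity function. The sketch below builds such a definition as a plain Python dictionary. The field path `embedding` and the `cosine` similarity are assumptions about the helper's defaults, and the function name `vector_index_definition` is hypothetical. Check the API reference for the values your installed version actually uses.

```python
import json

ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(dimensions, path="embedding", similarity="cosine"):
    """Return a vector search index definition (path and similarity are assumed defaults)."""
    if dimensions <= 0:
        raise ValueError("dimensions must be positive")
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,                 # document field that stores the embedding
                "numDimensions": dimensions,  # must match the embedding model's output size
                "similarity": similarity,
            }
        ]
    }


# 1024 matches the dimensions passed to the helper method in this tutorial.
definition = vector_index_definition(1024)
print(json.dumps(definition, indent=2))
```

The number of dimensions must equal the length of the vectors that the embedding model produces, which is why the tutorial passes 1024 for this model.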

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.
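
Because queries only return results after the index finishes building, it can be convenient to poll for readiness instead of waiting a fixed time. The following is a generic polling sketch using only the standard library. The helper `wait_until` is hypothetical, and how you check readiness is left to you. With PyMongo, one option might be to inspect the documents returned by `collection.list_search_indexes()`, but treat the exact fields as something to verify against your driver version.

```python
import time


def wait_until(check, timeout=120.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout elapses.

    Returns True if check() succeeded and False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated readiness check: reports "not ready" twice, then "ready".
attempts = iter([False, False, True])
ready = wait_until(lambda: next(attempts), timeout=5, interval=0, sleep=lambda seconds: None)
print("index ready:", ready)
```

Passing the sleep and clock functions as parameters keeps the helper easy to test without real delays.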

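Use a Local LLM to Answer Questions

This section demonstrates a sample RAG implementation that you can run locally by using MongoDB Vector Search and GPT4All.

To learn about other ways to run LLMs locally with LangChain, see Run models locally.

Load the local LLM.

Download the Mistral 7B model from GPT4All. To explore other models, refer to the GPT4All website.
Move this model into your local-rag-langchain-mongodb project directory.

Paste the following code in your notebook to configure the LLM. Before running, replace <path-to-model> with the path where you saved the LLM locally.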

from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import GPT4All
# Configure the LLM
local_path = "<path-to-model>"
# Callbacks support token-wise streaming
callbacks = [StreamingStdOutCallbackHandler()]
# Verbose is required to pass to the callback manager
llm = GPT4All(model=local_path, callbacks=callbacks, verbose=True)
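
A wrong model path is an easy mistake at this step, so it can help to confirm that the file exists before handing the path to the LLM wrapper. The sketch below uses only pathlib. The helper `resolve_model_path` is hypothetical and not part of GPT4All or LangChain.

```python
from pathlib import Path


def resolve_model_path(path_str):
    """Return the absolute path to a model file, or raise if it does not exist."""
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No model file found at {path}")
    return str(path.resolve())


# Hypothetical usage in the notebook, before constructing the LLM:
# local_path = resolve_model_path("<path-to-model>")
```

Failing early with a clear message is usually easier to debug than an error raised while the model is being loaded.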

Answer questions about your data.

Run the following code to complete your RAG implementation.

import pprint
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Instantiate MongoDB Vector Search as a retriever
retriever = vector_store.as_retriever()
# Define prompt template
template = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""
custom_rag_prompt = PromptTemplate.from_template(template)
def format_docs(docs):
   return "\n\n".join(doc.page_content for doc in docs)
# Create chain
rag_chain = (
   {"context": retriever | format_docs, "question": RunnablePassthrough()}
   | custom_rag_prompt
   | llm
   | StrOutputParser()
)
# Prompt the chain
question = "What was MongoDB's latest acquisition?"
answer = rag_chain.invoke(question)
# Return source documents
documents = retriever.invoke(question)
print("\nSource documents:")
pprint.pprint(documents)

Answer: MongoDB's latest acquisition was Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation
Source documents:
[Document(id='680a98187685ddb66d29ed88', metadata={'_id': '680a98187685ddb66d29ed88', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 1, 'page_label': '2'}, page_content='Measures."\nFourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB  acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation'),
 Document(id='680a98187685ddb66d29ed8c', metadata={'_id': '680a98187685ddb66d29ed8c', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 1, 'page_label': '2'}, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
 Document(id='680a98187685ddb66d29ee3f', metadata={'_id': '680a98187685ddb66d29ee3f', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 8, 'page_label': '9'}, page_content='View original content to download multimedia:https://www.prnewswire.com/news-releases/mongodb-inc-announces-fourth-quarter-and-full-\nyear-fiscal-2025-financial-results-302393702.html'),
 Document(id='680a98187685ddb66d29edde', metadata={'_id': '680a98187685ddb66d29edde', 'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 3, 'page_label': '4'}, page_content='distributed database on the market. With integrated capabilities for operational data, search, real-time analytics, and AI-powered retrieval, MongoDB')]
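
To see what the chain sends to the LLM, the following sketch reproduces only the prompt-assembly step with the standard library: it joins the retrieved chunks and substitutes them into the same template text. The `FakeDoc` class and the two placeholder passages are stand-ins for the documents that the retriever returns. No retrieval or model call happens here.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content is needed here."""
    page_content: str


TEMPLATE = """
Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
"""


def format_docs(docs):
    # Same joining logic as the tutorial: chunks separated by a blank line.
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs, question):
    return TEMPLATE.format(context=format_docs(docs), question=question)


# Placeholder passages standing in for retrieved chunks.
docs = [
    FakeDoc("MongoDB acquired Voyage AI."),
    FakeDoc("MongoDB is announcing a stock buyback program."),
]
prompt = build_prompt(docs, "What was MongoDB's latest acquisition?")
print(prompt)
```

Printing the assembled prompt is a useful debugging step when an answer looks wrong, because it shows whether the relevant passage was retrieved at all.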
