
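Perform Hybrid Search with MongoDB and LangChain

You can integrate MongoDB with LangChain to perform hybrid search. In this tutorial, you complete the following steps:

Set up the environment.
Use MongoDB as a vector store.
Create a MongoDB Vector Search index and a MongoDB Search index on your data.
Run hybrid search queries.
Pass the query results to your RAG pipeline.

A runnable version of this tutorial is available as a Python notebook.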

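Prerequisites

To complete this tutorial, you must have the following:

One of the following MongoDB cluster types:
- An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later. Ensure that your IP address is included in your Atlas project's access list.
- A local Atlas deployment created using the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
- A MongoDB Community or Enterprise cluster with Search and Vector Search installed.
A Voyage AI API key. To create an API key, see Model API Keys.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks, such as Colab.

Note

Check the requirements of the langchain-voyageai package to ensure that you're using a compatible Python version.

Set Up the Environment

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. The notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.

To set up your notebook environment:

Install and import dependencies.

Run the following command in your notebook: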

pip install --quiet --upgrade langchain langchain-community langchain-core langchain-mongodb langchain-voyageai langchain-openai pymongo pypdf
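After the install finishes, it can help to confirm that the packages are importable before you continue. The following is a minimal sketch using only the standard library. The helper name missing_modules and the list of import names are assumptions made for illustration; they are inferred from the package names in the command above.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above.
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_core",
    "langchain_mongodb",
    "langchain_voyageai",
    "langchain_openai",
    "pymongo",
    "pypdf",
]


def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerun the install command and restart the notebook kernel before you import the packages.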

Set environment variables.

Run the following code to set the environment variables for this tutorial. Provide your API keys and the connection string for your MongoDB cluster.

import os
os.environ["VOYAGE_API_KEY"] = "<voyage-api-key>"
os.environ["OPENAI_API_KEY"] = "<openai-api-key>"
MONGODB_URI = "<connection-string>"

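Note

Replace <connection-string> with the connection string for your Atlas cluster or local Atlas deployment.

For an Atlas cluster, your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

To learn more, see Connect to a Cluster via Client Libraries.

For a local Atlas deployment, your connection string should use the following format:

mongodb://localhost:<port-number>/?directConnection=true

To learn more, see Connection Strings.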
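A quick check of the connection string can catch an unreplaced placeholder or a wrong scheme before you connect. The following is a minimal sketch using only the standard library. The function name connection_kind and its return labels are hypothetical, and the example URIs use made-up values.

```python
from urllib.parse import urlsplit


def connection_kind(uri):
    """Classify a MongoDB connection string by its scheme.

    Returns "atlas-srv" for mongodb+srv:// URIs and "standard" for
    mongodb:// URIs. Raises ValueError for anything else, including a
    string that still contains <...> placeholders.
    """
    if "<" in uri or ">" in uri:
        raise ValueError("Replace the <...> placeholders in the connection string.")
    scheme = urlsplit(uri).scheme
    if scheme == "mongodb+srv":
        return "atlas-srv"
    if scheme == "mongodb":
        return "standard"
    raise ValueError(f"Unexpected scheme: {scheme!r}")


# Made-up example values, not real credentials or hosts.
print(connection_kind("mongodb+srv://user:secret@cluster0.example.mongodb.net"))
print(connection_kind("mongodb://localhost:27017/?directConnection=true"))
```

In the notebook, you could call connection_kind(MONGODB_URI) right after you set the variable.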

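Use MongoDB as a Vector Store

You must use MongoDB as a vector store for your data. You can instantiate a vector store by using an existing collection in MongoDB.

Load the sample data.

If you haven't already, complete the steps to load sample data into your cluster.

Note

If you want to use your own data, see LangChain Get Started or How to Create Vector Embeddings Manually to learn how to ingest vector embeddings into Atlas.

Instantiate the vector store.

Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:

Your MongoDB cluster's connection string.
The voyage-3-large embedding model from Voyage AI, which converts text into vector embeddings.
sample_mflix.embedded_movies as the namespace to use.
plot as the field that contains the text.
plot_embedding_voyage_3_large as the field that contains the embeddings.
dotProduct as the relevance score function.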

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_voyageai import VoyageAIEmbeddings
# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string = MONGODB_URI,
   embedding = VoyageAIEmbeddings(model = "voyage-3-large", output_dimension = 2048),
   namespace = "sample_mflix.embedded_movies",
   text_key = "plot",
   embedding_key = "plot_embedding_voyage_3_large",
   relevance_score_fn = "dotProduct"
)
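To illustrate what the dotProduct relevance function measures, the following sketch compares toy two-dimensional vectors instead of 2048-dimensional embeddings. It assumes that the vectors are unit length, in which case the dot product equals cosine similarity. The helper names are hypothetical, and the sketch does not show how MongoDB computes or normalizes its scores.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for a query embedding and two document embeddings.
query_vec = normalize([3.0, 4.0])
doc_close = normalize([4.0, 3.0])   # points in a similar direction
doc_far = normalize([-4.0, 3.0])    # perpendicular to the query

print("close:", dot(query_vec, doc_close))
print("far:", dot(query_vec, doc_far))
```

A larger dot product means that the two unit vectors point in more similar directions, which is why it can serve as a relevance signal.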

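Tip

MongoDBAtlasVectorSearch API Reference

Create the Indexes

To enable hybrid search queries on your vector store, create a MongoDB Vector Search index and a MongoDB Search index on the collection. You can create the indexes by using either the LangChain helper methods or the PyMongo driver methods.

Create the MongoDB Vector Search index.

Run the following code to use the LangChain helper method to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.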

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 2048 # The dimensions of the vector embeddings to be indexed
)

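Tip

create_vector_search_index API Reference

Create the MongoDB Search index.

Run the following code in your notebook to use the LangChain helper method to create a search index that indexes the plot field in the collection.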

from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient
# Connect to your cluster
client = MongoClient(MONGODB_URI)
# Use helper method to create the search index
create_fulltext_search_index(
   collection = client["sample_mflix"]["embedded_movies"],
   field = "plot",
   index_name = "search_index"
)

Tip

create_fulltext_search_index API Reference

Create the MongoDB Vector Search index.

Run the following code to use the PyMongo driver to create a vector search index that indexes the plot_embedding_voyage_3_large field in the collection.

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel
# Connect to your cluster
client = MongoClient(MONGODB_URI)
collection = client["sample_mflix"]["embedded_movies"]
# Create your vector search index model, then create the index
vector_index_model = SearchIndexModel(
   definition={
      "fields": [
         {
            "type": "vector",
            "path": "plot_embedding_voyage_3_large",
            "numDimensions": 2048,
            "similarity": "dotProduct"
         }
      ]
   },
   name="vector_index",
   type="vectorSearch"
)
collection.create_search_index(model=vector_index_model)

Create the MongoDB Search index.

Run the following code to use the PyMongo driver to create a search index that indexes the plot field in the collection.

# Create your search index model, then create the search index
search_index_model = SearchIndexModel(
   definition={
      "mappings": {
         "dynamic": False,
         "fields": {
            "plot": {
               "type": "string"
            }
         }
      }
   },
   name="search_index"
)
collection.create_search_index(model=search_index_model)

The indexes take about one minute to build. While they build, the indexes are in an initial sync state. When they finish building, you can query the data in your collection.
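Rather than waiting a fixed amount of time, you can poll until the indexes are ready. The following is a minimal sketch under two assumptions: the function name wait_until_queryable is hypothetical, and each index description is assumed to be a dictionary with name and queryable keys. The function takes a callable, so the polling logic can be exercised without a cluster.

```python
import time


def wait_until_queryable(list_indexes, index_names, timeout=120.0, poll_interval=5.0):
    """Poll until every named search index reports queryable=True.

    list_indexes is a zero-argument callable that returns an iterable of
    index description dicts. Returns True when all named indexes are
    queryable, or False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    wanted = set(index_names)
    while True:
        ready = {ix["name"] for ix in list_indexes() if ix.get("queryable") is True}
        if wanted <= ready:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With the PyMongo collection object from this tutorial, a call might look like wait_until_queryable(lambda: collection.list_search_indexes(), ["vector_index", "search_index"]). Adjust the index names to match the ones that you created.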

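Run a Hybrid Search Query

After MongoDB builds your indexes, you can run hybrid search queries on your data. The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string "time travel". It also specifies the following parameters:

vectorstore: The name of the vector store instance.
search_index_name: The name of the MongoDB Search index.
top_k: The number of documents to return.
fulltext_penalty: The penalty for full-text search. The lower the penalty, the higher the full-text search score.
vector_penalty: The penalty for vector search. The lower the penalty, the higher the vector search score.

The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, the plot, and the different scores for each document.

To learn more about hybrid search query results, see About the Query.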

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever
# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = vector_store,
    search_index_name = "search_index",
    top_k = 5,
    fulltext_penalty = 50,
    vector_penalty = 50,
    post_filter=[
        {
            "$project": {
                "plot_embedding": 0,
                "plot_embedding_voyage_3_large": 0
            }
        }
    ])
# Define your query
query = "time travel"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print("Title: " + doc.metadata["title"])
   print("Plot: " + doc.page_content)
   print("Search score: {}".format(doc.metadata["fulltext_score"]))
   print("Vector Search score: {}".format(doc.metadata["vector_score"]))
   print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))

Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0.018518518518518517
Total score: 0.03774928774928775
Title: A.P.E.X.
Plot: A time-travel experiment in which a robot probe is sent from the year 2073 to the year 1973 goes terribly wrong thrusting one of the project scientists, a man named Nicholas Sinclair into a...
Search score: 0.018518518518518517
Vector Search score: 0.018867924528301886
Total score: 0.0373864430468204
Title: About Time
Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
Search score: 0
Vector Search score: 0.0196078431372549
Total score: 0.0196078431372549
Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549
Title: Retroactive
Plot: A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.
Search score: 0
Vector Search score: 0.019230769230769232
Total score: 0.019230769230769232
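The scores in the sample output are consistent with a reciprocal rank formula of 1 / (rank + penalty + 1), where rank is a zero-based position in each result list. For example, 0.0196078431372549 equals 1/51. This is an inference from the numbers above, not a statement of the library's documented behavior. The following sketch reproduces the totals under that assumption. The function names, the rankings, and the "(other document)" entry are hypothetical and are reconstructed from the sample scores.

```python
def rrf_score(rank, penalty):
    """Score for a zero-based rank: a lower penalty gives a higher score."""
    return 1.0 / (rank + penalty + 1)


def fuse(fulltext_ranking, vector_ranking, fulltext_penalty=50, vector_penalty=50, top_k=5):
    """Sum per-list reciprocal rank scores and return the top_k (id, score) pairs."""
    totals = {}
    for rank, doc_id in enumerate(fulltext_ranking):
        totals[doc_id] = totals.get(doc_id, 0.0) + rrf_score(rank, fulltext_penalty)
    for rank, doc_id in enumerate(vector_ranking):
        totals[doc_id] = totals.get(doc_id, 0.0) + rrf_score(rank, vector_penalty)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Rankings inferred from the sample scores. "(other document)" is a
# hypothetical placeholder for a result that the sample output does not show.
fulltext_ranking = ["The Time Traveler's Wife", "Timecop", "(other document)", "A.P.E.X."]
vector_ranking = ["About Time", "Retroactive", "A.P.E.X.", "Timecop"]

fused = fuse(fulltext_ranking, vector_ranking)
for title, score in fused:
    print(f"{title}: {score}")
```

Under this formula, a document that appears in both lists, such as Timecop, outranks a document that is first in only one list. Raising one penalty relative to the other reduces the weight of that search type in the total.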

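Tip

MongoDBAtlasHybridSearchRetriever API Reference

Pass Results to a RAG Pipeline

You can pass your hybrid search results to your RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:

Defines a LangChain prompt template to instruct the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.
Constructs a chain that specifies the following:
- The hybrid search retriever that you defined to retrieve relevant documents.
- The prompt template that you defined.
- An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.
Prompts the chain with a sample query and returns the response. The generated response might vary.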

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()
# Construct a chain to answer questions on your data
chain = (
   {"context": retriever, "query": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)

Certainly! Here are some movies about time travel from the context provided:
1. **Timecop (1994)**
   Genre: Action, Crime, Sci-Fi
   Plot: A law enforcement officer working for the Time Enforcement Commission battles a shady politician with a personal tie to his past.
   IMDb Rating: 5.8
2. **A.P.E.X. (1994)**
   Genre: Action, Sci-Fi
   Plot: A time-travel experiment gone wrong thrusts a scientist into an alternate timeline plagued by killer robots.
   IMDb Rating: 4.3
3. **About Time (2013)**
   Genre: Drama, Fantasy, Romance
   Plot: A young man discovers he can time travel and uses this ability to improve his life, especially his love life, but learns the limitations and challenges of his gift.
   IMDb Rating: 7.8
4. **The Time Traveler's Wife (2009)**
   Genre: Drama, Fantasy, Romance
   Plot: A Chicago librarian with a gene causing him to involuntarily time travel struggles with its impact on his romantic relationship and marriage.
   IMDb Rating: 7.1
5. **Retroactive (1997)**
   Genre: Action, Crime, Drama
   Plot: A woman accidentally time-travels to prevent a violent event, but her attempts to fix the situation lead to worsening consequences due to repeated time cycles.
   IMDb Rating: 6.3
Each movie covers time travel with unique perspectives, from action-packed adventures to romantic dramas.
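The chain above passes the retrieved documents to the prompt as they are. If you want more control over the text that the LLM sees, one option is to format the documents into a plain string first. The following is a minimal sketch. The helper name format_docs is hypothetical, and the SimpleNamespace objects are stand-ins that only mimic the page_content and metadata attributes read earlier in this tutorial.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join retrieved documents into one plain-text context string."""
    sections = []
    for doc in docs:
        title = doc.metadata.get("title", "Untitled")
        sections.append(f"Title: {title}\nPlot: {doc.page_content}")
    return "\n\n".join(sections)


# Stand-in objects that mimic the attributes of retrieved documents.
sample_docs = [
    SimpleNamespace(
        page_content="A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.",
        metadata={"title": "Retroactive"},
    ),
    SimpleNamespace(
        page_content="A document without a title in its metadata.",
        metadata={},
    ),
]

print(format_docs(sample_docs))
```

In the chain, you could then use {"context": retriever | format_docs, "query": RunnablePassthrough()} so that the prompt receives the formatted string instead of the raw document objects.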
