
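Perform Hybrid Search with MongoDB and LangChain

You can integrate MongoDB with LangChain to perform hybrid search. In this tutorial, you complete the following steps:

Set up the environment.
Use MongoDB as a vector store.
Create a MongoDB Vector Search index and a MongoDB Search index on your data.
Run hybrid search queries.
Pass the query results to your RAG pipeline.

Work with a runnable version of this tutorial as a Python notebook.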

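Prerequisites

This tutorial uses a collection that contains movie data from the Atlas sample datasets. To complete it, you need the following:

One of the following MongoDB cluster types:
- An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later. Ensure that your IP address is included in your Atlas project's access list.
- A local Atlas deployment created using the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
- A MongoDB Community or Enterprise cluster with Search and Vector Search installed.
A Voyage AI API key. To create an account and API key, see the Voyage AI website.
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks, such as Colab.

Note

To ensure that you're using a compatible Python version, check the requirements of the langchain-voyageai package.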

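Set Up the Environment

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.

To set up your notebook environment, follow these steps:

Install and import dependencies.

Run the following command in your notebook: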

pip install --quiet --upgrade langchain langchain-community langchain-core langchain-mongodb langchain-voyageai langchain-openai pymongo pypdf
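
After the install finishes, it can help to confirm which package versions ended up in the notebook environment. The following is a minimal sketch that uses only the Python standard library. The helper name installed_versions is an assumption made for illustration and is not part of any of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command above.
TUTORIAL_PACKAGES = [
    "langchain",
    "langchain-community",
    "langchain-core",
    "langchain-mongodb",
    "langchain-voyageai",
    "langchain-openai",
    "pymongo",
    "pypdf",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(TUTORIAL_PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

A value of "not installed" for any entry suggests rerunning the install command before continuing.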

Set environment variables.

Run the following code to set the environment variables for this tutorial. Provide your API keys and your MongoDB cluster's connection string.

import os
os.environ["VOYAGE_API_KEY"] = "<voyage-api-key>"
os.environ["OPENAI_API_KEY"] = "<openai-api-key>"
MONGODB_URI = "<connection-string>"

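Note

Replace <connection-string> with the connection string for your Atlas cluster or local Atlas deployment.

For an Atlas cluster, your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

To learn more, see Connect to a Cluster via Drivers.

For a local Atlas deployment, your connection string should use the following format:

mongodb://localhost:<port-number>/?directConnection=true

To learn more, see Connection Strings.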
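
Before connecting, a quick check of the value you pasted into MONGODB_URI can catch a leftover placeholder or a wrong scheme. The sketch below is an illustration only: the function names classify_connection_string and has_unfilled_placeholder are hypothetical, and the check looks only at the URI scheme and angle brackets, not at whether the cluster is reachable.

```python
from urllib.parse import urlsplit


def classify_connection_string(uri: str) -> str:
    """Return 'atlas-srv', 'standard', or 'unknown' based on the URI scheme."""
    scheme = urlsplit(uri).scheme
    if scheme == "mongodb+srv":
        return "atlas-srv"   # the Atlas cluster format shown above
    if scheme == "mongodb":
        return "standard"    # the local deployment format shown above
    return "unknown"


def has_unfilled_placeholder(uri: str) -> bool:
    """Detect template placeholders such as <db_password> left in the string."""
    return "<" in uri and ">" in uri


# Example values (hypothetical host and credentials).
print(classify_connection_string("mongodb://localhost:27017/?directConnection=true"))
print(has_unfilled_placeholder("mongodb+srv://<db_username>:<db_password>@cluster0.example.mongodb.net"))
```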

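Use MongoDB as a Vector Store

You must use MongoDB as a vector store for your data. You can instantiate a vector store using an existing collection in MongoDB.

Load the sample data.

If you haven't already, complete the steps to load the sample data into your cluster.

Note

If you want to use your own data, see Get Started with LangChain or How to Create Vector Embeddings to learn how to ingest vector embeddings into Atlas.

Instantiate the vector store.

Paste and run the following code in your notebook to create a vector store instance named vector_store from the sample_mflix.embedded_movies namespace in Atlas. This code uses the from_connection_string method to create the MongoDBAtlasVectorSearch vector store and specifies the following parameters:

Your MongoDB cluster's connection string.
The voyage-3-large embedding model from Voyage AI to convert text into vector embeddings.
sample_mflix.embedded_movies as the namespace to use.
plot as the field that contains the text.
plot_embedding_voyage_3_large as the field that contains the embeddings.
dotProduct as the relevance score function.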

from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_voyageai import VoyageAIEmbeddings
# Create the vector store
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string = MONGODB_URI,
   embedding = VoyageAIEmbeddings(model = "voyage-3-large", output_dimension = 2048),
   namespace = "sample_mflix.embedded_movies",
   text_key = "plot",
   embedding_key = "plot_embedding_voyage_3_large",
   relevance_score_fn = "dotProduct"
)

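Tip

See the MongoDBAtlasVectorSearch API reference.

Create the Indexes

To enable hybrid search queries on your vector store, create a MongoDB Vector Search index and a MongoDB Search index on the collection. You can create the indexes using either the LangChain helper methods or the PyMongo driver methods. The steps below show the LangChain helper methods first, followed by the PyMongo equivalents. Use one approach, not both.

Create the MongoDB Vector Search index (LangChain helper method).

Run the following code to create a vector search index that indexes the plot_embedding_voyage_3_large field in your collection.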

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 2048 # The dimensions of the vector embeddings to be indexed
)

Tip

See the create_vector_search_index API reference.

Create the MongoDB Search index (LangChain helper method).

Run the following code to create a search index that indexes the plot field in your collection.

from langchain_mongodb.index import create_fulltext_search_index
from pymongo import MongoClient
# Connect to your cluster
client = MongoClient(MONGODB_URI)
# Use helper method to create the search index
create_fulltext_search_index(
   collection = client["sample_mflix"]["embedded_movies"],
   field = "plot",
   index_name = "search_index"
)

Tip

See the create_fulltext_search_index API reference.

Create the MongoDB Vector Search index (PyMongo driver).

Run the following code to create a vector search index that indexes the plot_embedding_voyage_3_large field in your collection.

from pymongo import MongoClient
from pymongo.operations import SearchIndexModel
# Connect to your cluster
client = MongoClient(MONGODB_URI)
collection = client["sample_mflix"]["embedded_movies"]
# Create your vector search index model, then create the index
vector_index_model = SearchIndexModel(
   definition={
      "fields": [
         {
            "type": "vector",
            "path": "plot_embedding_voyage_3_large",
            "numDimensions": 2048,
            "similarity": "dotProduct"
         }
      ]
   },
   name="vector_index",
   type="vectorSearch"
)
collection.create_search_index(model=vector_index_model)

Create the MongoDB Search index (PyMongo driver).

Run the following code to create a search index that indexes the plot field in your collection.

# Create your search index model, then create the search index
search_index_model = SearchIndexModel(
   definition={
      "mappings": {
         "dynamic": False,
         "fields": {
            "plot": {
               "type": "string"
            }
         }
      }
   },
   name="search_index"
)
collection.create_search_index(model=search_index_model)

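The indexes should take about one minute to build. While they build, the indexes are in an initial sync state. When they finish building, you can start querying the data in your collection.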
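
Rather than waiting a fixed amount of time, you can poll until an index reports that it is ready. The following is a minimal sketch, not an official helper: wait_until_queryable is a hypothetical name, the timeout and polling interval are arbitrary defaults, and the code assumes that the documents returned by PyMongo's list_search_indexes include a boolean queryable field.

```python
import time


def wait_until_queryable(collection, index_name, timeout_s=120.0, poll_interval_s=5.0):
    """Poll the collection's search indexes until index_name is queryable.

    Returns True once the index reports queryable=True, or False if the
    timeout elapses first. `collection` is expected to be a PyMongo
    Collection (anything with a compatible list_search_indexes method works).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # list_search_indexes(name) yields only the index with that name.
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

For example, with the collection object from the PyMongo steps, calling wait_until_queryable(collection, "search_index") and wait_until_queryable(collection, "vector_index") before running queries would block until both indexes are ready or the timeout is reached.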

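Run a Hybrid Search Query

After MongoDB builds your indexes, you can run hybrid search queries on your data. The following code uses the MongoDBAtlasHybridSearchRetriever retriever to perform a hybrid search for the string "time travel". It also specifies the following parameters:

vectorstore: The name of the vector store instance.
search_index_name: The name of the MongoDB Search index.
top_k: The number of documents to return.
fulltext_penalty: The penalty for full-text search.
- The lower the penalty, the higher the full-text search score.
vector_penalty: The penalty for vector search.
- The lower the penalty, the higher the vector search score.

The retriever returns a list of documents sorted by the sum of the full-text search score and the vector search score. The final output of the code example includes the title, the plot, and the different scores for each document.

To learn more about hybrid search query results, see About the Query.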

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever
# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
    vectorstore = vector_store,
    search_index_name = "search_index",
    top_k = 5,
    fulltext_penalty = 50,
    vector_penalty = 50,
    post_filter=[
        {
            "$project": {
                "plot_embedding": 0,
                "plot_embedding_voyage_3_large": 0
            }
        }
    ])
# Define your query
query = "time travel"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print("Title: " + doc.metadata["title"])
   print("Plot: " + doc.page_content)
   print("Search score: {}".format(doc.metadata["fulltext_score"]))
   print("Vector Search score: {}".format(doc.metadata["vector_score"]))
   print("Total score: {}\n".format(doc.metadata["fulltext_score"] + doc.metadata["vector_score"]))

Title: Timecop
Plot: An officer for a security agency that regulates time travel, must fend for his life against a shady politician who has a tie to his past.
Search score: 0.019230769230769232
Vector Search score: 0.018518518518518517
Total score: 0.03774928774928775

Title: A.P.E.X.
Plot: A time-travel experiment in which a robot probe is sent from the year 2073 to the year 1973 goes terribly wrong thrusting one of the project scientists, a man named Nicholas Sinclair into a...
Search score: 0.018518518518518517
Vector Search score: 0.018867924528301886
Total score: 0.0373864430468204

Title: About Time
Plot: At the age of 21, Tim discovers he can travel in time and change what happens and has happened in his own life. His decision to make his world a better place by getting a girlfriend turns out not to be as easy as you might think.
Search score: 0
Vector Search score: 0.0196078431372549
Total score: 0.0196078431372549

Title: The Time Traveler's Wife
Plot: A romantic drama about a Chicago librarian with a gene that causes him to involuntarily time travel, and the complications it creates for his marriage.
Search score: 0.0196078431372549
Vector Search score: 0
Total score: 0.0196078431372549

Title: Retroactive
Plot: A psychiatrist makes multiple trips through time to save a woman that was murdered by her brutal husband.
Search score: 0
Vector Search score: 0.019230769230769232
Total score: 0.019230769230769232
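
The scores in this sample output are consistent with a reciprocal rank fusion style formula, in which each result list contributes 1 / (rank + penalty) for a document, with rank starting at 1, and the two contributions are summed. For example, 0.019230769230769232 equals 1/52, which matches rank 2 with a penalty of 50. The sketch below illustrates that arithmetic in plain Python. It is an illustration under that assumption, not the retriever's actual aggregation pipeline, and the function names and toy document IDs are hypothetical.

```python
def reciprocal_rank_score(rank: int, penalty: float) -> float:
    """Score for a document at a 1-based rank in a single result list."""
    return 1.0 / (rank + penalty)


def fuse_rankings(fulltext_ids, vector_ids, fulltext_penalty=50, vector_penalty=50, top_k=5):
    """Combine two ranked ID lists and return the top_k (id, scores) pairs.

    A document missing from one list contributes 0 for that list, which
    mirrors the zero scores seen in the sample output.
    """
    scores = {}
    for rank, doc_id in enumerate(fulltext_ids, start=1):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["fulltext_score"] = reciprocal_rank_score(rank, fulltext_penalty)
    for rank, doc_id in enumerate(vector_ids, start=1):
        entry = scores.setdefault(doc_id, {"fulltext_score": 0.0, "vector_score": 0.0})
        entry["vector_score"] = reciprocal_rank_score(rank, vector_penalty)
    ranked = sorted(
        scores.items(),
        key=lambda item: item[1]["fulltext_score"] + item[1]["vector_score"],
        reverse=True,
    )
    return ranked[:top_k]


# Toy rankings: "b" appears in both lists, so it ranks first overall.
for doc_id, parts in fuse_rankings(["a", "b"], ["b", "c"]):
    print(doc_id, parts["fulltext_score"] + parts["vector_score"])
```

Under this formula, lowering a penalty increases every score from that list, so that list has more influence on the final ordering.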

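Tip

See the MongoDBAtlasHybridSearchRetriever API reference.

Pass Results to a RAG Pipeline

You can pass your hybrid search results to your RAG pipeline to generate responses based on the retrieved documents. The sample code does the following:

Defines a LangChain prompt template to instruct the LLM to use the retrieved documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {query} variable.
Constructs a chain that specifies the following:
- The hybrid search retriever that you defined to retrieve relevant documents.
- The prompt template that you defined.
- An LLM from OpenAI to generate a context-aware response. By default, this is the gpt-3.5-turbo model.
Prompts the chain with a sample query and returns the response. The generated response might vary.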
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: Can you recommend some movies about {query}?
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI()
# Construct a chain to answer questions on your data
chain = (
   {"context": retriever, "query": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
query = "time travel"
answer = chain.invoke(query)
print(answer)

Certainly! Here are some movies about time travel from the context provided:
1. **Timecop (1994)**
   Genre: Action, Crime, Sci-Fi
   Plot: A law enforcement officer working for the Time Enforcement Commission battles a shady politician with a personal tie to his past.
   IMDb Rating: 5.8
2. **A.P.E.X. (1994)**
   Genre: Action, Sci-Fi
   Plot: A time-travel experiment gone wrong thrusts a scientist into an alternate timeline plagued by killer robots.
   IMDb Rating: 4.3
3. **About Time (2013)**
   Genre: Drama, Fantasy, Romance
   Plot: A young man discovers he can time travel and uses this ability to improve his life, especially his love life, but learns the limitations and challenges of his gift.
   IMDb Rating: 7.8
4. **The Time Traveler's Wife (2009)**
   Genre: Drama, Fantasy, Romance
   Plot: A Chicago librarian with a gene causing him to involuntarily time travel struggles with its impact on his romantic relationship and marriage.
   IMDb Rating: 7.1
5. **Retroactive (1997)**
   Genre: Action, Crime, Drama
   Plot: A woman accidentally time-travels to prevent a violent event, but her attempts to fix the situation lead to worsening consequences due to repeated time cycles.
   IMDb Rating: 6.3
Each movie covers time travel with unique perspectives, from action-packed adventures to romantic dramas.
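
The chain above passes the retrieved documents into {context} as they are. One optional variation is to format them into plain text first, so that the prompt contains only the fields you choose. The following is a minimal sketch, assuming each retrieved document exposes page_content and a metadata dictionary with a title key, as in the retriever output earlier. The helper name format_docs is hypothetical, and the sample objects are simple stand-ins rather than real retriever results.

```python
from types import SimpleNamespace


def format_docs(docs):
    """Join each document's title and plot text into a single context string."""
    lines = []
    for doc in docs:
        title = doc.metadata.get("title", "Untitled")
        lines.append(f"{title}: {doc.page_content}")
    return "\n\n".join(lines)


# Stand-in objects that mimic the attributes used above.
sample_docs = [
    SimpleNamespace(page_content="An officer regulates time travel.", metadata={"title": "Timecop"}),
    SimpleNamespace(page_content="Tim discovers he can travel in time.", metadata={"title": "About Time"}),
]
context = format_docs(sample_docs)
print(context)
```

In the chain, this helper could be composed with the retriever by replacing the context entry with retriever | format_docs, since LangChain's expression language accepts plain functions in a pipe.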
