Use MongoDB Search Instead of Regular Expression Queries

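If your queries rely on regular expression matching, you can improve query performance and efficiency by creating a MongoDB Search index and running the $search aggregation pipeline stage. $regex queries are inefficient because they can't always use an index. MongoDB Search indexes significantly improve query performance and give you more options for customizing query parameters.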

This page describes MongoDB Search index and query configurations for common $regex use cases.

Examples

The examples use the sample_mflix.movies namespace. To run the sample queries, add this collection to your cluster, or use the preconfigured snapshot in MongoDB Search Playground. The sample queries show how to use $search instead of $regex for the following use cases.

Prefix Queries

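Your application might frequently query for string values that begin with a set of characters, or prefix. In that case you might run a $regex query with two parts:

- the ^ anchor, to search from the beginning of the string value
- the i option, to make the query case-insensitive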

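Instead, we recommend a MongoDB Search query that uses the $search aggregation pipeline stage. The following query searches for movie titles that contain a word beginning with the prefix back. The $regex query matches back only at the start of the title. This query matches the prefix at the start of any word, so its results also include titles such as The Way Back.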

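The first block below shows the $regex queries and their output. The second block shows the equivalent $search query and its output.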

db.movies.find( { title: { $regex: /^back/i } }, { title: 1, _id: 0 } )  // Query 1
db.movies.find( { title: { $regex: "^back", $options: "i" } }, { title: 1, _id: 0 } )  // Query 2

[
  { title: 'Back to the Future' },
  { title: 'Back to School' },
  { title: 'Back to the Future Part II' },
  { title: 'Back to the USSR - takaisin Ryssiin' },
  { title: 'Back to the Future Part III' },
  { title: 'Backdraft' },
  { title: 'Backbeat' },
  { title: 'Backstage' },
  { title: 'Backdoor' },
  { title: 'Backstage' },
  { title: 'Back Soon' },
  { title: 'Backlight' },
  { title: 'Back to Stay' },
  { title: 'Back Issues: The Hustler Magazine Story' }
]

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "text": {
        "query": "back",
        "path": "title",
        "matchCriteria": "all"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'Backdraft', score: 3.8287878036499023 },
  { title: 'Backbeat', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'Backdoor', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Way Back', score: 3.8287878036499023 },
  { title: '3 Backyards', score: 3.8287878036499023 },
  { title: 'Backlight', score: 3.8287878036499023 },
  { title: 'The Way Way Back', score: 3.8287878036499023 },
  { title: 'Back to the Future', score: 3.455096483230591 },
  { title: 'Back to School', score: 3.455096483230591 },
  { title: 'The Cat Came Back', score: 3.455096483230591 },
  { title: "Jack's Back", score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'T-Rex: Back to the Cretaceous', score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'No Turning Back', score: 3.455096483230591 },
  { title: "The Devil's Backbone", score: 3.455096483230591 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index like the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string",
          "analyzer": "autocomplete-search",
          "searchAnalyzer": "lucene.standard"
        }
      ]
    }
  },
  "analyzers": [
    {
      "name": "autocomplete-search",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 10
        }
      ]
    }
  ]
}

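This index definition indexes the title field in the movies collection as the string type. At index time it uses a custom analyzer named autocomplete-search (the analyzer setting). At query time it uses the lucene.standard analyzer (the searchAnalyzer setting). The autocomplete-search custom analyzer applies the following:

standard tokenizer: splits text into words at whitespace and punctuation
lowercase token filter: converts all characters to lowercase to support case-insensitive queries
edgeGram token filter: creates tokens 4 to 10 characters long, taken from the beginning of each word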
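The following rough JavaScript simulation of the autocomplete-search analyzer shows why the $search results include titles such as The Way Back, which the ^back regex misses. It is a sketch under simplifying assumptions:

- The Lucene standard tokenizer is approximated by splitting on characters that are not letters, digits, or apostrophes.
- The helper names (tokenize, edgeGrams, analyzeForIndex, analyzeQuery, matchesAll) are hypothetical.
- Scoring is ignored.

It models only term matching. A title matches when every analyzed query term is one of the title's indexed grams, which is roughly what matchCriteria "all" requires.

```javascript
// Rough simulation of the "autocomplete-search" custom analyzer:
// standard tokenizer -> lowercase -> edgeGram(minGram 4, maxGram 10).
// Assumption: the Lucene standard tokenizer is approximated by splitting on
// runs of characters that are not letters, digits, or apostrophes.

function tokenize(text) {
  return text.split(/[^\p{L}\p{N}']+/u).filter((t) => t.length > 0);
}

// edgeGram: emit the prefixes of a token whose lengths fall in
// [minGram, maxGram]. Tokens shorter than minGram produce no grams.
function edgeGrams(token, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Index time ("analyzer": "autocomplete-search").
function analyzeForIndex(title) {
  const terms = new Set();
  for (const token of tokenize(title)) {
    for (const gram of edgeGrams(token.toLowerCase(), 4, 10)) {
      terms.add(gram);
    }
  }
  return terms;
}

// Query time ("searchAnalyzer": "lucene.standard"), approximated as
// tokenize + lowercase, with no edgeGram step.
function analyzeQuery(query) {
  return tokenize(query).map((t) => t.toLowerCase());
}

// matchCriteria "all": every analyzed query term must be an indexed term.
function matchesAll(title, query) {
  const indexed = analyzeForIndex(title);
  return analyzeQuery(query).every((term) => indexed.has(term));
}

for (const title of ["Backdraft", "The Way Back", "Flashback"]) {
  console.log(`${title}: ${matchesAll(title, "back")}`);
}
```

This prints true for Backdraft and The Way Back and false for Flashback. edgeGram only emits prefixes of each word, so text at the end or in the middle of a word needs a different analyzer, as in the sections that follow.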

Note

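This custom analyzer supports only words up to 10 characters long. If you expect words and queries longer than 10 characters, increase the value of maxGram. We don't recommend setting maxGram higher than 15, because larger values increase the size of the index and can affect performance and availability.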
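A little arithmetic makes the trade-off in this note concrete. An edgeGram filter emits min(maxGram, word length) - minGram + 1 grams per word. For a hypothetical 16-character word with minGram 4:

- maxGram 10 yields 7 grams.
- maxGram 15 yields 12 grams.

Neither setting indexes the full word. The sketch below (the edgeGrams helper and the example word are illustrative only) computes this:

```javascript
// Sketch of how maxGram limits matching and grows the index, assuming the
// edgeGram filter from the index above (minGram 4). One gram is emitted per
// length from minGram up to min(maxGram, word length).

function edgeGrams(token, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

const word = "extraterrestrial"; // hypothetical 16-character word

for (const maxGram of [10, 15]) {
  const grams = edgeGrams(word, 4, maxGram);
  // A query for the whole word can only match if the whole word was
  // emitted as a gram, which requires maxGram >= word.length.
  console.log(
    `maxGram ${maxGram}: ${grams.length} grams, ` +
      `longest "${grams[grams.length - 1]}", ` +
      `whole word indexed: ${grams.includes(word)}`
  );
}
```

Each extra unit of maxGram adds up to one more indexed term per long word. This is why raising it increases index size.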

Substring "Contains" Queries

If your application frequently queries for a string that can appear anywhere in a field, you might run a $regex query. This query checks every document and returns all matches in no particular order.

Instead, we recommend a MongoDB Search query that uses the $search aggregation pipeline stage. The following query searches for movie titles that contain the term park anywhere in the title field.

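The first block below shows the $regex query and its output. The second block shows the equivalent $search query and its output.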

db.movies.find({ title: { $regex: "park", $options: "i" } }, { title: 1, _id: 0 })

[
  { title: 'Barefoot in the Park' },
  { title: 'The Panic in Needle Park' },
  { title: 'Gorky Park' },
  { title: 'The Park Is Mine' },
  { title: 'Jurassic Park' },
  { title: 'Mrs. Parker and the Vicious Circle' },
  { title: 'The Lost World: Jurassic Park' },
  { title: 'Dog Park' },
  { title: 'South Park: Bigger Longer & Uncut' },
  { title: 'Jurassic Park III' },
  { title: 'Mansfield Park' },
  { title: 'Jurassic Park III' },
  { title: 'Gosford Park' },
  { title: 'The Rosa Parks Story' },
  { title: 'The Delicate Art of Parking' },
  { title: 'Wicker Park' },
  { title: 'Chestnut: Hero of Central Park' },
  { title: 'Trailer Park Boys: The Movie' },
  { title: 'Ellie Parker' },
  { title: 'Paranoid Park' }
]

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "wildcard": {
        "query": "park*",
        "path": "title",
        "allowAnalyzedField": true
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])

[
  { title: 'Barefoot in the Park', score: 1 },
  { title: 'The Panic in Needle Park', score: 1 },
  { title: 'Gorky Park', score: 1 },
  { title: 'The Park Is Mine', score: 1 },
  { title: 'Jurassic Park', score: 1 },
  { title: 'Mrs. Parker and the Vicious Circle', score: 1 },
  { title: 'The Lost World: Jurassic Park', score: 1 },
  { title: 'Dog Park', score: 1 },
  { title: 'South Park: Bigger Longer & Uncut', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Mansfield Park', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Gosford Park', score: 1 },
  { title: 'The Rosa Parks Story', score: 1 },
  { title: 'Wicker Park', score: 1 },
  { title: 'The Delicate Art of Parking', score: 1 },
  { title: 'Chestnut: Hero of Central Park', score: 1 },
  { title: 'Trailer Park Boys: The Movie', score: 1 },
  { title: 'Ellie Parker', score: 1 },
  { title: 'Paranoid Park', score: 1 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index with the following definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "contains",
        "searchAnalyzer": "lucene.standard"
      }
    }
  },
  "analyzers": [
    {
      "name": "contains",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "reverse"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 15
        },
        {
          "type": "reverse"
        }
      ]
    }
  ]
}

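This index definition indexes the title field in the movies collection as the string type, using a custom analyzer named contains that does the following:

Uses the standard tokenizer to split text into words at whitespace and punctuation.
Uses the lowercase token filter to convert characters to lowercase and support case-insensitive queries.
Uses the reverse token filter twice, before and after edgeGram, so that the grams come from the end of each word (its suffixes) rather than the beginning. Because every suffix is indexed, the trailing-wildcard query park* can match park anywhere in a word, which supports efficient substring queries.
Uses the edgeGram token filter to create tokens 4 to 15 characters long.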
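To see how the reverse, edgeGram, reverse sequence turns prefix grams into suffix grams, here is a rough JavaScript simulation of the contains analyzer. It rests on these assumptions:

- The standard tokenizer is approximated by splitting on characters that are not letters, digits, or apostrophes.
- The wildcard check handles only a single trailing *.
- The helper names (tokenize, edgeGrams, containsTerms, wildcardMatches) and the sample strings are hypothetical.

```javascript
// Rough simulation of the "contains" custom analyzer:
// standard tokenizer -> lowercase -> reverse -> edgeGram(4, 15) -> reverse.
// Assumption: the standard tokenizer is approximated by splitting on runs of
// characters that are not letters, digits, or apostrophes.

const reverse = (s) => [...s].reverse().join("");

function tokenize(text) {
  return text.split(/[^\p{L}\p{N}']+/u).filter((t) => t.length > 0);
}

function edgeGrams(token, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Prefix grams of the reversed word, reversed back, are suffixes of the
// original word: "parker" -> "rker", "arker", "parker".
function containsTerms(title) {
  const terms = new Set();
  for (const token of tokenize(title)) {
    for (const gram of edgeGrams(reverse(token.toLowerCase()), 4, 15)) {
      terms.add(reverse(gram));
    }
  }
  return terms;
}

// A trailing-wildcard query like "park*" matches indexed terms that start
// with "park". Since the terms are suffixes, this finds "park" anywhere in a
// word. Only a single trailing "*" is handled in this sketch.
function wildcardMatches(title, pattern) {
  const prefix = pattern.replace(/\*$/, "");
  return [...containsTerms(title)].some((term) => term.startsWith(prefix));
}

for (const title of ["Gorky Park", "Ellie Parker", "Skyparks", "Parasite"]) {
  console.log(`${title}: ${wildcardMatches(title, "park*")}`);
}
```

Gorky Park, Ellie Parker and Skyparks all match, because each has an indexed suffix that begins with park. Parasite does not. The query side stays a cheap prefix check; the cost moves into the index, which stores several suffixes per word.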

Note

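This custom analyzer supports only words up to 15 characters long. Supporting longer words would require increasing maxGram. However, we don't recommend setting maxGram higher than 15, because larger values increase the size of the index and can affect performance and availability.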

Suffix Queries

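Your application might frequently query for string field values that end with a set of characters, or suffix. In that case you might run a $regex query with two parts:

- the $ anchor, to search from the end of the string value
- the i option, to make the query case-insensitive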

Instead, we recommend a MongoDB Search query that uses the $search aggregation pipeline stage. The following query searches for movie titles that end with the term ring.

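The first block below shows the case-sensitive and case-insensitive $regex queries with sample output. The second block shows the equivalent $search query and its output.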

db.movies.find( { title: { $regex: "ring$" } }, { title: 1, _id: 0 } ) // Case-sensitive Query 1
db.movies.find( { title: { $regex: /ring$/ } }, { title: 1, _id: 0 } ) // Case-sensitive Query 2
db.movies.find( { title: { $regex: /ring$/i } }, { title: 1, _id: 0 } ) // Case-insensitive Query 1
db.movies.find( { title: { $regex: "ring$", $options: "i" } }, { title: 1, _id: 0 } ) // Case-insensitive Query 2

[
  { title: 'It Happens Every Spring' },
  { title: 'Larks on a String' },
  { title: 'Release the Prisoners to Spring' },
  { title: 'Manon of the Spring' },
  { title: 'Floundering' },
  { title: 'Autumn Spring' },
  { title: 'The Gathering' },
  { title: 'Blue Spring' },
  { title: 'Blue Spring' },
  { title: 'Girl with a Pearl Earring' },
  { title: 'Spring, Summer, Fall, Winter... and Spring' },
  { title: 'Breaking and Entering' },
  { title: 'Hunting and Gathering' },
  { title: 'Blood Tea and Red String' },
  { title: 'Warm Spring' },
  { title: 'The Conjuring' },
  { title: 'Thanks for Sharing' },
  { title: 'Leaving on the 15th Spring' }
]

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "autocomplete": {
        "query": "ring",
        "path": "title"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'It Happens Every Spring', score: 4.683838844299316 },
  { title: 'Larks on a String', score: 4.683838844299316 },
  {
    title: 'Release the Prisoners to Spring',
    score: 4.683838844299316
  },
  { title: 'Manon of the Spring', score: 4.683838844299316 },
  { title: 'Floundering', score: 4.683838844299316 },
  {
    title: 'The Lord of the Rings: The Fellowship of the Ring',
    score: 4.683838844299316
  },
  { title: 'Autumn Spring', score: 4.683838844299316 },
  { title: 'The Gathering', score: 4.683838844299316 },
  { title: 'The Ring', score: 4.683838844299316 },
  { title: 'Tom and Jerry: The Magic Ring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Girl with a Pearl Earring', score: 4.683838844299316 },
  {
    title: 'Spring, Summer, Fall, Winter... and Spring',
    score: 4.683838844299316
  },
  { title: 'Curse of the Ring', score: 4.683838844299316 },
  { title: 'Breaking and Entering', score: 4.683838844299316 },
  { title: 'Closing the Ring', score: 4.683838844299316 },
  { title: 'Hunting and Gathering', score: 4.683838844299316 },
  { title: 'Blood Tea and Red String', score: 4.683838844299316 },
  { title: 'Warm Spring', score: 4.683838844299316 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index like the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "minGrams": 4,
          "maxGrams": 10,
          "analyzer": "lucene.keyword",
          "tokenization": "rightEdgeGram"
        }
      ]
    }
  }
}

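This index definition indexes the title field using the following:

The autocomplete type with the rightEdgeGram tokenization strategy, which splits text into substrings, or "grams", 4 (minimum) to 10 (maximum) characters long, starting from the end of the string. This supports partial searches that start from the end of the string.
The lucene.keyword analyzer, which treats the entire field value as a single token. This ensures that queries match only the end of the text, not the end of words in the middle of it. To find suffix matches at the end of words in the middle of the text, use lucene.standard instead.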
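The effect of the analyzer choice on rightEdgeGram can be sketched in JavaScript. The sketch makes these assumptions:

- lucene.keyword is modeled as one token per title.
- lucene.standard is approximated by splitting on characters that are not letters, digits, or apostrophes.
- Both are lowercased. This is consistent with the $search output above matching The Ring for the query ring, but the exact autocomplete normalization may differ.
- The helper names and the second sample title are hypothetical.

```javascript
// Sketch of autocomplete indexing with "tokenization": "rightEdgeGram"
// (minGrams 4, maxGrams 10), comparing two analyzers:
//   lucene.keyword  -> the whole title is one token, so grams are suffixes
//                      of the entire title.
//   lucene.standard -> each word is a token, so grams are suffixes of every
//                      word (approximated by splitting on characters that
//                      are not letters, digits, or apostrophes).
// Assumption: text is lowercased in both cases.

function rightEdgeGrams(token, minGrams, maxGrams) {
  const grams = [];
  for (let n = minGrams; n <= Math.min(maxGrams, token.length); n++) {
    grams.push(token.slice(token.length - n));
  }
  return grams;
}

function keywordGrams(title) {
  return new Set(rightEdgeGrams(title.toLowerCase(), 4, 10));
}

function standardGrams(title) {
  const grams = new Set();
  for (const word of title.toLowerCase().split(/[^\p{L}\p{N}']+/u)) {
    for (const gram of rightEdgeGrams(word, 4, 10)) grams.add(gram);
  }
  return grams;
}

for (const title of ["Blood Tea and Red String", "Spring Breakers"]) {
  console.log(
    `${title}: keyword=${keywordGrams(title).has("ring")}, ` +
      `standard=${standardGrams(title).has("ring")}`
  );
}
```

With lucene.keyword, only Blood Tea and Red String produces the gram ring, because the title itself ends in ring. With the standard-style split, Spring Breakers also produces it from the word Spring. That is the mid-title suffix match the paragraph above describes.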

Learn More

To learn more about MongoDB Search queries, see Queries and Indexes.
To learn more about regular expression queries in MongoDB, see $regex.
MongoDB University offers a free course on optimizing MongoDB performance. To learn more, see Monitoring and Insights.
