Use MongoDB Search Instead of Regex Queries

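If your queries rely on regular expression matching, you can improve their performance and efficiency by creating a MongoDB Search index and running the $search aggregation pipeline stage. $regex is inefficient because it cannot always take advantage of an index, whereas a MongoDB Search index can significantly improve query performance and offers more options for customizing query parameters.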

This page describes some common MongoDB Search index and query configurations for $regex use cases.

Examples

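These examples use the sample_mflix.movies namespace. To run the example queries, add this collection to your cluster or use the preconfigured snapshots in the MongoDB Search Playground. The example queries demonstrate how to use $search instead of $regex for the following use cases: prefix queries, substring ("contains") queries, and suffix queries.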

Prefix Queries

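If your application frequently queries for string values that begin with a specific set of characters, or prefix, you might use $regex with the ^ anchor, which matches from the beginning of the string value, and the i option, which makes the match case-insensitive.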
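To illustrate what ^ and i do, the following sketch applies the same pattern with JavaScript's built-in RegExp to a handful of titles. The sampleTitles array and the startsWithBack helper are illustrative names that do not come from MongoDB, and MongoDB evaluates $regex with its own regular expression engine, so treat this as a model of the matching semantics rather than of query execution.

```javascript
// Hypothetical in-memory sample; in MongoDB these values live in movies.title.
// "back soon" is lowercased here on purpose to show the effect of the i option.
const sampleTitles = [
  "Back to the Future",
  "Backdraft",
  "The Way Back",
  "back soon"
];

// ^ anchors the match at the start of the string; i ignores letter case.
const prefixPattern = /^back/i;

// Illustrative helper: true when the title starts with "back" in any letter case.
function startsWithBack(title) {
  return prefixPattern.test(title);
}

const prefixMatches = sampleTitles.filter(startsWithBack);
console.log(prefixMatches);
// "The Way Back" is filtered out because "Back" is not at the start of the string.
```

Every candidate string has to be tested against the pattern one by one in this model, which is the kind of work an index is meant to avoid.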

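We recommend using a MongoDB Search query with the $search aggregation pipeline stage instead. The following queries search for movie titles that begin with the prefix back. Because the index shown later in this section tokenizes titles into words, the $search query also returns titles in which any word begins with back, such as The Way Back.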

➤ Try this in the MongoDB Search Playground.

$regex query

db.movies.find( { title: { $regex: /^back/i } }, { title: 1, _id: 0 } )  // Query 1: regular expression literal
db.movies.find( { title: { $regex: "^back", $options: "i" } }, { title: 1, _id: 0 } )  // Query 2: pattern string with $options

[
  { title: 'Back to the Future' },
  { title: 'Back to School' },
  { title: 'Back to the Future Part II' },
  { title: 'Back to the USSR - takaisin Ryssiin' },
  { title: 'Back to the Future Part III' },
  { title: 'Backdraft' },
  { title: 'Backbeat' },
  { title: 'Backstage' },
  { title: 'Backdoor' },
  { title: 'Backstage' },
  { title: 'Back Soon' },
  { title: 'Backlight' },
  { title: 'Back to Stay' },
  { title: 'Back Issues: The Hustler Magazine Story' }
]

$search query

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "text": {
        "query": "back",
        "path": "title",
        "matchCriteria": "all"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'Backdraft', score: 3.8287878036499023 },
  { title: 'Backbeat', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'Backdoor', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Way Back', score: 3.8287878036499023 },
  { title: '3 Backyards', score: 3.8287878036499023 },
  { title: 'Backlight', score: 3.8287878036499023 },
  { title: 'The Way Way Back', score: 3.8287878036499023 },
  { title: 'Back to the Future', score: 3.455096483230591 },
  { title: 'Back to School', score: 3.455096483230591 },
  { title: 'The Cat Came Back', score: 3.455096483230591 },
  { title: "Jack's Back", score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'T-Rex: Back to the Cretaceous', score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'No Turning Back', score: 3.455096483230591 },
  { title: "The Devil's Backbone", score: 3.455096483230591 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index similar to the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string",
          "analyzer": "autocomplete-search",
          "searchAnalyzer": "lucene.standard"
        }
      ]
    }
  },
  "analyzers": [
    {
      "name": "autocomplete-search",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 10
        }
      ]
    }
  ]
}

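This index definition indexes the title field in the movies collection as the string type. It uses the custom analyzer named autocomplete-search to index the field and the lucene.standard analyzer, set as the searchAnalyzer, to analyze queries. The autocomplete-search analyzer uses the standard tokenizer and applies the following token filters:

lowercase token filter, which converts all characters to lowercase to support case-insensitive queries.
edgeGram token filter, which creates tokens between 4 and 10 characters long.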
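The following sketch models how this analyzer could turn a title into index tokens and why a query for back then matches. It is a simplified approximation, not MongoDB's implementation: tokenize is a rough stand-in for the standard tokenizer, and all function names (tokenize, edgeGrams, indexTokens, queryTokens, matchesPrefix) are hypothetical. It also assumes that words shorter than minGram produce no tokens.

```javascript
// Simplified stand-in for the standard tokenizer: keep runs of letters and digits.
// The real tokenizer follows Unicode text segmentation rules.
function tokenize(text) {
  return text.match(/[\p{L}\p{N}]+/gu) || [];
}

// Emit the leading substrings of a word whose length is between minGram and maxGram.
// Assumption: words shorter than minGram produce no tokens.
function edgeGrams(word, minGram, maxGram) {
  const grams = [];
  const longest = Math.min(maxGram, word.length);
  for (let length = minGram; length <= longest; length++) {
    grams.push(word.slice(0, length));
  }
  return grams;
}

// Index-time analysis: tokenize, lowercase, then edgeGram with minGram 4 and maxGram 10.
function indexTokens(title) {
  return tokenize(title)
    .map((word) => word.toLowerCase())
    .flatMap((word) => edgeGrams(word, 4, 10));
}

// Query-time analysis modeled on lucene.standard: tokenize and lowercase only.
function queryTokens(query) {
  return tokenize(query).map((word) => word.toLowerCase());
}

// A title matches when every query token equals one of its indexed tokens.
function matchesPrefix(title, query) {
  const indexed = new Set(indexTokens(title));
  return queryTokens(query).every((token) => indexed.has(token));
}

console.log(indexTokens("Back to the Future"));
console.log(matchesPrefix("The Devil's Backbone", "back"));
console.log(matchesPrefix("Feedback", "back"));
```

In this model the match happens per word, which is consistent with the sample output containing titles such as The Devil's Backbone, while a word that merely contains back in the middle, like the hypothetical title Feedback, does not match.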

Note

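This custom analyzer supports only words up to 10 characters long. If you expect words and queries longer than 10 characters, increase the maxGram value. We don't recommend setting maxGram to a value greater than 15, because higher values increase the size of the index and might affect performance and usability.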
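As a rough illustration of this trade-off, the sketch below compares the tokens produced for one long word at maxGram 10 and maxGram 15. The edgeGrams helper and the sample word are hypothetical, and the sketch assumes that a word longer than maxGram is not indexed in full, which is the behavior the note describes.

```javascript
// Hypothetical helper: leading substrings of a word with lengths from minGram to maxGram.
function edgeGrams(word, minGram, maxGram) {
  const grams = [];
  for (let length = minGram; length <= Math.min(maxGram, word.length); length++) {
    grams.push(word.slice(0, length));
  }
  return grams;
}

const word = "cinematography"; // 14 characters, longer than a maxGram of 10

const gramsAt10 = edgeGrams(word, 4, 10);
const gramsAt15 = edgeGrams(word, 4, 15);

// With maxGram 10 the longest indexed token is "cinematogr", so the full word
// typed as a query has no equal token to match.
console.log(gramsAt10.includes(word)); // false
console.log(gramsAt15.includes(word)); // true

// Raising maxGram also stores more tokens for each long word.
console.log(gramsAt10.length, gramsAt15.length); // 7 11
```

The extra tokens per word are what makes the index grow as maxGram increases.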

Substring "Contains" Queries

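If your application frequently queries for a string that can appear anywhere in a field, you might run a $regex query, which examines every document and returns all matches in no particular order.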

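We recommend using a MongoDB Search query with the $search aggregation pipeline stage instead. The following queries search for movie titles that contain the term park anywhere in the title field.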

➤ Try this in the MongoDB Search Playground.

$regex query

db.movies.find({ title: { $regex: "park", $options: "i" } }, { title: 1, _id: 0 })

[
  { title: 'Barefoot in the Park' },
  { title: 'The Panic in Needle Park' },
  { title: 'Gorky Park' },
  { title: 'The Park Is Mine' },
  { title: 'Jurassic Park' },
  { title: 'Mrs. Parker and the Vicious Circle' },
  { title: 'The Lost World: Jurassic Park' },
  { title: 'Dog Park' },
  { title: 'South Park: Bigger Longer & Uncut' },
  { title: 'Jurassic Park III' },
  { title: 'Mansfield Park' },
  { title: 'Jurassic Park III' },
  { title: 'Gosford Park' },
  { title: 'The Rosa Parks Story' },
  { title: 'The Delicate Art of Parking' },
  { title: 'Wicker Park' },
  { title: 'Chestnut: Hero of Central Park' },
  { title: 'Trailer Park Boys: The Movie' },
  { title: 'Ellie Parker' },
  { title: 'Paranoid Park' }
]

$search query

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "wildcard": {
        "query": "park*",
        "path": "title",
        "allowAnalyzedField": true
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])

[
  { title: 'Barefoot in the Park', score: 1 },
  { title: 'The Panic in Needle Park', score: 1 },
  { title: 'Gorky Park', score: 1 },
  { title: 'The Park Is Mine', score: 1 },
  { title: 'Jurassic Park', score: 1 },
  { title: 'Mrs. Parker and the Vicious Circle', score: 1 },
  { title: 'The Lost World: Jurassic Park', score: 1 },
  { title: 'Dog Park', score: 1 },
  { title: 'South Park: Bigger Longer & Uncut', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Mansfield Park', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Gosford Park', score: 1 },
  { title: 'The Rosa Parks Story', score: 1 },
  { title: 'Wicker Park', score: 1 },
  { title: 'The Delicate Art of Parking', score: 1 },
  { title: 'Chestnut: Hero of Central Park', score: 1 },
  { title: 'Trailer Park Boys: The Movie', score: 1 },
  { title: 'Ellie Parker', score: 1 },
  { title: 'Paranoid Park', score: 1 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index with the following definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "contains",
        "searchAnalyzer": "lucene.standard"
      }
    }
  },
  "analyzers": [
    {
      "name": "contains",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "reverse"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 15
        },
        {
          "type": "reverse"
        }
      ]
    }
  ]
}

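This index definition indexes the title field in the movies collection as the string type using a custom analyzer named contains, which applies the following:

standard tokenizer, which splits words on whitespace or punctuation.
lowercase token filter, which converts letters to lowercase to support case-insensitive queries.
reverse token filter, applied twice, which reverses each word before and after the edgeGram filter to support efficient non-anchored queries.
edgeGram token filter, which creates tokens between 4 and 15 characters long.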
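To see why reversing twice around edgeGram helps, the following sketch models the filter chain and shows that it yields the trailing substrings of each word. A wildcard query such as park* then only needs a prefix comparison against those tokens. This is a simplified model with hypothetical helper names (tokenize, reverse, edgeGrams, containsTokens, matchesContains), not MongoDB's implementation, and it assumes the search term is already lowercase.

```javascript
// Simplified stand-in for the standard tokenizer: keep runs of letters and digits.
function tokenize(text) {
  return text.match(/[\p{L}\p{N}]+/gu) || [];
}

// Reverse the characters of a word.
function reverse(word) {
  return word.split("").reverse().join("");
}

// Leading substrings of a word with lengths from minGram to maxGram.
function edgeGrams(word, minGram, maxGram) {
  const grams = [];
  for (let length = minGram; length <= Math.min(maxGram, word.length); length++) {
    grams.push(word.slice(0, length));
  }
  return grams;
}

// lowercase -> reverse -> edgeGram(4, 15) -> reverse yields the trailing
// substrings (suffixes) of every word in the title.
function containsTokens(title) {
  return tokenize(title)
    .map((word) => reverse(word.toLowerCase()))
    .flatMap((reversed) => edgeGrams(reversed, 4, 15))
    .map(reverse);
}

// Model the wildcard query "park*" as a prefix test against each indexed token.
// Assumption: term is already lowercase, like the query in this section.
function matchesContains(title, term) {
  return containsTokens(title).some((token) => token.startsWith(term));
}

console.log(containsTokens("Parking"));
console.log(matchesContains("The Delicate Art of Parking", "park"));
console.log(matchesContains("Sparks", "park")); // hypothetical title
console.log(matchesContains("Spartacus", "park")); // hypothetical title
```

Because every suffix of at least four characters is indexed, a term that sits in the middle of a word is the prefix of one of that word's suffixes, which turns a substring search into a prefix lookup.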

Note

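This custom analyzer supports only words up to 15 characters long. If your words are longer than 15 characters, increase the maxGram value. However, we don't recommend setting maxGram to a value greater than 15, because higher values increase the size of the index and might affect performance and usability.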

Suffix Queries

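If your application frequently queries for string field values that end with a specific set of characters, or suffix, you might run a regular expression query that uses $regex with the $ anchor, which matches at the end of the string value, and the i option, which makes the match case-insensitive.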

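We recommend using a MongoDB Search query with the $search aggregation pipeline stage instead. The following queries search for movie titles that end with the term ring.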

➤ Try this in the MongoDB Search Playground.

$regex query

db.movies.find( { title: { $regex: "ring$" } }, { title: 1, _id: 0 } ) // Case-sensitive Query 1
db.movies.find( { title: { $regex: /ring$/ } }, { title: 1, _id: 0 } ) // Case-sensitive Query 2
db.movies.find( { title: { $regex: /ring$/i } }, { title: 1, _id: 0 } ) // Case-insensitive Query 1
db.movies.find( { title: { $regex: "ring$", $options: "i" } }, { title: 1, _id: 0 } ) // Case-insensitive Query 2

[
  { title: 'It Happens Every Spring' },
  { title: 'Larks on a String' },
  { title: 'Release the Prisoners to Spring' },
  { title: 'Manon of the Spring' },
  { title: 'Floundering' },
  { title: 'Autumn Spring' },
  { title: 'The Gathering' },
  { title: 'Blue Spring' },
  { title: 'Blue Spring' },
  { title: 'Girl with a Pearl Earring' },
  { title: 'Spring, Summer, Fall, Winter... and Spring' },
  { title: 'Breaking and Entering' },
  { title: 'Hunting and Gathering' },
  { title: 'Blood Tea and Red String' },
  { title: 'Warm Spring' },
  { title: 'The Conjuring' },
  { title: 'Thanks for Sharing' },
  { title: 'Leaving on the 15th Spring' }
]

$search query

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "autocomplete": {
        "query": "ring",
        "path": "title",
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'It Happens Every Spring', score: 4.683838844299316 },
  { title: 'Larks on a String', score: 4.683838844299316 },
  {
    title: 'Release the Prisoners to Spring',
    score: 4.683838844299316
  },
  { title: 'Manon of the Spring', score: 4.683838844299316 },
  { title: 'Floundering', score: 4.683838844299316 },
  {
    title: 'The Lord of the Rings: The Fellowship of the Ring',
    score: 4.683838844299316
  },
  { title: 'Autumn Spring', score: 4.683838844299316 },
  { title: 'The Gathering', score: 4.683838844299316 },
  { title: 'The Ring', score: 4.683838844299316 },
  { title: 'Tom and Jerry: The Magic Ring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Girl with a Pearl Earring', score: 4.683838844299316 },
  {
    title: 'Spring, Summer, Fall, Winter... and Spring',
    score: 4.683838844299316
  },
  { title: 'Curse of the Ring', score: 4.683838844299316 },
  { title: 'Breaking and Entering', score: 4.683838844299316 },
  { title: 'Closing the Ring', score: 4.683838844299316 },
  { title: 'Hunting and Gathering', score: 4.683838844299316 },
  { title: 'Blood Tea and Red String', score: 4.683838844299316 },
  { title: 'Warm Spring', score: 4.683838844299316 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index similar to the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "minGrams": 4,
          "maxGrams": 10,
          "analyzer": "lucene.keyword",
          "tokenization": "rightEdgeGram"
        }
      ]
    }
  }
}

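This index definition indexes the title field using the following:

The autocomplete type with the rightEdgeGram tokenization strategy, which splits text into substrings, or "grams", between 4 (minimum) and 10 (maximum) characters long. The grams are built from the end of the text to support partial matches that are anchored at the end of the string.
The lucene.keyword analyzer, which ensures that matches occur only at the end of the whole text and not at the end of intermediate words. To find suffix matches on intermediate words, use lucene.standard instead.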
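The difference between the two analyzers can be sketched as follows. This is a simplified model rather than MongoDB's implementation, and every function name in it is hypothetical. Lowercasing the text is an assumption made so that the sketch mirrors the case-insensitive sample results above, and Spring Breakers is an illustrative title that is not taken from the sample output.

```javascript
// Trailing substrings of the text with lengths from minGrams to maxGrams.
function rightEdgeGrams(text, minGrams, maxGrams) {
  const grams = [];
  for (let length = minGrams; length <= Math.min(maxGrams, text.length); length++) {
    grams.push(text.slice(text.length - length));
  }
  return grams;
}

// Keyword-style analysis: the whole title is a single token, so grams are
// taken only from the end of the full text. Lowercasing is an assumption.
function keywordSuffixTokens(title) {
  return rightEdgeGrams(title.toLowerCase(), 4, 10);
}

// Standard-style analysis for comparison: every word gets its own trailing grams.
function wordSuffixTokens(title) {
  const words = title.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
  return words.flatMap((word) => rightEdgeGrams(word, 4, 10));
}

// A title matches when the lowercased query equals one of its grams.
function matchesSuffix(tokens, query) {
  return tokens.includes(query.toLowerCase());
}

console.log(matchesSuffix(keywordSuffixTokens("The Ring"), "ring"));
console.log(matchesSuffix(keywordSuffixTokens("Spring Breakers"), "ring"));
console.log(matchesSuffix(wordSuffixTokens("Spring Breakers"), "ring"));
```

In this model the keyword-style tokens match only when the whole title ends with ring, while the per-word tokens also match a title whose first word ends with ring.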

Learn More

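To learn more about MongoDB Search queries, see Queries and Indexes.
To learn more about regular expression queries in MongoDB, see $regex.
MongoDB University offers free courses on optimizing MongoDB performance. To learn more, see Monitoring and Insights.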