Use MongoDB Search Instead of Regex Queries

If your queries rely on regular expression matches, you can improve query performance and efficiency by creating a MongoDB Search index and running a $search aggregation pipeline stage. $regex is inefficient because it can't always use indexes, while MongoDB Search indexes significantly improve the performance of your queries and offer more options for customizing query parameters.

This page describes some common MongoDB Search index and query configurations for $regex use cases.

Examples

The examples use the sample_mflix.movies namespace. To run the sample queries, add this collection to your cluster or use the preconfigured snapshots in the MongoDB Search Playground. The sample queries demonstrate how to use $search instead of $regex for the following use cases: prefix queries, "contains" substring queries, and suffix queries.

Prefix Queries

If your application frequently runs $regex queries for string values that begin with a set of characters or a prefix, you might use the ^ anchor, which matches from the start of the string value, and the i option, which makes the match case-insensitive.

Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that begin with the prefix back.

➤ Try this in the MongoDB Search Playground.

The $regex queries and their results are shown first, followed by the equivalent $search query and its results.

db.movies.find( { title: { $regex: /^back/i } }, { title: 1, _id: 0 } )  // Query 1
db.movies.find( { title: { $regex: "^back", $options: "i" } }, { title: 1, _id: 0 } )  // Query 2

[
  { title: 'Back to the Future' },
  { title: 'Back to School' },
  { title: 'Back to the Future Part II' },
  { title: 'Back to the USSR - takaisin Ryssiin' },
  { title: 'Back to the Future Part III' },
  { title: 'Backdraft' },
  { title: 'Backbeat' },
  { title: 'Backstage' },
  { title: 'Backdoor' },
  { title: 'Backstage' },
  { title: 'Back Soon' },
  { title: 'Backlight' },
  { title: 'Back to Stay' },
  { title: 'Back Issues: The Hustler Magazine Story' }
]

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "text": {
        "query": "back",
        "path": "title",
        "matchCriteria": "all"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'Backdraft', score: 3.8287878036499023 },
  { title: 'Backbeat', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'Backdoor', score: 3.8287878036499023 },
  { title: 'Backstage', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Backwoods', score: 3.8287878036499023 },
  { title: 'The Way Back', score: 3.8287878036499023 },
  { title: '3 Backyards', score: 3.8287878036499023 },
  { title: 'Backlight', score: 3.8287878036499023 },
  { title: 'The Way Way Back', score: 3.8287878036499023 },
  { title: 'Back to the Future', score: 3.455096483230591 },
  { title: 'Back to School', score: 3.455096483230591 },
  { title: 'The Cat Came Back', score: 3.455096483230591 },
  { title: "Jack's Back", score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'T-Rex: Back to the Cretaceous', score: 3.455096483230591 },
  { title: 'The Dark Backward', score: 3.455096483230591 },
  { title: 'No Turning Back', score: 3.455096483230591 },
  { title: "The Devil's Backbone", score: 3.455096483230591 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index similar to the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string",
          "analyzer": "autocomplete-search",
          "searchAnalyzer": "lucene.standard"
        }
      ]
    }
  },
  "analyzers": [
    {
      "name": "autocomplete-search",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 10
        }
      ]
    }
  ]
}

This index definition indexes the title field in the movies collection as the string type, using the custom analyzer named autocomplete-search as the analyzer for indexed fields and the lucene.standard analyzer as the searchAnalyzer for queries. The custom analyzer named autocomplete-search applies the following token filters after the standard tokenizer:

lowercase token filter to convert all characters to lowercase, which supports case-insensitive queries.
edgeGram token filter to create tokens between 4 and 10 characters long.

Note

This custom analyzer supports only words up to 10 characters long. If you expect words and queries longer than ten characters, increase the maxGram value. We don't recommend setting maxGram higher than fifteen, because higher values increase the index size and might affect performance and availability.
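To make the effect of the lowercase and edgeGram token filters concrete, the following is a minimal sketch in plain JavaScript that imitates the analyzer outside of MongoDB. The function name edgeGramTokens is hypothetical, and splitting on non-alphanumeric characters is only a rough stand-in for the standard tokenizer. The sketch also assumes that words shorter than minGram produce no tokens.

```javascript
// Simplified model of the autocomplete-search analyzer (not MongoDB code).
// Assumption: the standard tokenizer is approximated by splitting on any
// character that is not a letter or a digit.
function edgeGramTokens(text, minGram = 4, maxGram = 10) {
  const words = text.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  const tokens = [];
  for (const word of words) {
    const lower = word.toLowerCase(); // lowercase token filter
    // edgeGram token filter: leading substrings of minGram..maxGram characters
    for (let len = minGram; len <= Math.min(maxGram, lower.length); len++) {
      tokens.push(lower.slice(0, len));
    }
  }
  return tokens;
}

const indexed = edgeGramTokens("The Backwoods");
console.log(indexed);
// -> ["back", "backw", "backwo", "backwoo", "backwood", "backwoods"]

// The query term "back" is not split into grams at search time,
// so it matches because "back" is one of the indexed tokens.
console.log(indexed.includes("back")); // -> true
```

Because the grams are built per word, a title such as The Backwoods matches the query back even though the title as a whole does not begin with that prefix.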

"Contains" Substring Queries

If your application frequently queries for strings that can appear anywhere in a field, you might run $regex queries, which check every document and return all matches in no particular order.

Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that contain the term park anywhere in the title field.

➤ Try this in the MongoDB Search Playground.

The $regex query and its results are shown first, followed by the equivalent $search query and its results.

db.movies.find({ title: { $regex: "park", $options: "i" } }, { title: 1, _id: 0 })

[
  { title: 'Barefoot in the Park' },
  { title: 'The Panic in Needle Park' },
  { title: 'Gorky Park' },
  { title: 'The Park Is Mine' },
  { title: 'Jurassic Park' },
  { title: 'Mrs. Parker and the Vicious Circle' },
  { title: 'The Lost World: Jurassic Park' },
  { title: 'Dog Park' },
  { title: 'South Park: Bigger Longer & Uncut' },
  { title: 'Jurassic Park III' },
  { title: 'Mansfield Park' },
  { title: 'Jurassic Park III' },
  { title: 'Gosford Park' },
  { title: 'The Rosa Parks Story' },
  { title: 'The Delicate Art of Parking' },
  { title: 'Wicker Park' },
  { title: 'Chestnut: Hero of Central Park' },
  { title: 'Trailer Park Boys: The Movie' },
  { title: 'Ellie Parker' },
  { title: 'Paranoid Park' }
]

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "wildcard": {
        "query": "park*",
        "path": "title",
        "allowAnalyzedField": true
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { "$meta": "searchScore" }
    }
  }
])

[
  { title: 'Barefoot in the Park', score: 1 },
  { title: 'The Panic in Needle Park', score: 1 },
  { title: 'Gorky Park', score: 1 },
  { title: 'The Park Is Mine', score: 1 },
  { title: 'Jurassic Park', score: 1 },
  { title: 'Mrs. Parker and the Vicious Circle', score: 1 },
  { title: 'The Lost World: Jurassic Park', score: 1 },
  { title: 'Dog Park', score: 1 },
  { title: 'South Park: Bigger Longer & Uncut', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Mansfield Park', score: 1 },
  { title: 'Jurassic Park III', score: 1 },
  { title: 'Gosford Park', score: 1 },
  { title: 'The Rosa Parks Story', score: 1 },
  { title: 'Wicker Park', score: 1 },
  { title: 'The Delicate Art of Parking', score: 1 },
  { title: 'Chestnut: Hero of Central Park', score: 1 },
  { title: 'Trailer Park Boys: The Movie', score: 1 },
  { title: 'Ellie Parker', score: 1 },
  { title: 'Paranoid Park', score: 1 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index with the following definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "contains",
        "searchAnalyzer": "lucene.standard"
      }
    }
  },
  "analyzers": [
    {
      "name": "contains",
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        {
          "type": "lowercase"
        },
        {
          "type": "reverse"
        },
        {
          "type": "edgeGram",
          "minGram": 4,
          "maxGram": 15
        },
        {
          "type": "reverse"
        }
      ]
    }
  ]
}

This index definition indexes the title field in the movies collection as the string type, using a custom analyzer named contains that applies the following:

standard tokenizer to split words on whitespace or punctuation.
lowercase token filter to convert letters to lowercase, which supports case-insensitive queries.
reverse token filter (twice), once before and once after edgeGram, so that the grams are built from the end of each word and then restored to reading order, which supports unanchored queries efficiently.
edgeGram token filter to create tokens between four and fifteen characters long.

Note

This custom analyzer supports only words up to fifteen characters long. If you have words longer than fifteen characters, increase the maxGram value. Keep in mind that we don't recommend setting maxGram higher than fifteen, because higher values increase the index size and might affect performance and availability.
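The following plain JavaScript sketch imitates the contains analyzer to show why a wildcard query such as park* can match text in the middle of a word. It is an illustration under stated assumptions, not MongoDB code: the helper names containsTokens and matchesWildcardPrefix are hypothetical, the standard tokenizer is approximated by splitting on non-alphanumeric characters, and the wildcard is modeled as a simple starts-with test against each indexed token.

```javascript
// Reverse a string by Unicode code points.
function reverseString(s) {
  return [...s].reverse().join("");
}

// Simplified model of the "contains" analyzer:
// lowercase -> reverse -> edgeGram(minGram..maxGram) -> reverse.
function containsTokens(text, minGram = 4, maxGram = 15) {
  const words = text.split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  const tokens = [];
  for (const word of words) {
    const reversed = reverseString(word.toLowerCase());
    for (let len = minGram; len <= Math.min(maxGram, reversed.length); len++) {
      // Reversing each gram again yields a trailing substring of the word.
      tokens.push(reverseString(reversed.slice(0, len)));
    }
  }
  return tokens;
}

// Assumption: "park*" matches any indexed token that starts with "park".
function matchesWildcardPrefix(tokens, prefix) {
  return tokens.some((token) => token.startsWith(prefix));
}

const parkerTokens = containsTokens("Ellie Parker");
console.log(parkerTokens);
// -> ["llie", "ellie", "rker", "arker", "parker"]

console.log(matchesWildcardPrefix(parkerTokens, "park")); // -> true
console.log(matchesWildcardPrefix(containsTokens("Sparkle"), "park")); // -> true
console.log(matchesWildcardPrefix(containsTokens("Gorky"), "park")); // -> false
```

Each word is indexed as its trailing substrings, so a search term that appears anywhere inside a word is the beginning of at least one token. That turns an unanchored substring search into a prefix lookup.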

Suffix Queries

If your application frequently queries for string field values that end with a series of characters or a suffix, you might run $regex queries with the $ anchor, which matches at the end of the string value, and the i option, which makes the query case-insensitive.

Instead, we recommend MongoDB Search queries that use the $search aggregation pipeline stage. The following queries search for movie titles that end with the term ring.

➤ Try this in the MongoDB Search Playground.

The $regex queries and their results are shown first, followed by the equivalent $search query and its results.

db.movies.find( { title: { $regex: "ring$" } }, { title: 1, _id: 0 } ) // Case-sensitive Query 1
db.movies.find( { title: { $regex: /ring$/ } }, { title: 1, _id: 0 } ) // Case-sensitive Query 2
db.movies.find( { title: { $regex: /ring$/i } }, { title: 1, _id: 0 } ) // Case-insensitive Query 1
db.movies.find( { title: { $regex: "ring$", $options: "i" } }, { title: 1, _id: 0 } ) // Case-insensitive Query 2

[
  { title: 'It Happens Every Spring' },
  { title: 'Larks on a String' },
  { title: 'Release the Prisoners to Spring' },
  { title: 'Manon of the Spring' },
  { title: 'Floundering' },
  { title: 'Autumn Spring' },
  { title: 'The Gathering' },
  { title: 'Blue Spring' },
  { title: 'Blue Spring' },
  { title: 'Girl with a Pearl Earring' },
  { title: 'Spring, Summer, Fall, Winter... and Spring' },
  { title: 'Breaking and Entering' },
  { title: 'Hunting and Gathering' },
  { title: 'Blood Tea and Red String' },
  { title: 'Warm Spring' },
  { title: 'The Conjuring' },
  { title: 'Thanks for Sharing' },
  { title: 'Leaving on the 15th Spring' }
]

db.movies.aggregate([
  {
    "$search": {
      "index": "default",
      "autocomplete": {
        "query": "ring",
        "path": "title"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { $meta: "searchScore" }
    }
  }
])

[
  { title: 'It Happens Every Spring', score: 4.683838844299316 },
  { title: 'Larks on a String', score: 4.683838844299316 },
  {
    title: 'Release the Prisoners to Spring',
    score: 4.683838844299316
  },
  { title: 'Manon of the Spring', score: 4.683838844299316 },
  { title: 'Floundering', score: 4.683838844299316 },
  {
    title: 'The Lord of the Rings: The Fellowship of the Ring',
    score: 4.683838844299316
  },
  { title: 'Autumn Spring', score: 4.683838844299316 },
  { title: 'The Gathering', score: 4.683838844299316 },
  { title: 'The Ring', score: 4.683838844299316 },
  { title: 'Tom and Jerry: The Magic Ring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Blue Spring', score: 4.683838844299316 },
  { title: 'Girl with a Pearl Earring', score: 4.683838844299316 },
  {
    title: 'Spring, Summer, Fall, Winter... and Spring',
    score: 4.683838844299316
  },
  { title: 'Curse of the Ring', score: 4.683838844299316 },
  { title: 'Breaking and Entering', score: 4.683838844299316 },
  { title: 'Closing the Ring', score: 4.683838844299316 },
  { title: 'Hunting and Gathering', score: 4.683838844299316 },
  { title: 'Blood Tea and Red String', score: 4.683838844299316 },
  { title: 'Warm Spring', score: 4.683838844299316 }
]
Type "it" for more

To run this $search query, create a MongoDB Search index similar to the following:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "autocomplete",
          "minGrams": 4,
          "maxGrams": 10,
          "analyzer": "lucene.keyword",
          "tokenization": "rightEdgeGram"
        }
      ]
    }
  }
}

This index definition indexes the title field using the following:

autocomplete type with the rightEdgeGram tokenization strategy, which splits the text into substrings or "grams" between 4 (minimum) and 10 (maximum) characters long and supports partial matches that start from the end of the string.
lucene.keyword analyzer to ensure matches only at the end of the text and not at the end of intermediate terms. To match suffixes of intermediate words, use lucene.standard instead.
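As a rough illustration of rightEdgeGram tokenization combined with a keyword-style analyzer, the sketch below treats the whole title as a single term and lists its trailing substrings. It is plain JavaScript, not MongoDB code: the function name rightEdgeGrams is hypothetical, and lowercasing the value is an assumption made here to mirror the case-insensitive results shown for the $search query.

```javascript
// Simplified model: the whole field value is one term (keyword-style),
// and the grams are its trailing substrings of minGrams..maxGrams characters.
// Assumption: values are lowercased so that matching is case-insensitive.
function rightEdgeGrams(value, minGrams = 4, maxGrams = 10) {
  const chars = [...value.toLowerCase()];
  const grams = [];
  for (let len = minGrams; len <= Math.min(maxGrams, chars.length); len++) {
    grams.push(chars.slice(chars.length - len).join(""));
  }
  return grams;
}

console.log(rightEdgeGrams("The Ring"));
// -> ["ring", " ring", "e ring", "he ring", "the ring"]

// "ring" is a gram only when the whole title ends with it.
console.log(rightEdgeGrams("The Ring").includes("ring")); // -> true
console.log(rightEdgeGrams("Rings of Fire").includes("ring")); // -> false
```

Because the grams are taken from the end of the entire value, a word that ends in ring in the middle of a title does not produce a matching gram. This is the behavior that the text above attributes to lucene.keyword.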

Learn More

To learn more about search queries in MongoDB Search, see Queries and Indexes.
To learn more about regular expression queries in MongoDB, see $regex.
MongoDB University offers a free course on optimizing MongoDB performance. To learn more, see Monitoring and Insights.