A Decisioning Framework for MongoDB $regex and $text vs Atlas Search

Updated: Apr 25, 2022 | Published: Mar 30, 2022

By Marcus Eagan

Deciding which database feature to implement in order to satisfy a workload can be a daunting thought exercise with important implications. In this short post, we hope to give a head start to users deciding whether to keep an existing application on $regex and $text or to refactor it to use Atlas Search (full-text search embedded directly in MongoDB Atlas) and its $search aggregation stage.

# How to use this blog post

Refer to the chart below for our recommended best practice for each app requirement, along with the reasoning behind it. This (non-exhaustive) post will be updated to take into account new capabilities of both the operators and Atlas Search, which is powered by Apache Lucene.

Note: $text and $regex have had no major updates since 2015, and all future enhancements in relevance-based search will be delivered via Atlas Search.

To learn more about Atlas Search, check out the documentation.

| App Requirements | $regex | $text | $search | Reasoning |
| --- | --- | --- | --- | --- |
| The datastore must respect write concerns | ✅ | 🚫 | 🚫 | If you have a datastore that must respect write concerns for use cases like transactions with heavy reads after writes, $regex is a better choice. For search use cases, reads after writes should be rare. |
| Language awareness (Spanish, Chinese, English, etc.) | 🚫 | 🚫 | ✅ | Atlas Search natively supports over 40 languages so that you can better tokenize text, remove stopwords, and interpret diacritics to support improved search relevance. |
| Case-insensitive text search | 🚫 | 🚫 | ✅ | Case-insensitive text search using $regex is one of the biggest sources of problems among our customer base, and $search offers far more capabilities than $text. |
| Highlighting result text | 🚫 | 🚫 | ✅ | The ability to highlight text fragments in result documents helps end users contextualize why some documents are returned compared to others. It's essential for user experiences powered by natural language queries. While developers could implement a crude version of highlighting with the other options, the $search aggregation stage provides an easy-to-consume API and a core engine that handles topics like tokenization and offsets. |
| Geospatial-aware search queries | ✅ | 🚫 | ✅ | Both $regex and $search have geospatial capabilities. The difference lies in how each treats geospatial parameters. For instance, Lucene draws a straight line from one query coordinate to another, whereas MongoDB lines are spherical. Spherical queries are best for flights, whereas flat map queries might be better for short distances. |
| On-premises or local deployment | ✅ | ✅ | 🚫 | Atlas Search is not available on-premises or for local deployment. The single deployment target enables our team to move fast and innovate at a more rapid pace than if we targeted many deployment models. For that reason, $regex and $text are the only options for people who do not have access to Atlas. |
| Autocomplete of characters (nGrams) | 🚫 | 🚫 | ✅ | End users typing in a search box have grown accustomed to an experience where their search queries are completed for them. Atlas Search offers edgeGrams for left-to-right autocomplete, nGrams for autocomplete with languages that do not have whitespace, and rightEdgeGram for languages that are written and read right-to-left. |
| Autocomplete of words (wordGrams) | 🚫 | 🚫 | ✅ | If you have a field with more than two words and want to offer word-based autocomplete as a feature of your application, then a shingle token filter with custom analyzers could be best for you. Custom analyzers offer developers a flexible way to index and modify how their data is stored. |
| Fuzzy matching on text input | 🚫 | 🚫 | ✅ | If you would like to filter on user-generated input, the fuzzy option in Atlas Search offers flexibility. Issues like misspelled words are handled best by $search. |
| Filtering based on more than 10 strings | 🚫 | 🚫 | ✅ | It's tricky to filter on more than 10 strings in MongoDB due to the limitations of compound text indexes. The filter clause of the Atlas Search compound operator is the right way to go here. |
| Relevance score sorted search | 🚫 | 🚫 | ✅ | Atlas Search uses the state-of-the-art BM25 algorithm for determining the search relevance score of documents and allows for advanced configuration through boost expressions like multiply and gaussian decay, as well as analyzers, search operators, and synonyms. |
| Cluster needs to be optimized for write performance | 🚫 | 🚫 | ✅ | When you add a database index in MongoDB, you should consider tradeoffs to write performance in cases where database write performance is important. Search indexes don't degrade cluster write performance. |
| Searching through large data sets | 🚫 | 🚫 | ✅ | If you have lots of documents, your queries will get linearly slower. In Atlas Search, the inverted index enables fast document retrieval at very large scales. |
| Partial indexes for simple text matching | ✅ | 🚫 | 🚫 | Atlas Search does not yet support partial indexing. Today, $regex takes the cake. |
| Single compound index on arrays | 🚫 | 🚫 | ✅ | Atlas Search is partially designed for this use case, where term indexes are intersected in a single Search index, to eliminate the need for compound indexes for filtering on arrays. |
| Synonyms search | 🚫 | 🚫 | ✅ | The only option for robust synonyms search is Atlas Search, where synonyms are defined in a collection, and that collection is referenced in your search index. |
| Fast faceting for counts | 🚫 | 🚫 | ✅ | If you are looking for faceted navigation, or fast counts of documents based on text criteria, let Atlas Search do the bucketing. In our internal testing, it's 100x faster and also supports number and date buckets. |
| Custom analyzers (stopwords, email/URL token, etc.) | 🚫 | 🚫 | ✅ | Using Atlas Search, you can define a custom analyzer to suit your specific indexing needs. |
| Partial match | 🚫 | 🚫 | ✅ | Atlas Search has a number of partial match options, ranging from the wildcard operator to autocomplete, which can be useful for some partial match use cases. |
| Phrase queries | 🚫 | 🚫 | ✅ | Phrase queries are supported natively in Atlas Search via the phrase operator. |
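
To make a few of the rows above more concrete, here is a minimal sketch in Python that builds two aggregation pipelines as plain dictionaries: one combining fuzzy matching, highlighting, and the relevance score in a single $search stage, and one for edgeGram-based autocomplete together with a matching index definition. The index names (`default`, `autocomplete_titles`), the field names (`title`, `plot`), and the helper function names are assumptions made for illustration and do not come from this post. Treat it as a starting point to check against the Atlas Search documentation rather than a definitive implementation.

```python
from typing import Any


def build_fuzzy_highlight_pipeline(
    user_input: str,
    path: str = "plot",      # hypothetical field to search and highlight
    index: str = "default",  # assumed Atlas Search index name
    max_edits: int = 1,
    limit: int = 10,
) -> list[dict[str, Any]]:
    """Typo-tolerant search that returns highlights and the relevance score."""
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    return [
        {
            "$search": {
                "index": index,
                "text": {
                    "query": user_input,
                    "path": path,
                    # Tolerate up to max_edits single-character edits per term.
                    "fuzzy": {"maxEdits": max_edits},
                },
                "highlight": {"path": path},
            }
        },
        {"$limit": limit},
        {
            "$project": {
                "title": 1,
                "score": {"$meta": "searchScore"},
                "highlights": {"$meta": "searchHighlights"},
            }
        },
    ]


def autocomplete_index_definition(field: str = "title") -> dict[str, Any]:
    """Index definition mapping one (hypothetical) field for edgeGram autocomplete."""
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                field: {
                    "type": "autocomplete",
                    "tokenization": "edgeGram",
                    "minGrams": 2,
                    "maxGrams": 15,
                }
            },
        }
    }


def build_autocomplete_pipeline(
    prefix: str,
    field: str = "title",
    index: str = "autocomplete_titles",  # hypothetical index name
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Suggest documents whose field starts with what the user has typed so far."""
    return [
        {"$search": {"index": index, "autocomplete": {"query": prefix, "path": field}}},
        {"$limit": limit},
        {"$project": {"_id": 0, field: 1}},
    ]
```

With a driver such as PyMongo, either pipeline could then be passed to `collection.aggregate(pipeline)` on a collection that has the corresponding search index.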

Note: The green check mark sometimes does not appear even where the corresponding operator or stage may be able to satisfy an app requirement. In those cases, it's because one of the other options (typically $search) is far superior for the given use case.
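
Case-insensitive matching is one example of this: all three options can express it, even though the chart recommends only $search. The sketch below, written in Python and building only plain dictionaries, shows roughly what the same lookup looks like in each form. The field name `title`, the index name `default`, and the helper names are hypothetical. It also assumes a text index exists for the $text form and an Atlas Search index whose analyzer lowercases tokens (as the standard analyzer does) for the $search form.

```python
import re
from typing import Any


def regex_filter(field: str, term: str) -> dict[str, Any]:
    """find() filter: case-insensitive substring match using $regex."""
    # re.escape keeps user input from being interpreted as regex syntax.
    return {field: {"$regex": re.escape(term), "$options": "i"}}


def text_filter(term: str) -> dict[str, Any]:
    """find() filter: $text search (needs a text index on the collection)."""
    return {"$text": {"$search": term}}


def search_pipeline(field: str, term: str, index: str = "default") -> list[dict[str, Any]]:
    """aggregate() pipeline: Atlas Search text operator on an analyzed field."""
    return [{"$search": {"index": index, "text": {"query": term, "path": field}}}]
```

With PyMongo, the first two results would be passed to `collection.find(...)` and the third to `collection.aggregate(...)`. The query shapes look similar, but they behave differently: a case-insensitive $regex generally cannot use a regular index efficiently, while $search matches against analyzed tokens in the search index.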

If we've whetted your appetite to learn more about Atlas Search, we have some resources to get you started:

The Atlas Search documentation provides reference materials and tutorials, while the MongoDB Developer Hub provides sample apps and code. You can spin up Atlas Search at no cost on the Atlas Free Tier and follow along with the tutorials using our sample data sets, or load your own data for experimentation within your own sandbox.

© 2021 MongoDB, Inc.