
A Decisioning Framework for MongoDB $regex and $text vs Atlas Search

Marcus Eagan
Published Mar 30, 2022 • Updated May 16, 2022
Deciding which database feature to implement in order to satisfy a workload can be a daunting thought exercise with important implications. In this short post, we hope to give a bit of a head start to users looking to decide whether or not they want to refactor an existing application using $regex and $text, or move to Atlas Search (full-text search embedded directly in MongoDB Atlas) and its $search aggregation stage.
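To make the comparison concrete, here is a minimal sketch of how the same keyword lookup might be expressed with each of the three options. It only builds the query documents as Python dictionaries and sends nothing to a database. The helper names, the field name `title`, and the index name `default` are assumptions for illustration and do not come from the original post.

```python
import re


def regex_filter(field: str, term: str) -> dict:
    """Case-insensitive substring match with $regex (a find() filter)."""
    # re.escape keeps user input from being interpreted as a pattern.
    return {field: {"$regex": re.escape(term), "$options": "i"}}


def text_filter(term: str) -> dict:
    """Keyword match with $text; the collection needs a text index."""
    return {"$text": {"$search": term}}


def search_pipeline(field: str, term: str, index: str = "default") -> list:
    """Keyword match with the Atlas Search $search aggregation stage."""
    return [
        {"$search": {"index": index, "text": {"query": term, "path": field}}}
    ]


if __name__ == "__main__":
    print(regex_filter("title", "coffee"))
    print(text_filter("coffee"))
    print(search_pipeline("title", "coffee"))
```

With a driver such as PyMongo, the first two documents would typically be passed to `collection.find(...)` and the third to `collection.aggregate(...)`.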

How to use this blog post

Refer to the table below to understand our recommended best practice for each app requirement, along with the reasoning. This (non-exhaustive) post will be updated to take into account new capabilities of both the operators and Atlas Search, which is powered by Apache Lucene.
Note: $text and $regex have had no major updates since 2015, and all future enhancements in relevance-based search will be delivered via Atlas Search.
To learn more about Atlas Search, check out the documentation.
| App Requirements | $regex | $text | $search | Reasoning |
| --- | --- | --- | --- | --- |
| The datastore must respect write concerns | ✅ | 🚫 | 🚫 | If you have a datastore that must respect write concerns for use cases like transactions with heavy reads after writes, $regex is a better choice. For search use cases, reads after writes should be rare. |
| Language awareness (Spanish, Chinese, English, etc.) | 🚫 | 🚫 | ✅ | Atlas Search natively supports over 40 languages so that you can better tokenize languages, remove stopwords, and interpret diacritics to support improved search relevance. |
| Case-insensitive text search | 🚫 | 🚫 | ✅ | Case-insensitive text search using $regex is one of the biggest sources of problems among our customer base, and $search offers far more capabilities than $text. |
| Highlighting result text | 🚫 | 🚫 | ✅ | The ability to highlight text fragments in result documents helps end users contextualize why some documents are returned compared to others. It's essential for user experiences powered by natural language queries. While developers could implement a crude version of highlighting with the other options, the $search aggregation stage provides an easy-to-consume API and a core engine that handles topics like tokenization and offsets. |
| Geospatial-aware search queries | ✅ | 🚫 | ✅ | Both $regex and $search have geospatial capabilities. The difference lies in how each treats geospatial parameters. For instance, Lucene draws a straight line from one query coordinate to another, whereas MongoDB lines are spherical. Spherical queries are best for flights, whereas flat map queries might be better for short distances. |
| On-premises or local deployment | ✅ | ✅ | 🚫 | Atlas Search is not available on-premises or for local deployment. The single deployment target enables our team to move fast and innovate at a more rapid pace than if we targeted many deployment models. For that reason, $regex and $text are the only options for people who do not have access to Atlas. |
| Autocomplete of characters (nGrams) | 🚫 | 🚫 | ✅ | End users typing in a search box have grown accustomed to an experience where their search queries are completed for them. Atlas Search offers edgeGrams for left-to-right autocomplete, nGrams for autocomplete with languages that do not have whitespace, and rightEdgeGram for languages that are written and read right-to-left. |
| Autocomplete of words (wordGrams) | 🚫 | 🚫 | ✅ | If you have a field with more than two words and want to offer word-based autocomplete as a feature of your application, then a shingle token filter with custom analyzers could be best for you. Custom analyzers offer developers a flexible way to index and modify how their data is stored. |
| Fuzzy matching on text input | 🚫 | 🚫 | ✅ | If you would like to filter on user-generated input, Atlas Search's fuzzy option offers flexibility. Issues like misspelled words are handled best by $search. |
| Filtering based on more than 10 strings | 🚫 | 🚫 | ✅ | It's tricky to filter on more than 10 strings in MongoDB due to the limitations of compound text indexes. The filter clause of the Atlas Search compound operator is the right way to go here. |
| Relevance score sorted search | 🚫 | 🚫 | ✅ | Atlas Search uses the state-of-the-art BM25 algorithm for determining the search relevance score of documents and allows for advanced configuration through boost expressions like multiply and Gaussian decay, as well as analyzers, search operators, and synonyms. |
| Cluster needs to be optimized for write performance | 🚫 | 🚫 | ✅ | When you add a database index in MongoDB, you should consider tradeoffs to write performance in cases where database write performance is important. Search indexes don't degrade cluster write performance. |
| Searching through large data sets | 🚫 | 🚫 | ✅ | If you have lots of documents, your queries will get linearly slower. In Atlas Search, the inverted index enables fast document retrieval at very large scales. |
| Partial indexes for simple text matching | ✅ | 🚫 | 🚫 | Atlas Search does not yet support partial indexing. Today, $regex takes the cake. |
| Single compound index on arrays | 🚫 | 🚫 | ✅ | Atlas Search is designed in part for this use case, where term indexes are intersected in a single Search index, to eliminate the need for compound indexes for filtering on arrays. |
| Synonyms search | 🚫 | 🚫 | ✅ | The only option for robust synonyms search is Atlas Search, where synonyms are defined in a collection, and that collection is referenced in your search index. |
| Fast faceting for counts | 🚫 | 🚫 | ✅ | If you are looking for faceted navigation, or fast counts of documents based on text criteria, let Atlas Search do the bucketing. In our internal testing, it's 100x faster and also supports number and date buckets. |
| Custom analyzers (stopwords, email/URL token, etc.) | 🚫 | 🚫 | ✅ | Using Atlas Search, you can define a custom analyzer to suit your specific indexing needs. |
| Partial match | 🚫 | 🚫 | ✅ | MongoDB has a number of partial match options ranging from the wildcard operator to autocomplete, which can be useful for some partial match use cases. |
| Phrase queries | 🚫 | 🚫 | ✅ | Phrase queries are supported natively in Atlas Search via the phrase operator. |
Note: In some rows, the green check mark is omitted even though the corresponding operator or stage may be able to satisfy the app requirement. In those cases, it's because one of the other options (e.g., $search) is far superior for that use case.
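Several of the requirements above (fuzzy matching, highlighting, and relevance-sorted results) can be combined in a single $search pipeline. The following is a minimal sketch under stated assumptions: the function name, the default index name `default`, and the output field names `score` and `highlights` are hypothetical choices for illustration. The code only constructs the pipeline and does not connect to a cluster.

```python
def fuzzy_search_pipeline(query, path, index="default", max_edits=1, limit=10):
    """Build a $search pipeline that tolerates typos and returns
    relevance scores and highlight fragments for each hit."""
    # Atlas Search accepts 1 or 2 for the fuzzy maxEdits option.
    if max_edits not in (1, 2):
        raise ValueError("max_edits must be 1 or 2")
    return [
        {
            "$search": {
                "index": index,
                "text": {
                    "query": query,
                    "path": path,
                    "fuzzy": {"maxEdits": max_edits},
                },
                # Ask the engine to return matching fragments for this field.
                "highlight": {"path": path},
            }
        },
        {"$limit": limit},
        {
            "$project": {
                "_id": 0,
                path: 1,
                # Hypothetical output names; the $meta keywords are what matter.
                "score": {"$meta": "searchScore"},
                "highlights": {"$meta": "searchHighlights"},
            }
        },
    ]


if __name__ == "__main__":
    # A misspelled query ("coffe") that fuzzy matching is meant to tolerate.
    for stage in fuzzy_search_pipeline("coffe", "title"):
        print(stage)
```

Results from $search are returned in relevance order by default, so no explicit sort stage is included in this sketch.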
If we’ve whetted your appetite to learn more about Atlas Search, we have some resources to get you started:
The Atlas Search documentation provides reference materials and tutorials, while the MongoDB Developer Hub provides sample apps and code. You can spin up Atlas Search at no cost on the Atlas Free Tier and follow along with the tutorials using our sample data sets, or load your own data for experimentation within your own sandbox.
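For readers experimenting in a sandbox, the sketch below shows what an autocomplete setup might look like: a search index definition that maps one field with edgeGram tokenization, and a matching $search query. The function names, the field name `title`, and the index name `default` are assumptions for illustration. The definitions are built as plain Python dictionaries, and how you submit them (Atlas UI, API, or driver) is left open.

```python
def autocomplete_index_definition(field, min_grams=2, max_grams=15):
    """Search index definition mapping one field as an autocomplete type."""
    if min_grams > max_grams:
        raise ValueError("min_grams must not exceed max_grams")
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                field: {
                    "type": "autocomplete",
                    # edgeGram suits left-to-right prefix completion.
                    "tokenization": "edgeGram",
                    "minGrams": min_grams,
                    "maxGrams": max_grams,
                }
            },
        }
    }


def autocomplete_pipeline(field, prefix, index="default", limit=5):
    """$search pipeline that completes a partially typed query."""
    return [
        {
            "$search": {
                "index": index,
                "autocomplete": {"query": prefix, "path": field},
            }
        },
        {"$limit": limit},
        {"$project": {"_id": 0, field: 1}},
    ]


if __name__ == "__main__":
    print(autocomplete_index_definition("title"))
    print(autocomplete_pipeline("title", "the godf"))
```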
