A Decisioning Framework for MongoDB $regex and $text vs Atlas Search

Marcus Eagan • Published Mar 30, 2022 • Updated May 16, 2022
Deciding which database feature to implement in order to satisfy a workload can be a daunting thought exercise with important implications. In this short post, we hope to give a head start to users deciding whether to keep an existing application on $regex and $text or refactor it to use Atlas Search (full-text search embedded directly in MongoDB Atlas) and its $search aggregation stage.
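
As a rough orientation, the three options named above take the following query shapes. This is a minimal sketch that only builds the query documents as Python dictionaries. The field name `title`, the index name `default`, and the helper function names are illustrative assumptions, not part of the original post.

```python
import re


def regex_filter(field: str, term: str) -> dict:
    """$regex: case-insensitive pattern match, usable in find() or $match.

    The user-supplied term is escaped so it is matched literally.
    """
    return {field: {"$regex": re.escape(term), "$options": "i"}}


def text_filter(term: str) -> dict:
    """$text: keyword match against the collection's text index."""
    return {"$text": {"$search": term}}


def search_stage(field: str, term: str, index: str = "default") -> dict:
    """$search: Atlas Search stage, placed first in an aggregation pipeline."""
    return {"$search": {"index": index, "text": {"query": term, "path": field}}}


if __name__ == "__main__":
    print(regex_filter("title", "coffee"))
    print(text_filter("coffee"))
    print([search_stage("title", "coffee")])
```

The first two documents are query filters, while the third is a pipeline stage, which is why moving to Atlas Search usually means rewriting a `find()` call as an aggregation.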

How to use this blog post

Refer to the chart below to understand our recommended best practice for each app requirement, along with the reasoning. This (non-exhaustive) post will be updated to take into account new capabilities of both the operators and Atlas Search, which is powered by Apache Lucene.
Note: $text and $regex have had no major updates since 2015, and all future enhancements in relevance-based search will be delivered via Atlas Search.
To learn more about Atlas Search, check out the documentation.
| App Requirements | $regex | $text | $search | Reasoning |
|---|---|---|---|---|
| The datastore must respect write concerns | ✅ | 🚫 | 🚫 | If you have a datastore that must respect write concerns for use cases like transactions with heavy reads after writes, $regex is a better choice. For search use cases, reads after writes should be rare. |
| Language awareness (Spanish, Chinese, English, etc.) | 🚫 | 🚫 | ✅ | Atlas Search natively supports over 40 languages so that you can better tokenize languages, remove stopwords, and interpret diacritics to support improved search relevance. |
| Case-insensitive text search | 🚫 | 🚫 | ✅ | Case-insensitive text search using $regex is one of the biggest sources of problems among our customer base, and $search offers far more capabilities than $text. |
| Highlighting result text | 🚫 | 🚫 | ✅ | The ability to highlight text fragments in result documents helps end users contextualize why some documents are returned compared to others. It's essential for user experiences powered by natural language queries. While developers could implement a crude version of highlighting with the other options, the $search aggregation stage provides an easy-to-consume API and a core engine that handles topics like tokenization and offsets. |
| Geospatial-aware search queries | ✅ | 🚫 | ✅ | Both $regex and $search have geospatial capabilities. The difference lies in how each treats geospatial parameters. For instance, Lucene draws a straight line from one query coordinate to another, whereas MongoDB lines are spherical. Spherical queries are best for flights, whereas flat map queries might be better for short distances. |
| On-premises or local deployment | ✅ | ✅ | 🚫 | Atlas Search is not available on-premises or for local deployment. The single deployment target enables our team to move fast and innovate at a more rapid pace than if we targeted many deployment models. For that reason, $regex and $text are the only options for people who do not have access to Atlas. |
| Autocomplete of characters (nGrams) | 🚫 | 🚫 | ✅ | End users typing in a search box have grown accustomed to an experience where their search queries are completed for them. Atlas Search offers edgeGrams for left-to-right autocomplete, nGrams for autocomplete with languages that do not have whitespace, and rightEdgeGram for languages that are written and read right-to-left. |
| Autocomplete of words (wordGrams) | 🚫 | 🚫 | ✅ | If you have a field with more than two words and want to offer word-based autocomplete as a feature of your application, then a shingle token filter with custom analyzers could be best for you. Custom analyzers offer developers a flexible way to index and modify how their data is stored. |
| Fuzzy matching on text input | 🚫 | 🚫 | ✅ | If you would like to filter on user-generated input, Atlas Search's fuzzy option offers flexibility. Issues like misspelled words are handled best by $search. |
| Filtering based on more than 10 strings | 🚫 | 🚫 | ✅ | It's tricky to filter on more than 10 strings in MongoDB due to the limitations of compound text indexes. The compound filter is the right way to go here. |
| Relevance score sorted search | 🚫 | 🚫 | ✅ | Atlas Search uses the state-of-the-art BM25 algorithm for determining the search relevance score of documents and allows for advanced configuration through boost expressions like multiply and gaussian decay, as well as analyzers, search operators, and synonyms. |
| Cluster needs to be optimized for write performance | 🚫 | 🚫 | ✅ | When you add a database index in MongoDB, you should consider tradeoffs to write performance in cases where database write performance is important. Search indexes don't degrade cluster write performance. |
| Searching through large data sets | 🚫 | 🚫 | ✅ | If you have lots of documents, $regex and $text queries get linearly slower. In Atlas Search, the inverted index enables fast document retrieval at very large scales. |
| Partial indexes for simple text matching | ✅ | 🚫 | 🚫 | Atlas Search does not yet support partial indexing. Today, $regex takes the cake. |
| Single compound index on arrays | 🚫 | 🚫 | ✅ | Atlas Search is designed in part for this use case, where term indexes are intersected in a single search index, to eliminate the need for compound indexes for filtering on arrays. |
| Synonyms search | 🚫 | 🚫 | ✅ | The only option for robust synonyms search is Atlas Search, where synonyms are defined in a collection, and that collection is referenced in your search index. |
| Fast faceting for counts | 🚫 | 🚫 | ✅ | If you are looking for faceted navigation, or fast counts of documents based on text criteria, let Atlas Search do the bucketing. In our internal testing, it's 100x faster and also supports number and date buckets. |
| Custom analyzers (stopwords, email/URL token, etc.) | 🚫 | 🚫 | ✅ | Using Atlas Search, you can define a custom analyzer to suit your specific indexing needs. |
| Partial match | 🚫 | 🚫 | ✅ | Atlas Search has a number of partial match options, ranging from the wildcard operator to autocomplete, which can be useful for some partial match use cases. |
| Phrase queries | 🚫 | 🚫 | ✅ | Phrase queries are supported natively in Atlas Search via the phrase operator. |
Note: In some rows, an option that may be able to satisfy the app requirement does not get a green check mark. In those cases, it's because one of the other options (typically $search) is far superior for that use case.
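
Several of the $search rows in the chart (fuzzy matching, filtering on a list of strings, highlighting, and relevance-sorted results) can be combined in a single pipeline. The following is a minimal sketch under stated assumptions: the field names `title`, `plot`, and `genres`, the index name `default`, and the function `build_search_pipeline` are hypothetical and chosen only for illustration. It builds the pipeline as plain Python data and does not connect to a cluster.

```python
from typing import List, Optional


def build_search_pipeline(
    query: str,
    genres: Optional[List[str]] = None,
    index: str = "default",
    limit: int = 10,
) -> List[dict]:
    """Build an aggregation pipeline whose first stage is $search."""
    compound: dict = {
        # "must" clauses contribute to the relevance score.
        "must": [
            {
                "text": {
                    "query": query,
                    "path": "plot",  # assumed field name
                    # Tolerate one character edit, e.g. a misspelled word.
                    "fuzzy": {"maxEdits": 1},
                }
            }
        ]
    }
    if genres:
        # "filter" clauses restrict results without affecting the score.
        # The text operator accepts a list of strings as its query.
        compound["filter"] = [{"text": {"query": genres, "path": "genres"}}]

    return [
        {
            "$search": {
                "index": index,
                "compound": compound,
                # Ask for highlighted fragments from the searched field.
                "highlight": {"path": "plot"},
            }
        },
        # Results arrive sorted by relevance score, so $limit keeps the top hits.
        {"$limit": limit},
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "searchScore"},
                "highlights": {"$meta": "searchHighlights"},
            }
        },
    ]


if __name__ == "__main__":
    for stage in build_search_pipeline("space adventur", ["Sci-Fi", "Adventure"]):
        print(stage)
```

With a driver such as PyMongo, a pipeline like this would typically be passed to a collection's `aggregate()` method. The collection must have an Atlas Search index covering the queried fields for the stage to return results.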
If we’ve whetted your appetite to learn more about Atlas Search, we have some resources to get you started:
The Atlas Search documentation provides reference materials and tutorials, while the MongoDB Developer Hub provides sample apps and code. You can spin up Atlas Search at no cost on the Atlas Free Tier and follow along with the tutorials using our sample data sets, or load your own data for experimentation within your own sandbox.
