Exact Matches in Atlas Search: Beginners Guide
Search engines are powerful tools that users rely on when they're looking for information. Users often count on them to handle misspelled words through a feature called fuzzy matching, which identifies text, strings, and even queries that are very similar but not identical. That tolerance is very useful when you don't know exactly how something is spelled.
But a lot of the time, the search that is most useful is an exact match. I'm looking for a word, foobar, and I want foobar, not foobarr.
Luckily, Atlas Search has solutions for both fuzzy searches and exact matches. This tutorial focuses on the different ways users can achieve exact matches with Atlas Search, and there are quite a few of them.
Just like the NYC subway system, there are many ways to get to the same destination, and not all of them are good. So let's talk about the various methods of doing exact match searches, and the pros and cons.
Analyzers are policies that allow users to define filters for the text matches they are looking for. For example, if you wanted to find an exact match for a whole string of text, the best analyzer to use would be the Keyword Analyzer, because it accepts a string or array of strings as a parameter and indexes each one as a single term.
If you wanted to return exact matches that contain a specific word, the Standard Analyzer would be your go-to, as it divides text based on word boundaries. It's crucial to first identify and understand the appropriate analyzer for your use case. This is where MongoDB makes our lives easier, because the Atlas Search documentation lists all the built-in analyzers and their purposes in one place.
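To make the analyzer choice concrete, here is a minimal sketch in Python using pymongo-style dictionaries. The `title` field, the `default` index name, the `movies` collection, and the helper function names are assumptions for illustration, not something defined in this article.

```python
# Sketch only: the field, index, and collection names below are hypothetical.

def keyword_index_definition(field: str = "title") -> dict:
    """Build an Atlas Search index definition that indexes `field`
    as a single term by using the built-in keyword analyzer."""
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                field: {"type": "string", "analyzer": "lucene.keyword"},
            },
        }
    }


def exact_match_pipeline(query: str, field: str = "title",
                         index: str = "default") -> list:
    """Build an aggregation pipeline that runs a text query against `field`.
    With the keyword analyzer, the whole query string is treated as one term."""
    return [{"$search": {"index": index,
                         "text": {"query": query, "path": field}}}]


pipeline = exact_match_pipeline("The Man in the Moon")
# With a live connection (assumed): results = list(db.movies.aggregate(pipeline))
```

Swapping `lucene.keyword` for `lucene.standard` in the same definition would index each word as its own term instead, which suits the "contains a specific word" case.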
Cons: Case-insensitive search isn't super straightforward. It's not impossible, of course, but it requires a few extra steps: you have to define a custom analyzer and run a diacritic-insensitive query.
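One possible shape for such a custom analyzer, sketched as a Python dictionary: a keyword tokenizer keeps the whole string as one term, while the `lowercase` and `icuFolding` token filters normalize case and diacritics. The analyzer name `keywordLowerFolded` and the `title` field are hypothetical.

```python
# Sketch only: the analyzer name and the field name are assumptions.

def case_insensitive_index_definition(
    field: str = "title",
    analyzer_name: str = "keywordLowerFolded",
) -> dict:
    """Build an index definition with a custom analyzer that keeps the
    whole string as one term but ignores case and diacritics."""
    return {
        "analyzers": [
            {
                "name": analyzer_name,
                # The keyword tokenizer emits the entire input as one token.
                "tokenizer": {"type": "keyword"},
                "tokenFilters": [
                    {"type": "lowercase"},   # "Café" -> "café"
                    {"type": "icuFolding"},  # "café" -> "cafe"
                ],
            }
        ],
        "mappings": {
            "dynamic": False,
            "fields": {
                field: {"type": "string", "analyzer": analyzer_name},
            },
        },
    }


index_definition = case_insensitive_index_definition()
```

Because the same analyzer is applied to the query by default, a text query for "cafe" against this field should then match a stored "Café", while still requiring the full string to match.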
The phrase operator, AKA a "multi-word exact match thing," can run exact match queries on multiple words (tokens) in a field. But why use the phrase operator instead of relying only on an analyzer? Because the phrase operator searches for an ordered sequence of terms, using the analyzer defined in the index configuration. Take a look at this example, where we want to search for the phrases “the man” and “the moon” in a movie titles collection:
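A minimal sketch of such a query, written as a pymongo-style pipeline. The `title` path follows the movie titles example; the `default` index name, the `movies` collection, and the helper name are assumptions.

```python
# Sketch only: the index and collection names are hypothetical.

def phrase_pipeline(phrases: list, path: str = "title",
                    index: str = "default") -> list:
    """Build a pipeline that matches documents whose `path` contains
    any of the given phrases as an ordered sequence of terms."""
    return [
        {
            "$search": {
                "index": index,
                # `query` may be a single string or a list of strings.
                "phrase": {"path": path, "query": phrases},
            }
        },
        {"$limit": 10},
        {"$project": {"_id": 0, "title": 1}},
    ]


pipeline = phrase_pipeline(["the man", "the moon"])
# With a live connection (assumed): results = list(db.movies.aggregate(pipeline))
```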
As you can see, the query returns all the results that contain the ordered sequences of terms “the man” and “the moon.”
Cons: The phrase operator isn’t compatible with synonym search. What this means is that even if you have synonyms enabled, there is a chance that your search results match whole phrases instead of an individual word. However, you can use the compound operator with two should clauses, one with a text query that uses synonyms and another that doesn't, to work around this issue. Here is a sample code snippet of how to achieve this:
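A hedged sketch of that workaround as a pymongo-style pipeline. The synonym mapping name `my_synonyms` is hypothetical and would have to match a synonym mapping defined in your own index; the `title` path and `default` index name are assumptions as well.

```python
# Sketch only: "my_synonyms" must correspond to a synonym mapping that
# you have defined in the search index; the name here is made up.

def synonym_fallback_pipeline(query: str, path: str = "title",
                              synonym_mapping: str = "my_synonyms",
                              index: str = "default") -> list:
    """Build a compound query with two `should` clauses: one text query
    that expands synonyms and one that searches the terms as typed."""
    return [
        {
            "$search": {
                "index": index,
                "compound": {
                    "should": [
                        # Clause 1: text query using the synonym mapping.
                        {"text": {"query": query, "path": path,
                                  "synonyms": synonym_mapping}},
                        # Clause 2: the same text query without synonyms.
                        {"text": {"query": query, "path": path}},
                    ],
                    # At least one of the two clauses has to match.
                    "minimumShouldMatch": 1,
                },
            }
        }
    ]


pipeline = synonym_fallback_pipeline("automobile")
```

Documents that satisfy both clauses score higher than documents that match only through a synonym, so literal matches tend to float to the top.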
There are few things in life that thrill me as much as autocomplete. Remember the sheer, wild joy of using it for the first time with Google search? It was just brilliant. It was one of the things that made me want to work in technology in the first place. You type, and the machines know what you're thinking!
And oh yeah, it keeps me from getting "no search results" repeatedly by guiding me to the correct terminology.
Pros: Autocomplete is awesome. Faster and more responsive search!
Cons: There are some limitations with autocomplete. You essentially have to weigh the tradeoff between faster results and more relevant results. There are potential workarounds, of course. You can push your exact match score higher by also indexing your autocomplete fields as strings, querying with compound operators, and so on, but yeah, those tradeoffs are real. I still think it's preferable to plain search, though.
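The workaround mentioned above could look roughly like this: index the same field as both a string and an autocomplete type, then combine the two in a compound query so exact matches get a score boost. The `title` field, the `default` index name, the gram sizes, and the boost value of 5 are all assumptions chosen for illustration.

```python
# Sketch only: field name, index name, gram sizes, and boost are hypothetical.

def autocomplete_index_definition(field: str = "title") -> dict:
    """Index `field` twice: as a plain string and as an autocomplete type."""
    return {
        "mappings": {
            "dynamic": False,
            "fields": {
                field: [
                    {"type": "string"},
                    {"type": "autocomplete", "tokenization": "edgeGram",
                     "minGrams": 2, "maxGrams": 15},
                ]
            },
        }
    }


def autocomplete_pipeline(typed: str, field: str = "title",
                          index: str = "default",
                          exact_boost: float = 5) -> list:
    """Combine a boosted text clause with an autocomplete clause so that
    full matches rank above partial, as-you-type matches."""
    return [
        {
            "$search": {
                "index": index,
                "compound": {
                    "should": [
                        {"text": {"query": typed, "path": field,
                                  "score": {"boost": {"value": exact_boost}}}},
                        {"autocomplete": {"query": typed, "path": field}},
                    ]
                },
            }
        },
        {"$limit": 10},
    ]


pipeline = autocomplete_pipeline("the man")
```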
As the name suggests, the text operator allows users to search text. Here is how the syntax for the text operator looks:
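As a rough sketch of that shape in Python (pymongo-style dictionaries), with a hypothetical helper and an assumed `default` index name. The optional `fuzzy` setting is included only to show where it goes; for exact matching you would leave it out.

```python
# Sketch only: the helper name and default index name are assumptions.
from typing import Optional


def text_search_stage(query: str, path: str, index: str = "default",
                      fuzzy_max_edits: Optional[int] = None) -> dict:
    """Build a $search stage that uses the text operator.
    Pass `fuzzy_max_edits` to allow typos; omit it for stricter matching."""
    text = {"query": query, "path": path}
    if fuzzy_max_edits is not None:
        text["fuzzy"] = {"maxEdits": fuzzy_max_edits}
    return {"$search": {"index": index, "text": text}}


strict_stage = text_search_stage("foobar", "title")
fuzzy_stage = text_search_stage("foobar", "title", fuzzy_max_edits=1)
```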
If you're searching for a single term and want to use full text search to do it, this is the operator for you. Simple, effective, no frills. Its simplicity means it's hard to mess up, and you can use it in complex use cases without worrying. You can also layer the text operator with other operators.
Pros: Straightforward, easy to use.
Cons: The terms in your query are considered individually, so if you want to return a result that contains more than a single word, you have to nest your operators. Not a huge deal, but as a downside, you'll probably have to do a little research on the operators that fit your use case.
Although highlighting doesn’t necessarily return exact matches like the other features, it's worth highlighting. (See what I did there?!)
Pros: Aesthetically, this feature enhances the search experience because users can easily see the terms they searched for in a given text.
Cons: It can be costly if passages are long because a lot more RAM will be needed to hold the data. In addition, this feature does not work with autocomplete.
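For reference, a minimal sketch of a highlighted query as a pymongo-style pipeline. The `plot` field, the `default` index name, and the helper name are assumptions; the highlight details are read back through the `searchHighlights` metadata.

```python
# Sketch only: the "plot" field and index name are hypothetical.

def highlight_pipeline(query: str, path: str = "plot",
                       index: str = "default") -> list:
    """Build a pipeline that searches `path` and returns the matching
    snippets alongside each document."""
    return [
        {
            "$search": {
                "index": index,
                "text": {"query": query, "path": path},
                # Ask Atlas Search to return highlight data for this field.
                "highlight": {"path": path},
            }
        },
        {"$limit": 5},
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "highlights": {"$meta": "searchHighlights"},
            }
        },
    ]


pipeline = highlight_pipeline("moon")
```

Keeping the highlighted field short, or limiting the number of returned documents as above, is one way to keep the memory cost in check.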