When it comes to finding specific words or phrases within text, you're probably going to want to use a natural language search option like full-text search (FTS). Sure, you could probably create a complicated and difficult-to-maintain set of regular expressions to search within text, but that is an option that most developers don't want. Not to mention it won't cover the full scope of what a natural language processor typically accomplishes.
In this tutorial, we're going to see how to use Atlas Search and work with the highlight data to visually show any matches on the terms in a user-facing application. Highlighting is a powerful Atlas Search feature that lets your users find the exact text that they want in its proper context.
To get an idea of what we plan to accomplish, take a look at the following animated image:
In the above scenario, we are searching through messages in a chat room. When we enter a term to search, we get the chat messages in return, with any potential hits highlighted. The potential hits can match exactly, or they could have a certain level of fuzziness which we'll explore. In this particular example, the number of highlighted responses is limited to five.
Before we jump directly into the creation of the back end for searching and the front end for displaying, we need to have an idea of our data model. Let's assume we are working with user chat data and we want to search for certain words and phrases. With this in mind, our documents could potentially look like this:
The above document sample isn't the most realistic, but it gives us something. Every time a new message is added to the chat room, it is appended to the messages array with the associated sender information. We could make this significantly more complex, but we don't need to for this example.
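For illustration, a document matching this description might look like the sketch below. Only the messages array and its message field are named in this tutorial; the _id value, the sender field name, and the sample text are assumptions.

```javascript
// Hypothetical chat document: one document per chat room,
// with every message appended to the messages array.
const sampleChat = {
    _id: "room-1", // assumed identifier, not taken from the tutorial
    messages: [
        { sender: "alice", message: "hello world" },
        { sender: "bob", message: "helloworld is one word here" }
    ]
};

// Appending a new message with its sender information
sampleChat.messages.push({ sender: "alice", message: "goodbye" });
```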
The next step is to create a default search index on our data collection. For this example, we'll be using a gamedev database and a chats collection.
While we could create an index specific to the fields we're planning to use, for simplicity, creating a dynamic default index will be more than enough. To do this, simply click on the green Create Search Index button. Let's accept the default settings and click Create Index. This will give us the default index with the following configuration:
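A dynamic default index typically amounts to a single dynamic mapping. The sketch below writes it as a JavaScript object (the Atlas UI presents it as JSON) and assumes no custom analyzers or field mappings were selected; the variable name is hypothetical.

```javascript
// Assumed shape of the default dynamic index definition:
// every supported field in the collection is indexed automatically.
const defaultIndexDefinition = {
    mappings: {
        dynamic: true
    }
};
```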
In this example, we'll need a back end to handle the interaction with the database for searching. To keep the stack consistent for this example, we're going to use Node.js with some common dependencies.
Create a new directory on your computer and from the command line, execute the following:
The above commands will create a new package.json file and download Express Framework, the MongoDB Node.js driver, and a cross-origin resource sharing middleware that will allow us to reach our back end from our front end operating on a different port.
Within the same project directory, create a main.js file and add the following boilerplate Express Framework with MongoDB code:
In the above code, we are importing each of our dependencies and initializing Express Framework as well as MongoDB. The ATLAS_URI in the above example should be stored as an environment variable on your computer. You can obtain it from the MongoDB Atlas dashboard and it will look something like this:
Take note of the section of the code where we are listening for connections:
In the above code, we are connecting to the specified MongoDB Atlas cluster and we are obtaining a handle to the chats collection within the gamedev database. Feel free to use your own collection and database naming, but note that this example will follow the previously defined data model when it comes to searching.
With the boilerplate in place, let's jump into the /search endpoint, which is currently empty. We're going to want to change it to the following:
In the above endpoint code, we are creating an aggregation pipeline.
Because we plan to use Atlas Search, the $search operator needs to be the first stage in the pipeline. In this first stage, we are searching around a provided term. Rather than searching the entire document, we are searching within the message object of the messages array. We are also using the fuzzy field with a maxEdits value. The 2 defines the maximum number of single-character edits required to match the specified search term. For example, if we enter hlo, we might get a hit on hello, whereas if we hadn't defined the fuzzy information, a hit might not be found. More information can be found in the documentation.
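To make the edit count concrete, the sketch below computes a plain Levenshtein distance between two strings. It only illustrates what a single-character edit means; the function name editDistance is hypothetical, and Atlas Search's own matching may count some edits differently (for example, swapped characters).

```javascript
// Illustrative Levenshtein distance: the minimum number of single-character
// insertions, deletions, or substitutions needed to turn a into b.
function editDistance(a, b) {
    // prev[j] = distance between the first i-1 characters of a
    // and the first j characters of b
    let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
    for (let i = 1; i <= a.length; i++) {
        const curr = [i];
        for (let j = 1; j <= b.length; j++) {
            const cost = a[i - 1] === b[j - 1] ? 0 : 1;
            curr[j] = Math.min(
                prev[j] + 1,       // delete a character from a
                curr[j - 1] + 1,   // insert a character into a
                prev[j - 1] + cost // substitute, or keep when equal
            );
        }
        prev = curr;
    }
    return prev[b.length];
}
```

Under this definition, editDistance("hlo", "hello") evaluates to 2, which is why a limit of two edits is enough for hlo to reach hello.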
The second stage of the pipeline will add the highlight data to the results before they are returned to the client. The highlight metadata isn't a part of the original document, hence the need to add it using the $meta operator prior to the response. You can read more about the $meta operator and the metadata it can surface in the documentation. You could also use the $meta operator in a $project stage instead.
Since this is a MongoDB aggregation pipeline, you can combine any number of aggregation operators, as long as $search is the first in the pipeline.
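As a rough sketch of the two stages described above, the function below builds a pipeline for a given term. The function name buildSearchPipeline, the messages.message path, the highlights field name, and the use of $addFields for the second stage are assumptions for illustration, so the actual endpoint code may differ.

```javascript
// Sketch only: names and paths here are assumptions, not the exact endpoint code.
function buildSearchPipeline(term) {
    return [
        {
            // $search must be the first stage of the pipeline
            $search: {
                text: {
                    query: term,
                    path: "messages.message", // assumed path into the messages array
                    fuzzy: { maxEdits: 2 }    // allow up to two single-character edits
                },
                // ask Atlas Search to produce highlight data for the same path
                highlight: { path: "messages.message" }
            }
        },
        {
            // copy the highlight metadata into each returned document
            $addFields: {
                highlights: { $meta: "searchHighlights" }
            }
        }
    ];
}
```

With a collection handle from the Node.js driver, this could be run as collection.aggregate(buildSearchPipeline(term)).toArray() inside the /search endpoint.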
If there's data in the collection, the application is ready to be used.
The next step is to display the search data on the screen. Most of what comes next is about massaging the data into a format that we want to use, which includes visually highlighting the data with HTML markup.
We're going to need to create another project directory, this time representing the front end instead of the back end. Within this new directory, create an index.html file with the following markup:
In the above code, we have a form that calls a search function when the button is clicked. As of right now, the search function only obtains the search term and references the area where search results should be output.
Let's further narrow down what the search function should do.
The above modifications to the function might be a lot to take in. Let's break down what's happening.
After clearing the output space, we are making a request to the back end:
The results of that request will have the documents found as well as the highlight data associated with the search.
The next step will be to loop through each of the results and then each of the messages for the results. This is where things can become a bit confusing. MongoDB will return data that looks like the following when it comes to highlighting:
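Here, we are constructing a string from the original highlight pieces as well as a string where the hit is wrapped in markup. The goal is to use the original string to find the matching text in the message and replace it with the marked-up string. Replacing only the hit itself is not reliable: for example, a plain replace on hello would result in helloworld being incorrectly highlighted. This is why we need to work with the adjacent data that MongoDB returns.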
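To make the replacement idea concrete, here is a minimal sketch. Atlas Search highlight results contain a texts array whose pieces each have a value and a type of either hit or text. The helper name applyHighlight, the sample data, and the choice of the mark tag are assumptions rather than the exact code of this tutorial.

```javascript
// Hypothetical helper: rebuilds the highlighted passage from its pieces and
// swaps it into the message, so only hits in their real context are marked.
function applyHighlight(message, highlight) {
    let original = ""; // the passage exactly as it appears in the message
    let marked = "";   // the same passage with hits wrapped in <mark>
    for (const piece of highlight.texts) {
        original += piece.value;
        marked += piece.type === "hit"
            ? "<mark>" + piece.value + "</mark>"
            : piece.value;
    }
    if (original === "") {
        return message; // nothing to replace
    }
    // A replacer function avoids special handling of "$" in the replacement
    return message.replace(original, () => marked);
}

// Assumed sample highlight for the search term "hello"
const sampleHighlight = {
    path: "messages.message",
    score: 1.0,
    texts: [
        { value: "hello", type: "hit" },
        { value: " there", type: "text" }
    ]
};

const output = applyHighlight("helloworld, hello there", sampleHighlight);
// output is "helloworld, <mark>hello</mark> there": helloworld stays untouched
```

The values are inserted as-is, so with untrusted chat content they should be HTML-escaped before the result is assigned to innerHTML.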
As previously mentioned, the front end is really just doing a lot of visual manipulation using the result and highlight data that the back end came up with.