Add US Postal Abbreviations to Your Atlas Search in 5 Minutes
There are cases when it helps to have synonyms set up to work with your Atlas Search index. For example, if the search in your application needs to work with addresses, it might help to set up a list of common synonyms for postal abbreviations, so one could type in “blvd” instead of “boulevard” and still find all places with “boulevard” in the address.
This tutorial will show you how to set up your Atlas Search index to recognize US postal abbreviations.
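To be successful with this tutorial, you will need a MongoDB Atlas cluster with the sample dataset loaded, Python 3 to run the scraping script, and the MongoDB Shell and mongoimport to load the synonyms into your cluster.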
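To learn about synonyms in Atlas Search, we suggest you start by checking out our documentation. Synonyms allow you to index and search your collection for words that have the same or nearly the same meaning. In the case of our tutorial, they let you search using different ways of writing out an address and still get the results you expect. To set up and use synonyms in Atlas Search, you will need to create a synonyms collection in the same database as the collection you are indexing, add a synonym mapping that references that collection to your search index definition, and reference the mapping with the synonyms option in your queries.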
We will walk you through these steps in the tutorial, but first, let’s start with creating the JSON documents that will form our synonyms collection.
All documents in the synonyms collection must have a mappingType field that specifies the type of synonyms: equivalent or explicit. Explicit synonyms have a one-way mapping. For example, if “boat” is explicitly mapped to “sail,” a search for “boat” returns all documents that contain either “boat” or “sail.” However, a search for “sail” would not return documents that contain only “boat.” In the case of postal abbreviations, all abbreviations can be used interchangeably, so we will use the “equivalent” type of synonym in the mappingType field.
Here is a sample document in the synonyms collection for all the possible abbreviations of “avenue”:
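The sample document itself isn't reproduced in this section. As a minimal Python sketch of its shape, the helper below builds an "equivalent" synonyms document from a list of spellings. The function name `make_equivalent_doc` is hypothetical, and the spellings are illustrative; take the actual list from the USPS table.

```python
import json


def make_equivalent_doc(variants):
    """Build a synonyms-collection document that maps every variant to the others."""
    # dict.fromkeys drops duplicates while keeping the first-seen order.
    return {"mappingType": "equivalent", "synonyms": list(dict.fromkeys(variants))}


# Illustrative spellings of "avenue"; check them against the current USPS table.
avenue_doc = make_equivalent_doc(
    ["AVENUE", "AV", "AVE", "AVEN", "AVENU", "AVN", "AVNUE", "AVE"]
)
print(json.dumps(avenue_doc, indent=2))
```

Because the mapping type is "equivalent," the order of the spellings doesn't matter: each one matches all the others.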
We wrote the web scraping code for you in Python, and you can run it with the following commands to create a document for each synonym group:
To see details of the Python code, read the rest of the section.
Let’s start with the Street Suffix Abbreviations page. We want to create objects that represent both the URL and the page itself:
Next, we want to get the information on the page. We’ll start by parsing the HTML, and then get the table by its id:
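The tutorial's scraping code isn't reproduced here. As a dependency-free sketch of the "find the table by its id and read its rows" step, Python's built-in `html.parser` can do the job. The class and function names are hypothetical, as is the table id `"suffixes"` in the example; inspect the USPS page for the real id. The sketch assumes the target table contains no nested tables.

```python
from html.parser import HTMLParser


class TableByIdParser(HTMLParser):
    """Collect the cell text of each row in the <table> whose id is table_id.

    Assumes the target table contains no nested tables. Missing </td> or </tr>
    tags are tolerated by closing the open cell/row when the next one starts.
    """

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _finish_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including &nbsp;) into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif not self.in_table:
            return
        elif tag == "tr":
            self._finish_row()
            self._row = []
        elif tag in ("td", "th"):
            self._finish_cell()
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag == "tr":
            self._finish_row()
        elif tag == "table":
            self._finish_row()
            self.in_table = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_table_rows(html, table_id):
    """Return the rows of the table with the given id as lists of strings."""
    parser = TableByIdParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


sample_html = """
<table id="other"><tr><td>ignored</td></tr></table>
<table id="suffixes">
  <tr><th>Primary</th><th>Common</th><th>Standard</th></tr>
  <tr><td>AVENUE</td><td>AV</td><td>AVE</td></tr>
</table>
"""
print(extract_table_rows(sample_html, "suffixes"))
```

If you already use pandas or Beautiful Soup, they can do the same job with less code. This version only trades convenience for having no dependencies.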
One thing to take note of is that in the table provided on USPS’s website, one primary name is usually mapped to multiple commonly used names. This means we need to dynamically group together commonly used names by their corresponding primary name and compile that into a list:
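A minimal sketch of that grouping step, assuming the table has already been turned into rows of (primary name, commonly used name, postal standard abbreviation) with the primary name repeated on every row. The function name `group_by_primary` and the sample rows are hypothetical.

```python
def group_by_primary(rows):
    """Group every spelling under its primary street suffix name.

    rows: iterable of (primary_name, common_name, standard_abbreviation) tuples,
    one per commonly used name, with the primary name repeated on each row.
    Returns a dict mapping each primary name to a duplicate-free list that
    starts with the primary name itself.
    """
    groups = {}
    for primary, common, standard in rows:
        names = groups.setdefault(primary, [primary])
        for name in (common, standard):
            if name and name not in names:
                names.append(name)
    return groups


# Illustrative rows; the real ones come from the USPS table.
sample_rows = [
    ("AVENUE", "AV", "AVE"),
    ("AVENUE", "AVE", "AVE"),
    ("AVENUE", "AVEN", "AVE"),
    ("BOULEVARD", "BLVD", "BLVD"),
    ("BOULEVARD", "BOUL", "BLVD"),
]
print(group_by_primary(sample_rows))
```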
Once our names are all grouped together, we can loop through them and export them as individual JSON files.
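A sketch of the export loop, assuming the groups are a dict from primary name to a list of spellings. Each group becomes one "equivalent" synonyms document in its own file. The function name and the file-naming scheme (lowercased primary name, spaces replaced by underscores) are hypothetical choices.

```python
import json
import tempfile
from pathlib import Path


def export_synonym_files(groups, out_dir):
    """Write one equivalent-synonyms JSON document per group and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for primary, names in groups.items():
        doc = {"mappingType": "equivalent", "synonyms": names}
        path = out / f"{primary.lower().replace(' ', '_')}.json"
        path.write_text(json.dumps(doc))
        paths.append(path)
    return paths


# Usage: write into a temporary folder and list the file names.
with tempfile.TemporaryDirectory() as tmp:
    written = export_synonym_files({"AVENUE": ["AVENUE", "AV", "AVE"]}, tmp)
    print([p.name for p in written])
```

Writing one document per file keeps each synonym group easy to inspect and fix by hand before importing.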
Now, let’s do the same thing for the Secondary Unit Designators page:
Just as before, we’ll start with getting the page and transforming it to a dataframe:
If we look at the table more closely, we can see that one of the values is blank. While it makes sense that the USPS would include this in the table, it’s not something that we want in our synonyms list. To take care of that, we’ll simply remove all rows that have blank values:
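If you are working with plain lists of rows rather than a dataframe, dropping rows with blank values can be sketched like this. The function name and sample rows are illustrative and are not copied from the USPS page. A cell counts as blank if it is missing (`None`) or contains only whitespace.

```python
def drop_blank_rows(rows):
    """Keep only rows in which every cell is a string with non-whitespace text."""
    return [
        row for row in rows
        if all(isinstance(cell, str) and cell.strip() for cell in row)
    ]


sample_rows = [["APARTMENT", "APT"], ["SOMETHING", ""], ["", "X"], ["BASEMENT", None]]
print(drop_blank_rows(sample_rows))
```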
Next, we’ll take our new dataframe and turn it into a list:
You may have noticed that some of the values in the table have asterisks in them. Let’s quickly get rid of them so they won’t be included in our synonym mappings:
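Stripping the asterisks can be sketched the same way. The rows here are illustrative, and the sketch assumes the asterisks are footnote markers that carry no meaning for search.

```python
def strip_asterisks(rows):
    """Remove every '*' from each cell and trim any leftover whitespace."""
    return [[cell.replace("*", "").strip() for cell in row] for row in rows]


print(strip_asterisks([["APARTMENT", "APT"], ["BASEMENT*", "BSMT"]]))
```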
Now we can loop through them and export them as individual JSON files, just as we did before. The one thing to note is that we want to restrict the range we iterate over so that it includes only the relevant data:
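A sketch of that loop, assuming each row starts with a designator and its approved abbreviation. The `start` and `stop` slice bounds are hypothetical defaults: `start=1` skips a header row, and `stop` can cut off any trailing rows that aren't designator/abbreviation pairs. Check the actual table to choose them.

```python
import json
from pathlib import Path


def export_unit_designators(rows, out_dir, start=1, stop=None):
    """Export rows[start:stop] as one equivalent-synonyms JSON file per row."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for row in rows[start:stop]:
        designator, abbreviation = row[0], row[1]
        # dict.fromkeys avoids a duplicate entry if both spellings are identical.
        doc = {
            "mappingType": "equivalent",
            "synonyms": list(dict.fromkeys([designator, abbreviation])),
        }
        path = out / f"{designator.lower().replace(' ', '_')}.json"
        path.write_text(json.dumps(doc))
        paths.append(path)
    return paths
```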
Now that we've created the JSON documents for the abbreviations, let's load them all into a collection in the sample_restaurants database. If you haven't already created a MongoDB Atlas cluster, now is a good time to do that and load the sample data into it.
To connect to your Atlas cluster, you will need a connection string. Choose the “Connect with the MongoDB Shell” option and follow the instructions. Note that you will need to connect as a database user that has permission to modify the database, since we will be creating a collection in the sample_restaurants database. The command you need to enter in the terminal will look something like:
When prompted for the password, enter the database user’s password.
We already created our synonym JSON documents in the right format, but let's make sure that any documents we add to this collection later will also have the correct format. To do that, we will create the synonyms collection with a validator that uses JSON Schema. The commands below will create a collection named “postal_synonyms” in the sample_restaurants database and ensure that only documents with the correct format are inserted into it.
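The validator commands aren't reproduced in this section. Below is a sketch of a `$jsonSchema` validator for this document format, written as a Python dict. A matching local check lets you test the JSON files before importing them. The names are hypothetical, and the `create_collection` call is shown only as a comment because it needs a live cluster.

```python
# Validator for the postal_synonyms collection, expressed as a Python dict.
# With PyMongo (not run here) it could be applied with:
#   db.create_collection("postal_synonyms", validator=SYNONYM_VALIDATOR)
SYNONYM_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["mappingType", "synonyms"],
        "properties": {
            "mappingType": {"enum": ["equivalent", "explicit"]},
            "synonyms": {"bsonType": "array", "items": {"bsonType": "string"}},
            "input": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
        # Explicit mappings additionally need an "input" array.
        "anyOf": [
            {"properties": {"mappingType": {"enum": ["equivalent"]}}},
            {"properties": {"mappingType": {"enum": ["explicit"]}}, "required": ["input"]},
        ],
    }
}


def _is_string_list(value):
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def is_valid_synonym_doc(doc):
    """Apply the same rules as SYNONYM_VALIDATOR locally, e.g. before importing files."""
    if not isinstance(doc, dict):
        return False
    mapping_type = doc.get("mappingType")
    if mapping_type not in ("equivalent", "explicit"):
        return False
    if not _is_string_list(doc.get("synonyms")):
        return False
    if "input" in doc and not _is_string_list(doc["input"]):
        return False
    return mapping_type == "equivalent" or "input" in doc
```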
We will use mongoimport to import all the JSON files we created.
In the terminal, navigate to the folder where all the JSON files for postal abbreviation synonyms were created.
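This tutorial uses mongoimport for the import. If you would rather stay in Python, one alternative is to read the files yourself and pass them to PyMongo's `insert_many`. The sketch below covers only the reading step; the PyMongo call appears as a comment because it needs a live cluster, and the connection string there is a placeholder.

```python
import json
from pathlib import Path


def load_synonym_docs(folder):
    """Read every *.json file in folder, sorted by file name, into a list of dicts."""
    return [json.loads(p.read_text()) for p in sorted(Path(folder).glob("*.json"))]


# With PyMongo (not run here), the import step would then be roughly:
#   from pymongo import MongoClient
#   client = MongoClient("<your connection string>")
#   client.sample_restaurants.postal_synonyms.insert_many(load_synonym_docs("."))
```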
Take a look at the synonyms collection you just created in Atlas. You should see around 229 documents there.
Now that we created the synonyms collection in our sample_restaurants database, let’s put it to use.
Let’s start by creating a search index. Navigate to the Search tab in your Atlas cluster and click the “CREATE INDEX” button.
Since the Visual Index Builder doesn’t support synonym mappings yet, we will choose JSON Editor and click Next.
In the JSON Editor, pick the restaurants collection in the sample_restaurants database and enter the following index definition. Here, the source collection name refers to the name of the collection that holds all the postal abbreviation synonyms, which we named “postal_synonyms.”
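The index definition isn't reproduced in this section. Here is a sketch of one that matches the description, written as a Python dict so it can be printed as JSON and pasted into the editor. It assumes dynamic mappings and the `lucene.standard` analyzer; both are our choices for this sketch. Keep in mind that Atlas Search applies a synonym mapping only to fields indexed with the same analyzer as the mapping.

```python
import json

# Sketch of an index definition for the restaurants collection.
# Assumption: dynamic mappings, which index string fields with lucene.standard
# by default, so the synonym mapping uses that same analyzer.
INDEX_DEFINITION = {
    "mappings": {"dynamic": True},
    "synonyms": [
        {
            "name": "synonym_mapping",
            "analyzer": "lucene.standard",
            "source": {"collection": "postal_synonyms"},
        }
    ],
}

# Print JSON that can be pasted into the Atlas JSON Editor.
print(json.dumps(INDEX_DEFINITION, indent=2))
```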
We are indexing the restaurants collection and creating a synonym mapping with the name “synonym_mapping” that references the synonyms collection “postal_synonyms.”
Click on Next and then on Create Search Index, and wait for the search index to build.
Once the index is active, we’re ready to test it out.
Open the restaurants collection in the Atlas UI, go to its Aggregation tab, and choose $search from the list of pipeline stages. The UI gives us a helpful placeholder for the $search stage’s arguments.
Let’s look for all restaurants that are located on a boulevard. We will search in the “address.street” field, so the arguments to the $search stage will look like this:
Let’s add a $count stage after the $search stage to see how many restaurants with an address that contains “boulevard” we found. As expected, we found a lot of restaurants with the word “boulevard” in the address. But what if we don’t want users to have to type “boulevard” in the search bar? What would happen if we put in “blvd,” for example?
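The same $search-plus-$count pipeline can be written as Python data, for example to run it through PyMongo instead of the UI. This sketch assumes the search index kept the name `"default"`. The `"total"` output field of $count is an arbitrary name, and the `aggregate` call is shown only as a comment.

```python
def street_count_pipeline(query, index="default"):
    """Pipeline: full-text search on address.street, then count the matches."""
    return [
        {
            "$search": {
                "index": index,  # assumes the index is named "default"
                "text": {"query": query, "path": "address.street"},
            }
        },
        {"$count": "total"},
    ]


boulevard_pipeline = street_count_pipeline("boulevard")
blvd_pipeline = street_count_pipeline("blvd")
# With PyMongo (not run here):
#   list(client.sample_restaurants.restaurants.aggregate(blvd_pipeline))
```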
Looks like it found us restaurants with addresses that have “blvd” in them. What about the addresses with “boulevard,” though? Those did not get picked up by the search.
And what if we weren’t sure how to spell “boulevard” and just searched for “boul”? The USPS Street Suffix Abbreviations table tells us it’s an acceptable abbreviation for boulevard, but our $search finds nothing. This is where our synonyms come in! We need to add a synonyms option to the text operator in the $search stage and set it to the synonym mapping’s name:
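Here is that change expressed as Python data. The only difference from a plain text search is the `synonyms` key inside the text operator, which names the mapping from the index definition (`"synonym_mapping"`). As before, the index name `"default"` and the `"total"` count field are assumptions.

```python
def street_synonym_pipeline(query, synonyms="synonym_mapping", index="default"):
    """Pipeline: text search on address.street using a synonym mapping, then count."""
    return [
        {
            "$search": {
                "index": index,  # assumes the index is named "default"
                "text": {
                    "query": query,
                    "path": "address.street",
                    "synonyms": synonyms,  # name of the synonym mapping, not the collection
                },
            }
        },
        {"$count": "total"},
    ]


boul_pipeline = street_synonym_pipeline("boul")
```

Note that the option takes the mapping's name, not the name of the postal_synonyms collection. The index definition is what links the two.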
And there you have it! We found all the restaurants on boulevards, regardless of which way the address was abbreviated, all thanks to our synonyms.
Synonyms are just one of the many features Atlas Search offers to give you all the search functionality your application needs. All of these features are available right now on MongoDB Atlas. We just showed you how to add support for common postal abbreviations to your Atlas Search index. What can you do with Atlas Search next? Try it now on your free-forever cluster, and reach out if you have any questions!