Influence Search Result Ranking with Function Scores in Atlas Search
When it comes to natural language searching, it's useful to know how the order of the results for a query was determined. Exact matches might be obvious, but what about situations where not all the results were exact matches due to a fuzzy parameter, the $near operator, or something else?
This is where the document score becomes relevant.
Every document returned by a $search query in MongoDB Atlas Search is assigned a score based on relevance, and the documents included in a result set are returned in order from highest score to lowest.
You can choose to rely on the scoring that Atlas Search determines based on the query operators, or you can customize its behavior using function scoring and tune it to your needs. In this tutorial, we're going to see how the function option in Atlas Search can be used to rank results in an example.
Let's say that you have a review system like Yelp where the user needs to provide some search criteria such as the type of food they want to eat. By default, you're probably going to get results based on relevance to your search term as well as the location that you defined. In the examples below, I’m using the sample data available in MongoDB Atlas.
The $search query (expressed as an aggregation pipeline) to make this search happen in MongoDB might look like the following:
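A minimal sketch of such a pipeline, written as a Python list of stage documents (the shape PyMongo's aggregate method accepts). The maxEdits value and the name field in the projection are assumptions for illustration.

```python
# Stage 1: full-text search for "korean" in the "cuisine" path, tolerating typos.
search_stage = {
    "$search": {
        "text": {
            "query": "korean",
            "path": "cuisine",
            "fuzzy": {"maxEdits": 2},  # assumed value: allow up to two character edits
        }
    }
}

# Stage 2: trim each document and expose the score computed by Atlas Search.
project_stage = {
    "$project": {
        "_id": 0,
        "name": 1,  # assumed field name
        "cuisine": 1,
        "rating": 1,
        "score": {"$meta": "searchScore"},
    }
}

pipeline = [search_stage, project_stage]
```

With PyMongo, a list like this could be passed to collection.aggregate(pipeline) on a collection that has an Atlas Search index.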
The above query is a two-stage aggregation pipeline in MongoDB. The first stage is searching for "korean" in the "cuisine" document path. A fuzzy factor is applied to the search so spelling mistakes are allowed. The document results from the first stage might be quite large, so in the second stage, we're specifying which fields to return for every document. This includes a search score that is not part of the original document, but part of the search results.
As a result, you might end up with the following results:
The default ordering of the documents returned is based on the score value in descending order. The higher the score, the closer your match.
It's very unlikely that you're going to want to eat at the restaurants that have a rating below your threshold, even if they match your search term and are within the search location. With the function option, we can assign a point system to the rating and perform some arithmetic to give better rated restaurants a boost in your results.
Let's modify the search query to look like the following:
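A sketch of what the modified pipeline could look like, as a Python list of stage documents. It assumes the same text search on the cuisine path with a score option added; the maxEdits value and the name field are assumptions.

```python
pipeline = [
    {
        "$search": {
            "text": {
                "query": "korean",
                "path": "cuisine",
                "fuzzy": {"maxEdits": 2},  # assumed value
                # Replace the default score with relevance multiplied by rating.
                "score": {
                    "function": {
                        "multiply": [
                            {"score": "relevance"},
                            # Use 1 as the multiplier when "rating" is missing.
                            {"path": {"value": "rating", "undefined": 1}},
                        ]
                    }
                },
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,  # assumed field name
            "cuisine": 1,
            "rating": 1,
            "score": {"$meta": "searchScore"},
        }
    },
]
```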
In the above two-stage aggregation pipeline, the part to pay attention to is the following:
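Isolated from the rest of the pipeline, the score option in question could be written as the following Python dict. This is a sketch based on the Atlas Search function score syntax; the variable name rating_boost is only for illustration.

```python
# Score option for a search operator: relevance score times the "rating" field.
rating_boost = {
    "function": {
        "multiply": [
            {"score": "relevance"},  # the score the operator would assign by default
            {"path": {"value": "rating", "undefined": 1}},  # fall back to 1 if no rating
        ]
    }
}
```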
What we're saying in this part of the $search query is that we want to take the relevance score that we had already seen in the previous example and multiply it by whatever value is in the rating field of the document. This means that the score will potentially be higher if the rating of the restaurant is higher. If the restaurant does not have a rating, then we use a default multiplier value of 1.
If we run this query on the same data as before, we might now get results that look like this:
So now, while "Korean BBQ Restaurant" might be farther away in terms of location, it appears higher in our result set because the rating of the restaurant is higher.
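To make the arithmetic concrete, here is a small client-side sketch that mimics the multiplication and re-sorts a few documents. The restaurant names, relevance scores, and ratings are made-up values for illustration only, and the helper boosted_score is hypothetical; Atlas Search performs this calculation on the server.

```python
from typing import Optional


def boosted_score(relevance: float, rating: Optional[float]) -> float:
    """Multiply relevance by rating, using 1 when the rating is missing."""
    multiplier = 1.0 if rating is None else rating
    return relevance * multiplier


# Hypothetical documents: made-up relevance scores and ratings.
candidates = [
    {"name": "Restaurant A", "relevance": 2.0, "rating": 3.0},
    {"name": "Restaurant B", "relevance": 1.5, "rating": 5.0},
    {"name": "Restaurant C", "relevance": 1.8, "rating": None},  # no rating field
]

# Highest boosted score first, as Atlas Search orders results by score.
ranked = sorted(
    candidates,
    key=lambda doc: boosted_score(doc["relevance"], doc["rating"]),
    reverse=True,
)

for doc in ranked:
    print(doc["name"], boosted_score(doc["relevance"], doc["rating"]))
```

Restaurant B starts with the lowest relevance of the rated restaurants but ends up first, because its rating multiplies its score the most. Restaurant C keeps its original score since the missing rating falls back to 1.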
Increasing the score based on rating is just one example. Another scenario could be to give search result priority to restaurants that are sponsors. A function multiplier could be used based on the sponsorship level.
Let's look at a different use case. Say you have an e-commerce website that is running a sale. To push products that are on sale higher in the search results than items that are not on sale, you might use a constant score in combination with a relevancy score.
An aggregation that supports the above example might look like the following:
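A minimal sketch of such an aggregation, as a Python list of stage documents. The search terms and the promotions path come from the description in this section; the title path for the product search and the projected fields are assumptions.

```python
pipeline = [
    {
        "$search": {
            "compound": {
                # Every result has to match the product search terms.
                "must": [
                    {
                        "text": {
                            "query": "bose headphones",
                            "path": "title",  # assumed field name
                        }
                    }
                ],
                # Optional clause: a match contributes a fixed score of 1.
                "should": [
                    {
                        "text": {
                            "query": "July4Sale",
                            "path": "promotions",
                            "score": {"constant": {"value": 1}},
                        }
                    }
                ],
            }
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": 1,  # assumed field name
            "promotions": 1,
            "score": {"$meta": "searchScore"},
        }
    },
]
```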
To get into the nitty gritty of the above two-stage pipeline, the first stage uses the compound operator for searching. We're saying that the search results must satisfy "bose headphones," and if a result also satisfies the should clause by containing "July4Sale" in the promotions path, then a constant of one is added to the score for that particular result item to boost its ranking. The should clause doesn't require its contents to be satisfied, so you could end up with headphone results that are not part of the "July4Sale." Those result items just won't have their score increased by any value, and therefore would show up lower down in the list. The second stage of the pipeline just defines which fields should exist in the response.
Being able to customize how search result sets are scored can help you deliver more relevant content to your users. While we looked at a couple of examples around the function option with the multiply operator, there are other ways you can use function scoring, like replacing the value of a missing field with a constant value or boosting the results of documents with search terms found in a specific path. You can find more information in the Atlas Search documentation.