Returning results for email address queries in desired order

Hi all! :wave: So, I’ve built a search tool for our support team using Atlas Search. It works well overall but they have one improvement request that I haven’t quite been able to figure out.

TL;DR: I’m trying to return a list of documents for a full email address query, in the order of:

  1. Exact match
  2. Domain match (ordered by closest matching username)
  3. Username match (ordered by closest matching domain name)

For this scenario, I’m focusing on text search, but I believe I’ll need to solve this for autocomplete at some point too.

Example data:

  • mick@domain.com
  • tom@domain.com
  • anna@domain.com
  • sofia@domain.com
  • mick@example.com
  • mick@anotherexample.com
  • …. + millions of other documents

Example use case
When searching for mick@domain.com the results should be in a similar order to above, where the first result is an exact match, and the remaining results are ordered based on matching domain, followed by matching username.
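
To pin down the target ordering, here is a minimal Python sketch of the three tiers applied in memory. It isn’t Atlas Search code: `split_email` and `rank` are hypothetical helpers, and using `difflib.SequenceMatcher` as the measure of "closest match" is an assumption, since closeness isn’t defined any more precisely above.

```python
from difflib import SequenceMatcher


def split_email(address):
    """Split an address into (username, domain) at the last '@'."""
    username, _, domain = address.lower().rpartition("@")
    return username, domain


def similarity(a, b):
    """Rough string closeness in [0, 1]; an assumed stand-in for 'closest match'."""
    return SequenceMatcher(None, a, b).ratio()


def rank(query, emails):
    """Order emails as: exact match, same domain, same username, everything else."""
    q_user, q_domain = split_email(query)

    def key(address):
        user, domain = split_email(address)
        if (user, domain) == (q_user, q_domain):
            return (0, 0.0)                             # tier 1: exact match
        if domain == q_domain:
            return (1, -similarity(user, q_user))       # tier 2: closest username first
        if user == q_user:
            return (2, -similarity(domain, q_domain))   # tier 3: closest domain first
        return (3, -similarity(address.lower(), query.lower()))

    return sorted(emails, key=key)


emails = [
    "mick@anotherexample.com",
    "sofia@domain.com",
    "mick@example.com",
    "tom@domain.com",
    "mick@domain.com",
    "anna@domain.com",
]
# Exact match first, then the other @domain.com addresses, then the other mick@ addresses.
print(rank("mick@domain.com", emails))
```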

Actual Behaviour
This varies depending on how the indexes and analyzers are set up. With the email field mapped as a String, and using the standard analyzer, the results would be returned like so:

  • mick@domain.com
  • mick@example.com
  • mick@anotherexample.com
  • tom@domain.com
  • anna@domain.com
  • sofia@domain.com

This makes sense, since we are matching left to right. Any results beginning with mick@ seem to score higher than results that match the domain, even if they are a closer match to the full search query.

For autocomplete I could use a right edgeGram to match from right to left. This doesn’t seem possible to use for a text search, though. Even if it were possible, it introduces a new problem: the results will be ordered by domain match (which is desired), but the top result won’t necessarily be mick@domain.com; it could be any one of the @domain.com matches.
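
To illustrate why right edge grams favour the domain, here is a small Python sketch that generates the suffix tokens by hand. It only shows which tokens two addresses share and does not model how Atlas Search scores them; `right_edge_grams` and `longest_shared_gram` are hypothetical helpers, and the `min_grams`/`max_grams` values are just illustrative.

```python
def right_edge_grams(text, min_grams=2, max_grams=15):
    """Return the set of suffixes of text with lengths min_grams..max_grams."""
    upper = min(max_grams, len(text))
    return {text[-n:] for n in range(min_grams, upper + 1)}


def longest_shared_gram(a, b):
    """Longest suffix token that both addresses would have in common."""
    shared = right_edge_grams(a) & right_edge_grams(b)
    return max(shared, key=len, default="")


query = "mick@domain.com"
for address in ["mick@domain.com", "tom@domain.com", "mick@example.com"]:
    # Every @domain.com address shares the whole "@domain.com" suffix with the
    # query, while other mick@ addresses only share a short tail.
    print(address, "->", repr(longest_shared_gram(query, address)))
```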

Lastly, I’ve tried the uaxUrlEmail tokenizer, but it doesn’t give the desired result: it only seems to return the exact match, whereas we’d like to include other similar matches up to our limit of 25 results per page.

Appreciate any tips or advice on getting the results we desire. Our current version in production is using a basic index using standard analyzers:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "email": [
        {
          "type": "autocomplete"
        },
        {
          "type": "string"
        }
      ]
    }
  }
}

A few tips that may help here: consider breaking the e-mail address down into separate multi-analyzed fields. Using a pattern filter you can separate out the text before and after the @-sign, and also separate the domain name from the TLD with patterns, each on a separate multi. Then use compound to craft a weighted set of clauses (using the score boosting option per clause) that brings the relevancy where you need it. A little fiddly, but it will give much better results than just a single autocomplete type.
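
As a rough sketch of the compound idea, the Python below builds an aggregation pipeline with one boosted `text` clause per multi. The multi names (`exact`, `domain`, `username`), the index name, and the boost values are all assumptions for illustration; the boosts in particular would need tuning against real data. The full address is passed to every clause on the assumption that each multi’s analyzer extracts its own part at query time.

```python
def build_email_search(query, index="default", limit=25):
    """Build a $search pipeline that weights exact > domain > username matches."""

    def clause(multi, boost):
        # One text clause against a (hypothetical) multi of the email field.
        return {
            "text": {
                "query": query,
                "path": {"value": "email", "multi": multi},
                "score": {"boost": {"value": boost}},
            }
        }

    return [
        {
            "$search": {
                "index": index,
                "compound": {
                    "should": [
                        clause("exact", 10),    # whole address as a single token
                        clause("domain", 5),    # part after the @-sign
                        clause("username", 2),  # part before the @-sign
                    ]
                },
            }
        },
        {"$limit": limit},
    ]


pipeline = build_email_search("mick@domain.com")
# With pymongo this could be run as: collection.aggregate(pipeline)
```

A document that matches the whole address also matches the domain and username clauses, so its scores add up and it should sort ahead of partial matches.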

That’s a great suggestion @Erik_Hatcher! I’ll try that out and see how it goes. It sounds like it’ll bring me closer to my desired results :+1: Thanks for taking the time to suggest a solution!

Hey @Erik_Hatcher, I was wondering if you might have an example of the kind of pattern filter you mentioned? Would there be something in the docs I could look at? I couldn’t quite find an example that helped.

@Mick_Mahady apologies for the delay, I missed your reply. My “pattern filter” reference was confusing, admittedly. What I had in mind are the regexCaptureGroup and regexSplit tokenizers: https://www.mongodb.com/docs/atlas/atlas-search/analyzers/tokenizers/#regexcapturegroup and https://www.mongodb.com/docs/atlas/atlas-search/analyzers/tokenizers/#regexsplit
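
For what it’s worth, here is a minimal sketch of how custom analyzers built on the regexCaptureGroup tokenizer might be wired into multis, written as a Python dict that can be dumped to JSON. The analyzer names, multi names, and patterns are assumptions, not something tested against a real cluster. The patterns are checked here with Python’s `re`; the tokenizer uses Java regex syntax, which should behave the same for expressions this simple.

```python
import json
import re

# Hypothetical patterns: one capture group per part of the address.
USERNAME_PATTERN = r"^([^@]+)@"     # text before the @-sign
DOMAIN_PATTERN = r"@(.+)$"          # full domain, including the TLD
DOMAIN_NAME_PATTERN = r"@([^.]+)\." # first domain label, without the TLD


def capture_analyzer(name, pattern):
    """Custom analyzer that emits capture group 1 as a lowercased token."""
    return {
        "name": name,
        "tokenizer": {"type": "regexCaptureGroup", "pattern": pattern, "group": 1},
        "tokenFilters": [{"type": "lowercase"}],
    }


index_definition = {
    "analyzers": [
        capture_analyzer("emailUsername", USERNAME_PATTERN),
        capture_analyzer("emailDomain", DOMAIN_PATTERN),
        capture_analyzer("emailDomainName", DOMAIN_NAME_PATTERN),
    ],
    "mappings": {
        "dynamic": False,
        "fields": {
            "email": [
                {"type": "autocomplete"},
                {
                    "type": "string",
                    "multi": {
                        "exact": {"type": "string", "analyzer": "lucene.keyword"},
                        "username": {"type": "string", "analyzer": "emailUsername"},
                        "domain": {"type": "string", "analyzer": "emailDomain"},
                        "domainName": {"type": "string", "analyzer": "emailDomainName"},
                    },
                },
            ]
        },
    },
}


def capture(pattern, text):
    """Local check of what a pattern would extract from an address."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(json.dumps(index_definition, indent=2))
```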

Thanks, @Erik_Hatcher! I’ll check out those docs. Appreciate the reply :raised_hands: