Questions about the text operator for Atlas Search - how to build an effective search for usernames?

Alex_Bjorlig · August 29, 2021, 7:04pm

Hi fellow users. My objective is to build an effective $search pipeline for a collection of usernames, but I have so many questions. I have watched videos and read the documentation, but if anyone could help me out here it would be awesome. I’m simply looking for an Atlas Search setup where users get the results they expect. Here is what I have so far:

Data

[{name: "John Doe"}, {name: "John Eriksen"}, {name: "Lara Croft"}]

Index

{ "mappings": { "dynamic": true } }

Pipeline

[
  {
    '$search': {
      'text': {
        'query': 'John Doe', 
        'path': 'name'
      }
    }
  }, {
    '$addFields': {
      'score': {
        '$meta': 'searchScore'
      }
    }
  }
]

I get the following results

John Doe → 0.66
John Eriksen → 0.21

Questions

Why is “John Eriksen” part of the result set at all when the user queried for John Doe? It would make sense if searching for “John” only, but when we have an exact match, why is John Eriksen even there?
Why is the score of “John Doe” not 1? It’s an exact match?
Now that “John Eriksen” is part of the result set, why is the score so relatively high?

Thanks for the help - really trying to build some great UX for the end users here

Marcus · September 2, 2021, 9:56pm

The text operator considers all terms in a query individually. If you want matches for only John Doe, you have two options, phrase or space delimited terms. It sounds like you are looking for a phrase query:

{
  '$search': {
    'phrase': {
      'query': 'John Doe',
      'path': 'name'
    }
  }
}

Let me know if that solves your issue.
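To make the difference concrete, here is a minimal sketch in plain JavaScript of term-based versus phrase matching. It is a simplified model that assumes the analyzer lowercases and splits on whitespace. It is not how Atlas Search is implemented internally, and the helper names are made up for illustration. As far as I understand, search scores are relevance values rather than percentages, which is why an exact match does not have to score 1.

```javascript
// Simplified model of term matching; NOT Atlas Search's actual implementation.
// Assumption: the analyzer lowercases the text and splits it on whitespace.
const tokenize = (s) => s.toLowerCase().split(/\s+/).filter(Boolean);

// text-like behaviour: a document matches if it shares at least one term with the query.
function matchesAnyTerm(query, value) {
  const terms = new Set(tokenize(value));
  return tokenize(query).some((t) => terms.has(t));
}

// phrase-like behaviour: all query terms must appear adjacent and in order.
function matchesPhrase(query, value) {
  const q = tokenize(query);
  const v = tokenize(value);
  if (q.length === 0) return false;
  for (let i = 0; i + q.length <= v.length; i++) {
    if (q.every((t, j) => v[i + j] === t)) return true;
  }
  return false;
}

const names = ["John Doe", "John Eriksen", "Lara Croft"];
const textHits = names.filter((n) => matchesAnyTerm("John Doe", n));
const phraseHits = names.filter((n) => matchesPhrase("John Doe", n));

console.log(textHits);   // "John Doe" and "John Eriksen" (both contain the term "john")
console.log(phraseHits); // "John Doe" only
```

In this model "John Eriksen" is returned by the term-based match because it shares the term "john", while the phrase match keeps only the document that contains both terms in order.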

Alex_Bjorlig · September 3, 2021, 8:06am

Hi @Marcus - thanks for taking your time to answer. Really trying to build what I would think is basic functionality, but really hard to find good examples on.

The phrase operator is somewhat better, but not optimal (in its raw form at least).

If we now have the following docs:

[{name: "Jonas Jespersen"}, {name: "John Doe"}, {name: "John Eriksen"}, {name: "Lara Croft"}]

and pipeline:

[
  {
    '$search': {
      'phrase': {
        'query': 'Jonas Jes', 
        'path': 'name'
      }
    }
  }
]

I get 0 results - not very good for a username search where there is pretty much a complete match

Marcus · September 3, 2021, 7:06pm

A query for “Jonas Jes” is actually a partial match, which is different from your first question. For partial matches you need something like autocomplete or regex, and regex is slower than autocomplete. Phrase will work if the query is an exact match, i.e. “Jonas Jespersen.” I hope that makes sense.

Alex_Bjorlig · September 3, 2021, 7:16pm

Thanks again for answering @Marcus - seems like you know MongoDB full text search pretty well. What would you recommend for an effective username search pipeline?

Marcus · September 3, 2021, 9:12pm

I’m a Product Manager for the product, but I don’t know anything. Permanent learner.

As for the pipeline, that depends on your use case. My simple recommendation would be to create an index with autocomplete on username, with minGram of 2 and maxGram of 7 for a small dataset. If the collection is >500,000 documents maybe make minGram larger or prepare to pay for beefy boxes.
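As a rough sketch of that recommendation, the index definition could look like the object below. The field name "name" and the choice to also map it as a plain string are assumptions. In the index definition the options are spelled minGrams and maxGrams. The small edgeGrams helper is hypothetical and only illustrates, in a simplified per-word way, which prefixes get indexed and why a wider gram range makes the index bigger.

```javascript
// Hypothetical Atlas Search index definition; the field name "name" is an assumption.
const indexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      name: [
        { type: "string" }, // for text / phrase queries
        { type: "autocomplete", tokenization: "edgeGram", minGrams: 2, maxGrams: 7 },
      ],
    },
  },
};

// Illustrative helper (not part of any MongoDB API): prefixes of one term
// between minGrams and maxGrams characters long.
function edgeGrams(term, minGrams, maxGrams) {
  const t = term.toLowerCase();
  const grams = [];
  for (let n = minGrams; n <= Math.min(maxGrams, t.length); n++) {
    grams.push(t.slice(0, n));
  }
  return grams;
}

const grams = edgeGrams("Jespersen", 2, 7);
console.log(grams); // je, jes, jesp, jespe, jesper, jespers
```

Each indexed term produces up to maxGrams - minGrams + 1 prefixes, so raising minGrams or lowering maxGrams is one way to keep the index smaller on large collections.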

The autocomplete query operator is straightforward to use, also. However, if you need to support diacritics in username like ö or å it won’t work well for that use case for a few more weeks. If this works for you let me know / accept the answer. If not, someone will likely follow up from the community or from the company.
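For reference, a minimal autocomplete pipeline might look like the sketch below. The helper name, the index name "default", the path "name", and the result limit are all assumptions for illustration, and the path must be mapped with the autocomplete type in the index.

```javascript
// Hypothetical helper that builds an autocomplete pipeline.
// Assumptions: an index named "default" maps "name" with type "autocomplete".
function buildAutocompletePipeline(query, limit = 10) {
  return [
    { $search: { index: "default", autocomplete: { query, path: "name" } } },
    { $limit: limit },
    { $project: { _id: 0, name: 1, score: { $meta: "searchScore" } } },
  ];
}

const autocompletePipeline = buildAutocompletePipeline("Jonas Jes");
console.log(JSON.stringify(autocompletePipeline, null, 2));
// In the shell this could be run as: db.users.aggregate(autocompletePipeline)
// ("users" is an assumed collection name).
```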

Alex_Bjorlig · September 3, 2021, 11:10pm

The collection is way below 500,000 docs, so I will look into autocomplete.

We have usernames from around the world, so looking very much forward to ö, æ support etc

Would be really awesome with a deep-dive article/blogpost on making user search; maybe a simple example where you start searching for username and then expand search to username + email

Marcus · September 4, 2021, 7:18am

What a great idea Alex! I will talk with the team and see if we can schedule something like that. As you know, there is a lot going on. We have not done a ton of blog posts for Atlas Search that are so use case specific. If we have the bandwidth to do it, we will do it.

We actually had another couple of customers ask for guidance around building a feature like this for their apps about 6 and 12 months back. Thanks. Please consider either accepting the initial answer or the follow-up answer, or clarifying the scope of the question. I hope that other users can look at this conversation and derive some actionable guidance.

Marcus · September 4, 2021, 7:18am

We have usernames from around the world, so looking very much forward to ö, æ support etc

I will let you know when we release this feature.

Anthony_Comito · September 9, 2021, 8:26pm

I’m also wishing there was more material on $search

Alex_Bjorlig · November 29, 2021, 10:30am

Any news here @Marcus ?

Marcus · December 3, 2021, 9:16pm

Hi again Alex. We just recently released the ability to specify an analyzer for autocomplete. For example, if most of your users had the two characters you mentioned, you might configure the autocomplete field to use lucene.danish.
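A sketch of what such a mapping could look like is below. The field name "name" and the gram sizes are assumptions carried over from earlier in the thread, and whether lucene.danish and diacritic folding suit a worldwide user base would need to be tested against real data.

```javascript
// Hypothetical index definition using a language analyzer for autocomplete.
// Assumptions: the field is called "name"; gram sizes follow the earlier suggestion.
const danishAutocompleteIndex = {
  mappings: {
    dynamic: false,
    fields: {
      name: {
        type: "autocomplete",
        analyzer: "lucene.danish",
        tokenization: "edgeGram",
        minGrams: 2,
        maxGrams: 7,
        // Assumption: folding diacritics is wanted so that "o" can also match "ö".
        foldDiacritics: true,
      },
    },
  },
};

console.log(JSON.stringify(danishAutocompleteIndex, null, 2));
```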

Alex_Bjorlig · December 4, 2021, 9:25am

Hi @Marcus - thanks! There are so many options for autocomplete, so I’m still hoping we can get a guide.

A user search example that:

Makes it possible to search for email or name
Supports users from around the world, including south america and scandinavia
When users type a query that matches a username exactly, the query understands both the first and last name of users (I’m just trying to say that the results should match well)

Hope someone at MongoDB could write such a blog, or maybe include it as part of this presentation
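One possible starting point for the requirements above is a compound query that combines an exact phrase clause with autocomplete clauses on name and email. This is an untested sketch, not official guidance: the helper name, the index name "default", the field names, and the boost value are assumptions, and email addresses may need a different analyzer or tokenization to match well.

```javascript
// Hypothetical user-search pipeline builder.
// Assumptions: "name" and "email" are mapped as string + autocomplete in an
// index called "default"; the boost value 5 is arbitrary.
function buildUserSearchPipeline(query, limit = 10) {
  return [
    {
      $search: {
        index: "default",
        compound: {
          should: [
            // Exact full-name match, boosted so it ranks above partial matches.
            { phrase: { query, path: "name", score: { boost: { value: 5 } } } },
            // Partial (prefix) matches on name and email.
            { autocomplete: { query, path: "name" } },
            { autocomplete: { query, path: "email" } },
          ],
          minimumShouldMatch: 1,
        },
      },
    },
    { $limit: limit },
    { $project: { _id: 0, name: 1, email: 1, score: { $meta: "searchScore" } } },
  ];
}

const userSearchPipeline = buildUserSearchPipeline("John Doe");
console.log(JSON.stringify(userSearchPipeline, null, 2));
```

Because every clause sits under should, a document needs to satisfy only one of them, and documents that satisfy the boosted phrase clause as well should rank higher.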

Alex_Bjorlig · March 31, 2022, 1:01pm

Hi @Marcus. Do you know if there is any news on this? We still have a bad user search experience, and would love to know if someone from MongoDB has published a better guide/blog.

tapiocaPENGUIN · March 31, 2022, 6:39pm

I haven’t done this from an application, but from the mongo shell there is a solution that gets what you need, and I’m sure it can be translated easily into application code. It might not scale well, though, so depending on indexing and collection size it might be slower than any SLA you have. But it might be worth some trial and error to see if you can find a solution.

But in the examples below it does a partial search, and if it finds an exact match it only returns the exact match. Also it handles ö and other characters.

Alex_Bjorlig · April 7, 2022, 7:01pm

Thanks for the tip. But this approach does not give you search highlights, scores, typo tolerance, ö / o substitution, etc.
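For comparison, a $search stage can return scores and highlights and tolerate small typos. The sketch below uses the text operator with its fuzzy option plus the highlight option. The helper name, the index name "default", the path "name", and maxEdits of 1 are assumptions, and this does not by itself address ö / o substitution.

```javascript
// Hypothetical pipeline builder combining fuzzy matching, scores and highlights.
// Assumptions: an index named "default" maps "name" as a string field.
function buildFuzzyHighlightPipeline(query) {
  return [
    {
      $search: {
        index: "default",
        // fuzzy.maxEdits: 1 tolerates a single-character typo per term.
        text: { query, path: "name", fuzzy: { maxEdits: 1 } },
        highlight: { path: "name" },
      },
    },
    {
      $project: {
        _id: 0,
        name: 1,
        score: { $meta: "searchScore" },
        highlights: { $meta: "searchHighlights" },
      },
    },
  ];
}

const fuzzyPipeline = buildFuzzyHighlightPipeline("Jon Doe");
console.log(JSON.stringify(fuzzyPipeline, null, 2));
```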