Inaccurate score for autocomplete search

Nicolas_Guilitte · June 27, 2023, 12:45pm

I have a collection containing a field “token” (among others).
I defined a new index on that single field, with data type “autocomplete”.
Then I wrote a very basic search query against the “token” field using that index. The query looks like this:

    "pipeline": [
        {
            "$search": {
                "index": "index-autocomplete",
                "autocomplete": {
                    "path": "token",
                    "query": "John"
                }
            }
        },
        {
            "$project": {
                "_id": 0,
                "token": 1,
                "Score": { "$meta": "searchScore"}
            }
        },
        {
            "$sort": { "Score": -1 }
        }
    ]

I receive the following results:

    {
        "token": "Johnson - Smith",
        "Score": 4.894947052001953
    },
    {
        "token": "Johnson",
        "Score": 4.348791122436523
    },

As you can see, the first result has the higher score. However, since the “token” field of the second result has fewer characters, I expected it to score higher than the first one, and that is not the case.

The MongoDB documentation notes that autocomplete scores are not always accurate, so I have already tried the following workarounds:

1. Add the “phrase” operator (with a score boost) in addition to the “autocomplete” operator, as described here or here.
2. Also define the field “token” as type “string” in the index (in addition to type “autocomplete”), then use the “text” operator in the query alongside the “autocomplete” operator.
3. Change the analyzer at the index level (for example, use lucene.keyword instead of lucene.standard).
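
For readers following along, the second workaround might look roughly like the sketch below. It assumes the index maps `token` as both `string` and `autocomplete`. The helper name `buildSearchPipeline`, the `dynamic: false` setting, and the boost value of 3 are illustrative assumptions, not taken from the original post.

```javascript
// Assumed definition for the "index-autocomplete" index: "token" is mapped
// twice, once as "string" (for the text operator) and once as "autocomplete".
const indexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      token: [
        { type: "string" },
        { type: "autocomplete" }
      ]
    }
  }
};

// Hypothetical helper: builds a pipeline that combines both operators.
function buildSearchPipeline(userInput) {
  return [
    {
      $search: {
        index: "index-autocomplete",
        compound: {
          should: [
            // Full-term matches via text; the boost of 3 is an arbitrary choice.
            {
              text: {
                path: "token",
                query: userInput,
                score: { boost: { value: 3 } }
              }
            },
            // Partial matches still come from autocomplete.
            { autocomplete: { path: "token", query: userInput } }
          ]
        }
      }
    },
    { $project: { _id: 0, token: 1, Score: { $meta: "searchScore" } } }
  ];
}

// In mongosh, the result would be passed to aggregate() on the collection,
// for example with buildSearchPipeline("Johnson").
```

With a standard analyzer, an input such as "John" is unlikely to match any whole indexed term, so the `text` clause would probably contribute nothing and only the autocomplete score would remain.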

If I use “Johnson” as the input string, only the second workaround works: I get a score of 10.5 for the token “Johnson”, which is an exact match, and a score of 9.6 for the token “Johnson - Smith”.

But when I use “John” as the input string, every match is a partial match, so none of the workarounds work, and “Johnson - Smith” always gets the higher score.

Is there a way to change that behavior?

Jason_Tran · July 4, 2023, 11:09pm

Hi @Nicolas_Guilitte,

I believe the scoring behaviour you’ve described is covered in the score section of the autocomplete documentation:

autocomplete offers less fidelity in score in exchange for faster query execution.

The workarounds mentioned apply to exact matches (i.e. using "Johnson" as the search term rather than "John").

Would altering the score option of the autocomplete operator suit your use case? One option is to set it to a constant value so that all matches receive the same score. There are also function score examples that may help, depending on the use case and the other fields in the documents being searched.
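
As a rough illustration of the constant-score idea, the sketch below gives every autocomplete match the same score and then breaks ties by the length of `token`, so shorter values come first. The tie-break using `$strLenCP`, the helper name `buildConstantScorePipeline`, and the `tokenLength` field are assumptions added for illustration; they are not something proposed in this thread.

```javascript
// Hypothetical helper: constant autocomplete score plus a length tie-breaker.
function buildConstantScorePipeline(userInput) {
  return [
    {
      $search: {
        index: "index-autocomplete",
        autocomplete: {
          path: "token",
          query: userInput,
          // Every match gets the same score, so the score alone no longer
          // decides the order of results.
          score: { constant: { value: 1 } }
        }
      }
    },
    {
      $project: {
        _id: 0,
        token: 1,
        Score: { $meta: "searchScore" },
        // Assumed tie-breaker: length of the token in code points.
        tokenLength: { $strLenCP: "$token" }
      }
    },
    // Highest score first, then shorter tokens before longer ones.
    { $sort: { Score: -1, tokenLength: 1 } }
  ];
}
```

Because the `$sort` stage runs after `$search`, a `$limit` stage may be worth adding when the number of matches is large.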

Regards,
Jason
