I have a collection containing a field “token” (among others).
I defined a new Atlas Search index on that single field, with the data type “autocomplete”.
Then I wrote a very basic search query on the field “token” using that index. The query looks like this:
"pipeline": [
  {
    "$search": {
      "index": "index-autocomplete",
      "autocomplete": {
        "path": "token",
        "query": "John"
      }
    }
  },
  {
    "$project": {
      "_id": 0,
      "token": 1,
      "Score": { "$meta": "searchScore" }
    }
  },
  {
    "$sort": { "Score": -1 }
  }
]
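For anyone reproducing this from Python, here is a minimal sketch of the same pipeline built as PyMongo-style dicts. The helper name `build_autocomplete_pipeline` is hypothetical; the returned list is what you would pass to PyMongo's `collection.aggregate(...)` against an Atlas cluster. The main point is that the `$sort` key must match the field name produced by `$project` exactly (field names are case-sensitive), otherwise the sort does not order by score. By default `$search` already returns results in descending score order, so the `$sort` stage is mainly a safeguard.

```python
def build_autocomplete_pipeline(query, index="index-autocomplete", path="token"):
    """Return an aggregation pipeline equivalent to the one in the question.

    Hypothetical helper: the list can be passed to PyMongo's
    collection.aggregate(pipeline) against an Atlas cluster.
    """
    return [
        {
            # Autocomplete search on a single field, using the named index.
            "$search": {
                "index": index,
                "autocomplete": {"path": path, "query": query},
            }
        },
        {
            # Expose the relevance score under one consistent field name.
            "$project": {"_id": 0, path: 1, "score": {"$meta": "searchScore"}},
        },
        # Sort on exactly the field name that $project produced.
        {"$sort": {"score": -1}},
    ]


pipeline = build_autocomplete_pipeline("John")
print(pipeline[-1])  # {'$sort': {'score': -1}}
```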
I receive the following results:
[
  {
    "token": "Johnson - Smith",
    "Score": 4.894947052001953
  },
  {
    "token": "Johnson",
    "Score": 4.348791122436523
  }
]
As you can see, the first result has the higher score. However, since the “token” value of the second result has fewer characters, I expected it to score higher than the first one, but that is not the case.
In the MongoDB documentation, I read that autocomplete scores are not always accurate, so I have already tried the following workarounds:
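To make the expected ordering concrete, here is a minimal client-side sketch that applies the rule "shorter token first, higher score as a tie-breaker" to the results above. It is a hypothetical workaround that sidesteps Atlas Search scoring rather than changing it, and the function name `rerank_shorter_first` is illustrative only.

```python
def rerank_shorter_first(results, key="token", score_field="Score"):
    """Hypothetical client-side re-rank: shorter tokens first, then higher score."""
    # Sort ascending by token length; negate the score so higher scores win ties.
    return sorted(results, key=lambda doc: (len(doc[key]), -doc[score_field]))


results = [
    {"token": "Johnson - Smith", "Score": 4.894947052001953},
    {"token": "Johnson", "Score": 4.348791122436523},
]
for doc in rerank_shorter_first(results):
    print(doc["token"])
# Johnson
# Johnson - Smith
```

Note that this only reorders whatever page of results the query returned; it cannot promote a short token that Atlas Search did not return in the first place.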
- Add the “phrase” operator (with a score boost) in addition to the “autocomplete” operator, as described here or here.
- Define the field “token” as type “string” in the index as well (in addition to type “autocomplete”), and then use the “text” operator in the query in addition to the “autocomplete” operator.
- Change the analyzer in the index definition (for example, use lucene.keyword instead of lucene.standard).
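The first two workarounds can be sketched as a `compound` query whose `should` clauses combine `autocomplete` with a boosted `text` or `phrase` clause, together with an index definition that maps `token` as both `autocomplete` and `string` (which the `text` operator needs). This is a minimal sketch, not the exact query used above: the helper name `autocomplete_with_boost`, the boost value 3, and the explicit analyzer settings are assumptions for illustration. `json.dumps(INDEX_DEFINITION)` yields the JSON form that the Atlas index editor expects.

```python
def autocomplete_with_boost(query, boost_operator="text", boost=3,
                            index="index-autocomplete", path="token"):
    """Build a $search stage combining autocomplete with a boosted
    "text" or "phrase" clause in a compound "should".

    Hypothetical helper; the default boost of 3 is an arbitrary example.
    """
    if boost_operator not in ("text", "phrase"):
        raise ValueError("boost_operator must be 'text' or 'phrase'")
    return {
        "$search": {
            "index": index,
            "compound": {
                "should": [
                    # Partial (prefix) matches via the autocomplete mapping.
                    {"autocomplete": {"path": path, "query": query}},
                    # Whole-word matches via the string mapping, boosted.
                    {
                        boost_operator: {
                            "path": path,
                            "query": query,
                            "score": {"boost": {"value": boost}},
                        }
                    },
                ]
            },
        }
    }


# Assumed index definition for workaround 2: "token" mapped as both types.
# Workaround 3 would change the "analyzer" values here.
INDEX_DEFINITION = {
    "mappings": {
        "dynamic": False,
        "fields": {
            "token": [
                {"type": "autocomplete", "analyzer": "lucene.standard"},
                {"type": "string", "analyzer": "lucene.standard"},
            ]
        },
    }
}
```

The returned stage would be followed by the same `$project` stage as in the original pipeline.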
If I use “Johnson” as the input string, only the second workaround works: the token “Johnson”, which is an exact match, gets a score of 10.5, and the token “Johnson - Smith” gets 9.6.
But when I use “John” as the input string, which is only a partial match for both tokens, none of the workarounds helps, and “Johnson - Smith” always gets the higher score.
Is there a way to change this behavior?