I have a fairly simple “fuzzy” “person name” search configured and I’m somewhat baffled by the results. The collection has separate first_name & last_name fields, and contains ~200K records.
I’m finding that if the variation is “within” the name (e.g. Mary => Mara), the scoring (sorting) makes sense. However, if the variation extends PAST the submitted name (e.g. Mary => MaryX), it’s scored VERY poorly.
My problem example: Searching for “Mary Smith”.
There’s a record with first_name: “MaryX”, last_name: “SmithX”. That record appears waaaayyy lower in the scoring than all kinds of names that don’t appear to be close at all, like:
“Gary Calvert”
“Marc Mercier”
“Dary Estevez”
“James Mabry”
“Saith Ruiz”
etc.
Out of 1,875 total results, “MaryX SmithX” appears in position 1,812 with a score of 1.78989.
The “Top 3” results, with a score of 4.95143, were:
“Smith Kraai”
“smith phengpaseuth”
“Smith Brandon”
hunh?!
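To sanity-check which of these names are even within one edit of my search terms, here is a rough sketch of a plain Levenshtein distance (insert, delete, substitute), compared case-insensitively. I’m assuming this is roughly how maxEdits counts edits; the actual engine may count differently (for example, it may treat a transposition as a single edit). The names editDistance and closestTermDistance are just my own helpers, not part of any MongoDB API.

```javascript
// Plain Levenshtein distance, case-insensitive.
// Assumption: this only approximates how maxEdits counts edits.
function editDistance(a, b) {
  a = a.toLowerCase();
  b = b.toLowerCase();
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete from a
        curr[j - 1] + 1,    // insert into a
        prev[j - 1] + cost  // substitute (or keep)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: smallest distance from any query term to any name token.
function closestTermDistance(query, name) {
  let best = Infinity;
  for (const term of query.split(/\s+/)) {
    for (const token of name.split(/\s+/)) {
      best = Math.min(best, editDistance(term, token));
    }
  }
  return best;
}

const names = [
  "MaryX SmithX", "Gary Calvert", "Marc Mercier",
  "Dary Estevez", "James Mabry", "Saith Ruiz", "Smith Kraai",
];
for (const name of names) {
  console.log(name, closestTermDistance("Mary Smith", name));
}
```

Under that assumption, every name in my list has at least one token exactly one edit away from “Mary” or “Smith” (including “MaryX SmithX”), and “Smith Kraai” has an exact match. So this explains why they all match, but not why “MaryX SmithX” ranks so far below the others.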
My query is as follows:
$search: {
  index: "idx_search_user",
  scoreDetails: true,
  text: {
    query: "Mary Smith",
    path: ["last_name", "first_name"],
    fuzzy: {
      maxEdits: 1,
      prefixLength: 0,
      maxExpansions: 10
    }
  }
}
I then tried implementing a compound search with “must”, and it doesn’t return “MaryX SmithX” at all?!
$search: {
  index: "idx_search_user",
  scoreDetails: true,
  compound: {
    must: [
      {
        text: {
          query: "Mary Smith",
          path: ["last_name", "first_name"],
          fuzzy: {
            maxEdits: 1,
            prefixLength: 0,
            maxExpansions: 10
          }
        }
      }
    ]
  }
}
Adding prefixLength: 1 made no difference.
My index looks like this:
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "first_name": [
        {
          "type": "string"
        },
        {
          "type": "token"
        }
      ],
      "last_name": [
        {
          "type": "string"
        },
        {
          "type": "token"
        }
      ]
    }
  }
}
It feels like I’m missing something. Does anyone have any ideas how to make this work?