Fuzzy Search Problems

Paul_Cormier · March 21, 2024, 6:24pm

I have a fairly simple “fuzzy” “person name” search configured and I’m somewhat baffled by the results. The collection has separate first_name & last_name fields, and contains ~200K records.

I’m finding if there’s a variation “within” the name (e.g. Mary => Mara) the scoring (sorting) makes sense. However, if the variation extends PAST the submitted name it’s scored VERY poorly.

My problem example: Searching for “Mary Smith”.

There’s a record with first_name: “MaryX”, last_name: “SmithX”. That record appears waaaayyy lower in the scoring than all kinds of names that don’t appear to be close at all, like:

“Gary Calvert”
“Marc Mercier”
“Dary Estevez”
“James Mabry”
“Saith Ruiz”
etc.
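As a sanity check on why these names match at all, here is a minimal sketch that computes the plain Levenshtein edit distance between each name token and the query terms. It rests on several assumptions that are not confirmed by anything above: that the default analyzer lowercases and splits the names on whitespace, that the text operator treats "mary" and "smith" as independent alternatives rather than requiring both, and that plain Levenshtein is close enough to what the engine uses for fuzzy matching. The helper names `editDistance` and `closestTokenDistance` are hypothetical and exist only for this illustration.

```javascript
// Plain Levenshtein distance: insertions, deletions and substitutions each cost 1.
// This is an approximation for illustration, not the search engine's own code.
function editDistance(a, b) {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Smallest distance between any token of the name and any query term.
// Assumes lowercasing and whitespace tokenization (hypothetical analyzer behavior).
function closestTokenDistance(name, queryTerms) {
  let best = Infinity;
  for (const token of name.toLowerCase().split(/\s+/)) {
    for (const term of queryTerms) {
      best = Math.min(best, editDistance(token, term));
    }
  }
  return best;
}

const queryTerms = ["mary", "smith"];
const names = [
  "MaryX SmithX",
  "Gary Calvert",
  "Marc Mercier",
  "Dary Estevez",
  "James Mabry",
  "Saith Ruiz",
  "Smith Kraai",
];

for (const name of names) {
  console.log(name, closestTokenDistance(name, queryTerms));
}
```

Under those assumptions, every name in the list has at least one token within one edit of "mary" or "smith", so each one is a legitimate match for maxEdits: 1. That would explain why they are returned, but not why they outrank "MaryX SmithX".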

Out of 1,875 total results, “MaryX SmithX” appears in position 1,812 with a score of 1.78989.

The “Top 3” results, each with a score of 4.95143, were:
“Smith Kraai”
“smith phengpaseuth”
“Smith Brandon”
hunh?!

My query is as follows:

$search: {
    index: "idx_search_user",
    scoreDetails: true,
    text: {
        query: "Mary Smith",
        path: ["last_name", "first_name"],
        fuzzy: {
            maxEdits: 1,
            prefixLength: 0,
            maxExpansions: 10
        }
    }
}

I then tried implementing a compound search with “must”, and it didn’t return “MaryX SmithX” at all?!

$search: {
    index: "idx_search_user",
    scoreDetails: true,
    compound: {
        must: [
            {
                text: {
                    query: "Mary Smith",
                    path: ["last_name", "first_name"],
                    fuzzy: {
                        maxEdits: 1,
                        prefixLength: 0,
                        maxExpansions: 10
                    }
                }
            }
        ]
    }
}

Adding prefixLength: 1 made no difference.

My index looks like this:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "first_name": [
        {
          "type": "string"
        },
        {
          "type": "token"
        }
      ],
      "last_name": [
        {
          "type": "string"
        },
        {
          "type": "token"
        }
      ]
    }
  }
}

It feels like I’m missing something. Does anyone have any ideas how to make this work?