Autocomplete using search doesn't work after 3 spaces

Arbaz_Siddiqui · April 17, 2023, 1:48pm

Here is a sample doc I am trying to search:

{
  "_id": {
    "$oid": "6436a6e365bb0bf723a17a21"
  },
  "name": "Indian Institute of Nursing ",
  "city": "",
  "memberCount": 0,
  "state": "Karnataka"
}

My current index mapping:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "foldDiacritics": true,
          "maxGrams": 30,
          "minGrams": 1,
          "tokenization": "edgeGram",
          "type": "autocomplete"
        },
        {
          "type": "string"
        }
      ]
    }
  }
}

My aggregation for search:

{
  index: "institutesEdge",
  returnStoredSource: true,
  compound: {
    must: [
      {
        autocomplete: {
          path: "name",
          query: "indian institute of",
          tokenOrder: "sequential",
        },
      },
    ],
  },
}

The above runs correctly and returns results, but if I change the query text from indian institute of to indian institute of nur, or to indian institute of (with a space after of), it returns no results.
This happens for much shorter search terms as well: no results are returned once a third space is entered, but it works perfectly before that.

I have tried nGram tokenization as well and see the same issue.

Could you suggest what I might be doing wrong?

Junderwood · April 19, 2023, 9:19am

Hi @Arbaz_Siddiqui, and welcome to the forum!

Since your query contains multiple word tokens, adding a standard ‘text’ search operator could help with relevancy.

Try updating your query like this:

{
  index: "institutesEdge",
  returnStoredSource: true,
  compound: {
    should: [
      {
        autocomplete: {
          path: "name",
          query: "indian institute of",
          tokenOrder: "sequential",
        },
      },
      {
        text: {
          path: "name",
          query: "indian institute of",
        },
      },
    ],
  },
}

This way, your query can handle cases where the user has typed incomplete words by leveraging autocomplete’s partial matching, while the text operator provides good relevancy for words that have already been typed in full.

For further tuning, you could add ‘fuzzy’ options to the ‘text’ operator to catch misspellings.
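As a rough sketch of that suggestion, the helper below builds the same compound query with a `fuzzy` option on the ‘text’ clause. The function name `buildFuzzySearchStage` is hypothetical, and `maxEdits: 1` is only an example value (Atlas Search accepts 1 or 2 single-character edits); tune it against your own data.

```javascript
// Hypothetical helper: builds the $search stage from the reply above and
// adds fuzzy matching to the 'text' clause to tolerate misspellings.
function buildFuzzySearchStage(userInput, maxEdits = 1) {
  return {
    $search: {
      index: "institutesEdge",
      returnStoredSource: true,
      compound: {
        should: [
          {
            // Partial matching for words the user is still typing.
            autocomplete: {
              path: "name",
              query: userInput,
              tokenOrder: "sequential",
            },
          },
          {
            // Full-word matching; fuzzy allows up to maxEdits
            // single-character edits per term (example value).
            text: {
              path: "name",
              query: userInput,
              fuzzy: { maxEdits: maxEdits },
            },
          },
        ],
      },
    },
  };
}

const stage = buildFuzzySearchStage("indain institute");
console.log(JSON.stringify(stage, null, 2));
```

The resulting object can be passed as a stage in `db.collection.aggregate([...])`. Note that fuzzy matching widens the result set further, so it does not address the "too many matches" concern raised in the next reply.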

Arbaz_Siddiqui · April 19, 2023, 6:51pm

Thanks for the reply @Junderwood.

The text operator introduces undesirable results. For example, with the above query, indian institute of nur gives the correct result, but the same results are also shown when I search for indian institute of b, which should not happen because no such name exists.

I need an exact edgeGram (prefix) match on multi-word phrases, i.e. strings containing a few spaces.

Junderwood · April 21, 2023, 9:26am

Hi @Arbaz_Siddiqui

There does seem to be some limit on how many spaces autocomplete can handle. You can work around this with a wildcard query. Since wildcard queries can be slow, you may want to add some client logic to use wildcard only when the query string contains more than 2 spaces; this check should be very fast.

There are likely a few different ways to approach this though!

I ran some tests with data similar to yours and it seemed to work pretty well.

Documentation to reference:

Then you can add this to your index definition:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "foldDiacritics": true,
          "maxGrams": 30,
          "minGrams": 2,
          "tokenization": "edgeGram",
          "type": "autocomplete"
        },
        {
          "analyzer": "keywordlowercase",
          "type": "string"
        }
      ]
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "keywordlowercase",
      "tokenFilters": [
        {
          "type": "lowercase"
        }
      ],
      "tokenizer": {
        "type": "keyword"
      }
    }
  ]
}
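To illustrate why this analyzer helps with multi-word prefixes, here is a rough client-side simulation in plain JavaScript. It is an approximation, not how Atlas Search works internally: the keyword tokenizer keeps the whole field value as a single token, the lowercase filter lowercases it, and a wildcard pattern (`*` = any run of characters, `?` = one character, `\` escapes) must match that entire token. All function names here are hypothetical.

```javascript
// Approximation of the "keywordlowercase" analyzer: the keyword tokenizer
// emits the whole value as one token, and the lowercase filter lowercases it.
function keywordLowercase(value) {
  return [value.toLowerCase()];
}

// Escape characters that have special meaning in a JavaScript RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Convert a wildcard pattern into an anchored RegExp:
// * -> any run of characters, ? -> exactly one character,
// a backslash makes the following character literal.
function wildcardToRegExp(pattern) {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      i++;
      source += escapeRegExp(pattern[i]);
    } else if (ch === "*") {
      source += "[\\s\\S]*";
    } else if (ch === "?") {
      source += "[\\s\\S]";
    } else {
      source += escapeRegExp(ch);
    }
  }
  return new RegExp("^" + source + "$");
}

// A value matches if any of its indexed tokens matches the whole pattern.
function wildcardMatches(fieldValue, pattern) {
  const re = wildcardToRegExp(pattern);
  return keywordLowercase(fieldValue).some((token) => re.test(token));
}

const name = "Indian Institute of Nursing ";
console.log(wildcardMatches(name, "indian institute of nur*")); // true
console.log(wildcardMatches(name, "indian institute of b*")); // false
```

Because the whole name is one token, the pattern behaves like a phrase prefix: indian institute of b* finds nothing, unlike the ‘text’ operator, which matches the individual words indian, institute and of.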

Your query would then look like this (note that “*” is appended to the end of the input query string):

{
  $search: {
    index: 'default',
    compound: {
      should: [
        {
          autocomplete: {
            path: "name",
            query: "indian institute of t",
            tokenOrder: "sequential",
          },
        },
        // optional clause, only added if the number of spaces > 2
        {
          wildcard: {
            path: "name",
            query: "indian institute of t*",
            allowAnalyzedField: true,
          },
        },
      ],
    },
  },
}
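The "only add the wildcard clause when there are more than 2 spaces" logic could be sketched on the client like this. `buildSearchStage` and `escapeWildcard` are hypothetical helpers, not part of any MongoDB driver. Two assumptions are worth checking against the Atlas Search docs: since wildcard is a term-level operator whose query is not analyzed, the input is lowercased here to line up with the keywordlowercase analyzer; and the user's own `*`, `?` and `\` characters are backslash-escaped so they are matched literally.

```javascript
// Hypothetical helper: backslash-escape the wildcard operator's special
// characters (\ * ?) so user input is matched literally.
function escapeWildcard(input) {
  return input.replace(/[\\*?]/g, "\\$&");
}

// Hypothetical helper: builds the $search stage above, adding the
// wildcard clause only when the input contains more than 2 spaces.
function buildSearchStage(userInput, indexName = "default") {
  const should = [
    {
      autocomplete: {
        path: "name",
        query: userInput,
        tokenOrder: "sequential",
      },
    },
  ];

  // Cheap client-side check: count space characters in the raw input.
  const spaceCount = (userInput.match(/ /g) || []).length;
  if (spaceCount > 2) {
    should.push({
      wildcard: {
        path: "name",
        // Assumption: the wildcard query is not analyzed, so lowercase it
        // here to match tokens produced by "keywordlowercase".
        query: escapeWildcard(userInput.toLowerCase()) + "*",
        allowAnalyzedField: true,
      },
    });
  }

  return { $search: { index: indexName, compound: { should } } };
}

console.log(JSON.stringify(buildSearchStage("Indian Institute of T"), null, 2));
```

For input with 2 or fewer spaces only the autocomplete clause is sent, so the slower wildcard query runs only when autocomplete would otherwise return nothing.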