Autocomplete operator: searching strings that contain special characters

Hello there!
I have a string field which is used for autocomplete functionality, and I am wondering how to implement autocomplete for values that contain special characters/punctuation, e.g. > 1’ 1/4" - 8". Should I replace or somehow escape these characters ( ', /, " ) when making an autocomplete query? And where can I find information on how the analyzer parses those characters?

My search-index example

{
  "analyzer": "lucene.keyword",
  "mappings": {
    "dynamic": true,
    "fields": {
      "name": [
        {
          "analyzer": "lucene.standard",
          "multi": {
            "keyword": {
              "analyzer": "lucene.keyword",
              "type": "string"
            }
          },
          "type": "string"
        },
        {
          "minGrams": 3,
          "tokenization": "edgeGram",
          "type": "autocomplete"
        }
      ]
    }
  }
}

Hi @Nikita_Prokopev and welcome to the MongoDB community forum!!

For better understanding of the requirement, could you help us with a few details like:

  1. A sample document on which the search is to be made.
  2. The search query (or queries) that you have attempted in order to retrieve the sample document (please include the search terms / query values).

Let us know if you have any further concerns.

Best Regards
Aasawari

Hi,

  1. Document:
{
    "_id": {
        "$oid": "63de3e354e354cb52430c9b3"
    },
    "sku": "1' 1/2\"w x 18\"d x 10\"h",
    "name": "30\" Farmer Sink, 29 1' 1/2\"w x 18\"d x 10\"h"
}
  2. Search query:
$search: {
  compound: {
    should: [
        {
          autocomplete: {
            query: "29 1 1' 1/2",
            path: 'name',
            tokenOrder: 'sequential',
          }
      },
      {
          autocomplete: {
            query: "29 1 1' 1/2",
            path: 'sku',
            tokenOrder: 'sequential',
          }
      },
    ]
  }
}

So if I use query like

29 1 1

it returns my document.

But if I use query which contains ’ or " or /

29 1’ 1
or
29 1’ 1/
or
1/2"
or
29 1 1 2

it doesn’t.

So how do I need to change my query to be able to find a document that contains the ’, " or / symbols?

Thanks!

Hi @Nikita_Prokopev and thank you for sharing the above query and sample documents.

Here is what I tried based on the sample document shared. The lucene.whitespace analyzer used in my example divides text into searchable terms wherever it finds a whitespace character, and it leaves all terms in their original case. You may need to adjust your index accordingly and test thoroughly to verify that the following suits your use case.

Here is what my index definition looks like:

Index Definition:

{
  "analyzer": "lucene.whitespace",
  "searchAnalyzer": "lucene.whitespace",
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "analyzer": "lucene.whitespace",
          "type": "string"
        },
        {
          "analyzer": "lucene.whitespace",
          "type": "autocomplete"
        }
      ],
      "sku": [
        {
          "analyzer": "lucene.whitespace",
          "type": "string"
        },
        {
          "analyzer": "lucene.whitespace",
          "type": "autocomplete"
        }
      ]
    }
  }
}

And the following query returns the required documents:

[
  {
    '$search': {
      'index': 'default', 
      'compound': {
        'should': [
          {
            'autocomplete': {
              'query': '29 1’ 1/', 
              'path': 'name'
            }
          }, {
            'autocomplete': {
              'query': '29 1’ 1/', 
              'path': 'sku'
            }
          }
        ]
      }
    }
  }
]
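To illustrate why this works, here is a rough Python sketch of how a whitespace tokenizer combined with edgeGram tokenization treats these strings. This is an illustration only, not the actual Lucene implementation, and the minGrams/maxGrams values are illustrative: the key point is that splitting only on whitespace keeps ', " and / inside the indexed tokens.

```python
# Rough illustration (not the actual Lucene tokenizer) of whitespace
# tokenization followed by edgeGram prefix generation.

def whitespace_tokenize(text):
    """Split on whitespace only, keeping case and punctuation intact."""
    return text.split()

def edge_grams(token, min_grams=2, max_grams=15):
    """Generate edgeGram prefixes of a single token (illustrative bounds)."""
    return [token[:n] for n in range(min_grams, min(len(token), max_grams) + 1)]

name = '30" Farmer Sink, 29 1\' 1/2"w x 18"d x 10"h'
tokens = whitespace_tokenize(name)
print(tokens)
# Tokens such as 1' and 1/2"w keep their punctuation, so a query token
# like 1/ can match the edge grams built from 1/2"w:
print(edge_grams('1/2"w'))
```

Because punctuation survives tokenization, query terms containing ’, " or / can line up against the indexed grams instead of being stripped away, which is what the lucene.standard analyzer would do.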

Let us know if you have any further questions.

Best Regards
Aasawari

@Aasawari @Nikita_Prokopev, how can we search for, or modify the analyzer to handle, keywords like m&s, h&m, h & m, Marks & Spencer?

Hi @Utsav_Upadhyay2

This seems to be a different question from the topic of this thread.
Could you open a new thread with the relevant information so that we can help you find a possible solution?

Best Regards
Aasawari

I have a similar problem. In my case the string contains the “-” character.

Ex. ABC01-2022, ABC22-2021, ABC20-2023

Currently, my search index is configured something like this:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "projectId": [
        {
          "analyzer": "lucene.whitespace",
          "tokenization": "nGram",
          "minGrams": 2,
          "maxGrams": 5,
          "type": "autocomplete"
        }
      ]
    }
  }
}

Search query

{
  index: "default",
  compound: {
    must: [{
      autocomplete: {
        query: "ABC01",
        path: "projectId"
      }
    }]
  }
}

Hey @Darshan_Vesatiya , pardon the delayed reply here.

Have you been able to resolve this issue?
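In the meantime, one thing worth checking: with nGram tokenization and maxGrams: 5, no indexed gram is longer than five characters, so query terms longer than that may fail to match even though shorter ones like ABC01 do. Here is a rough Python sketch of the grams produced for one token (an illustration of standard nGram behavior, not the actual Lucene tokenizer):

```python
# Rough sketch (not the actual Lucene nGram tokenizer) of the grams that
# nGram tokenization with minGrams=2, maxGrams=5 produces for one token.
def ngrams(token, min_grams=2, max_grams=5):
    return [token[i:i + n]
            for n in range(min_grams, max_grams + 1)
            for i in range(len(token) - n + 1)]

grams = ngrams("ABC01-2022")
print("ABC01" in grams)       # the 5-character query term is indexed as a gram
print("ABC01-2022" in grams)  # but nothing longer than maxGrams is indexed
```

Note that the hyphen itself is preserved, since the whitespace analyzer only splits on whitespace. If longer query terms such as ABC01-2022 need to match, it may be worth testing a larger maxGrams value in the index definition.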