MongoDB Turkish character set problem

I inserted this data:

(name:'İstanbul') <-- the problem is the capital İ
filter: { 'Adi' :/is/i} { locale: 'tr', strength: 1 }

Result: not found…
How can I solve this problem?

Hi :wave: @suleyman_yalcin,

Welcome to the MongoDB Community forums :sparkles:

The MongoDB documentation for $regex states:

Case-insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.
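To illustrate why a plain case-insensitive regex misses this document, here is a minimal TypeScript sketch. It uses JavaScript's own regex engine rather than MongoDB's $regex, so treat it as an illustration of the same class of behaviour and not a reproduction of the server. The variable names are made up for the example.

```typescript
// Illustration only: this is JavaScript's regex engine, not MongoDB's $regex.
// "\u0130" is İ, LATIN CAPITAL LETTER I WITH DOT ABOVE.
const city: string = '\u0130stanbul';

// The "i" flag uses locale-independent case folding, under which
// İ and i are unrelated letters, so the pattern does not match.
const plainMatch: boolean = /is/i.test(city);

// Lowercasing with Turkish rules maps İ to i, after which
// the same prefix is found at the start of the string.
const turkishLower: string = city.toLocaleLowerCase('tr');
const foldedMatch: boolean = turkishLower.indexOf('is') === 0;
```

Here `plainMatch` is false while `foldedMatch` is true, which mirrors the "not found" result in the question: the regex is applied without any Turkish locale rules.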

So, in this case, I'd suggest using MongoDB Atlas Search.

For example, consider the following sample data collection called names:

_id: ObjectId('6350d8ccb0ec2b79cdd7576c')
name: "İstanbul"

You can create an Atlas Search index from the MongoDB Atlas dashboard.

The index will look something like this in JSON format:

{
  "analyzer": "lucene.turkish",
  "mappings": {
    "dynamic": true,
    "fields": {
      "name": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "autocomplete"
        }
      ]
    }
  }
}

After that, you can run the query using $search:

db.names.aggregate([
  {
    $search: {
      index: 'name_index',
      autocomplete: {
        query: 'is',
        path: 'name'
      }
    }
  }
])

It will return the following output:
[ { _id: ObjectId("6350d8ccb0ec2b79cdd7576c"), name: 'İstanbul' } ]
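If the search text comes from user input, the same $search stage can be built from variables. The following is a minimal sketch; `AutocompleteStage` and `buildAutocompleteStage` are hypothetical names for this example and are not part of any MongoDB driver.

```typescript
// Hypothetical types and helper, for illustration only.
interface AutocompleteStage {
  $search: {
    index: string;
    autocomplete: { query: string; path: string };
  };
}

// Builds the same stage shape as the shell query above.
function buildAutocompleteStage(
  query: string,
  path: string,
  index: string
): AutocompleteStage {
  return { $search: { index, autocomplete: { query, path } } };
}

// Same pipeline as the shell example, built from arguments.
const pipeline: AutocompleteStage[] = [
  buildAutocompleteStage('is', 'name', 'name_index'),
];
```

With the Node.js driver, an array like this would typically be passed to `collection.aggregate(pipeline)`.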

For more information, refer to the Atlas Search documentation.

I hope it helps!

Thanks,
Kushagra


I solved this problem by tweaking the regex patterns a bit. I hope it helps you too.

let text = searchText;
// Remove regex special characters from the search text
text = text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '')
let array = text.split('')
// Replace every letter that has a Turkish variant with an alternation of all its variants
let newArray = array.map((char: string) => {
    if (char === 'i' || char === 'I' || char === 'ı' || char === 'İ') {
        return '(ı|i|İ|I|İ)'
    }
    else if (char === 'g' || char === 'G' || char === 'ğ' || char === 'Ğ') {
        return '(ğ|g|Ğ|G)'
    }
    else if (char === 'u' || char === 'U' || char === 'ü' || char === 'Ü') {
        return '(ü|u|Ü|U)'
    }
    else if (char === 's' || char === 'S' || char === 'ş' || char === 'Ş') {
        return '(ş|s|Ş|S)'
    }
    else if (char === 'o' || char === 'O' || char === 'ö' || char === 'Ö') {
        return '(ö|o|Ö|O)'
    }
    else if (char === 'c' || char === 'C' || char === 'ç' || char === 'Ç') {
        return '(ç|c|Ç|C)'
    }
    else {
        return char
    }
})

// Join the pieces back together with no separator
text = newArray.join('')
// Used as the value of the filename field in the query filter
filename: new RegExp('(.*)' + text + '(.*)', "ig")

When you search for İstanbul, it is converted to /(.*)(ı|i|İ|I|İ)(ş|s|Ş|S)tanb(ü|u|Ü|U)l(.*)/gi

This result worked great for me.
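The same idea can also be written with a lookup table. This is only a sketch of a variant, not the code from the reply above: it uses character classes instead of alternations, and it escapes regex special characters instead of deleting them. `TURKISH_GROUPS`, `escapeRegex` and `turkishPattern` are names invented for the example.

```typescript
// Letters that should match each other, taken from the groups in the reply above.
const TURKISH_GROUPS: string[] = ['ıiİI', 'ğgĞG', 'üuÜU', 'şsŞS', 'öoÖO', 'çcÇC'];

// Escape regex metacharacters so they are matched literally.
function escapeRegex(ch: string): string {
  return ch.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Build a pattern source in which every letter of a group matches the whole group.
function turkishPattern(searchText: string): string {
  return searchText
    .split('')
    .map((ch: string): string => {
      // Pick the first group containing this character, if any.
      const group = TURKISH_GROUPS.filter((g) => g.indexOf(ch) !== -1)[0];
      return group ? '[' + group + ']' : escapeRegex(ch);
    })
    .join('');
}

// Example: a regex for the search text "is", tested against a sample name.
const sampleRegex: RegExp = new RegExp(turkishPattern('is'), 'i');
const sampleMatches: boolean = sampleRegex.test('İstanbul');
```

For the search text "is" this produces the pattern source `[ıiİI][şsŞS]`, which matches "İstanbul". A RegExp built this way could be used in the filter in the same place as the one above.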
