Ignore capitalization and diacritics (e.g. localeCompare in JS)

Hi. I’m looking for a way to find documents in MongoDB by a string, so that any variation of that string matches regardless of capitalization and diacritics. I am using the findOne command.

JavaScript offers this through String.prototype.localeCompare() (documented on MDN).

For example, if I search for “Hello World”, it should match documents containing any of these:

[
	"Hello World",
	"hello world",
	"HELLO WORLD",
	"héllö wörld"
]
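For reference, this is roughly how localeCompare expresses the comparison in plain JavaScript. The helper name looselyEqual is made up for illustration, and the sketch assumes a Node.js build with full ICU data (the default in current releases). Passing sensitivity: "base" makes only the base letters significant, so all four strings in the list compare equal to "Hello World":

```javascript
// Hypothetical helper: compare two strings ignoring case and diacritics.
// With sensitivity "base", only base letters matter, so "a", "A" and "á"
// are treated as equal, while "a" and "b" are not.
function looselyEqual(a, b, locale = "en") {
  return a.localeCompare(b, locale, { sensitivity: "base" }) === 0;
}

const candidates = ["Hello World", "hello world", "HELLO WORLD", "héllö wörld"];

for (const candidate of candidates) {
  console.log(candidate, looselyEqual(candidate, "Hello World"));
}
```

The locale matters: some languages treat letters such as ö as distinct letters rather than as o with a diacritic (Swedish is one example), in which case they will not compare equal.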

To be specific about my use case: each document has an array of one or more names. Sometimes a name is written slightly differently, e.g. with an accent on one of the letters or with different capitalization. I will not use this for long texts.

I am using the Node.js MongoDB driver. My current query is:

this.collection.findOne({ names: name })

Hi @Carsten, I think the $text operator offers the features you’re looking for. It requires a text index on the field you want to query.
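A minimal sketch of creating such an index with the Node.js driver, assuming the field is the names array from the question (a text index indexes each string in an array). The helper name is hypothetical. Setting default_language to "none" is an optional suggestion for names: it turns off stemming and stop words, which are designed for natural-language text rather than personal names.

```javascript
// Hypothetical helper: create a text index on the "names" array.
// Note that a collection can have at most one text index.
async function ensureNamesTextIndex(collection) {
  // default_language "none" = simple tokenization, no stemming, no stop words.
  // Resolves to the index name reported by the server.
  return collection.createIndex({ names: "text" }, { default_language: "none" });
}
```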

According to the documentation, the $text operator is available in the following environments:

  • MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud

  • MongoDB Enterprise: The subscription-based, self-managed version of MongoDB

  • MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB

The $text operator uses the following syntax:

{
  $text: {
    $search: <string>,
    $language: <string>,
    $caseSensitive: <boolean>,
    $diacriticSensitive: <boolean>
  }
}
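Filled in for the names field, the query might look like the sketch below. The helper names are hypothetical, and $caseSensitive and $diacriticSensitive are spelled out only for clarity, since false is already the default for both. One thing to keep in mind for this use case: without quotes, $search treats the string as separate terms and matches documents containing any of them, so “Hello World” also matches a document whose only name is “hello there”. Wrapping the string in escaped double quotes turns it into a phrase search.

```javascript
// Hypothetical helper: build a $text filter that explicitly asks for
// case- and diacritic-insensitive matching (already the defaults for a
// version 3 text index).
function buildNameTextFilter(name) {
  return {
    $text: {
      $search: name,
      $caseSensitive: false,
      $diacriticSensitive: false,
    },
  };
}

// Hypothetical usage with the collection from the question; requires a
// text index on the queried field.
async function findByNameText(collection, name) {
  return collection.findOne(buildNameTextFilter(name));
}
```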

Its $caseSensitive and $diacriticSensitive options let you control whether case and diacritics are ignored.

The documentation on the diacritic insensitivity option includes this caveat:

  • The version 3 text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their non-marked counterpart, such as é, ê, and e.

Once a version 3 text index (the default for text indexes created in MongoDB 3.2 and later) exists on the field, $text queries are case and diacritic insensitive unless you specify otherwise.


For anyone reading this: I ended up using collation. Since I’m using the Node.js MongoDB driver, I only needed to add a small part to my query:

return await this.collection.find(query, { collation, projection }).toArray();

Note that I can no longer use findOne.
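For readers who want to try the collation approach, here is a sketch. The collation values are an assumption, since the post doesn’t show them: strength 1 compares base characters only, which ignores both case and diacritics (strength 2 would still distinguish diacritics). The helper names are hypothetical. An index on names created with the same collation can support these queries; MongoDB only uses a collated index when the query specifies the same collation. As far as I know, current driver versions also accept collation in findOne’s options, which is included as an alternative.

```javascript
// Assumed collation (not shown in the post): strength 1 compares base
// characters only, so "e", "E" and "é" are treated as equal.
const nameCollation = { locale: "en", strength: 1 };

// Hypothetical helper: { names: name } matches a document if any element
// of the names array equals name under the collation.
async function findByName(collection, name, projection) {
  const options = { collation: nameCollation };
  if (projection) {
    options.projection = projection;
  }
  return collection.find({ names: name }, options).toArray();
}

// Alternative, assuming a driver version whose findOne accepts collation.
async function findOneByName(collection, name) {
  return collection.findOne({ names: name }, { collation: nameCollation });
}

// Optional: an index with the same collation, so the queries above can use it.
async function ensureNamesCollationIndex(collection) {
  return collection.createIndex({ names: 1 }, { collation: nameCollation });
}
```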
