How to correctly set a diacritic-insensitive $text index for the Spanish language

Hi,

I’m struggling to find out how to correctly set a diacritic insensitive text index for my collection of persons. It’s a normal collection without collation.

The MongoDB version is 5.0.15

I need a text index (I’m not using MongoDB Atlas) for the name and familyName fields. I created an index with this config:

{
  "v": 2,
  "key": {
    "_fts": "text",
    "_ftsx": 1
  },
  "name": "personsFullname",
  "weights": {
    "familyName": 1,
    "name": 1
  },
  "default_language": "es",
  "language_override": "language",
  "textIndexVersion": 3
}

The problem is that even though the MongoDB manual says that text search is diacritic insensitive from version 3 onwards, it doesn’t work that way.

Suppose I have these 3 records:

[
  {
    "_id": "aaaaaaa",
    "name": "Roberto ",
    "familyName": "Torres García "
  },
  {
    "_id": "bbbbbbb",
    "name": "Ruben A",
    "familyName": "Parras García"
  },
  {
    "_id": "ccccc",
    "name": "Karla",
    "familyName": "Rosas García"
  }
]

If I search for García (with the diacritic on the i):

db.getCollection("personsData").find({ "$text": { "$search": "García" } })

It finds the 3 records.

But if I search for Garcia (without the diacritic on the i):

db.getCollection("personsData").find({ "$text": { "$search": "Garcia" } })

It finds no records.

What am I missing here?

Thank you in advance.

I’m not sure whether the version refers to the property "v": 2 or to "textIndexVersion": 3.
Any help or hint is much appreciated.
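
On the "v" versus "textIndexVersion" question: as far as the manual goes, "v" is the general index format version, while "textIndexVersion" is the version of the text index itself, and the diacritic note refers to the latter. A minimal sketch that reads both out of an index document like the one above (the kind of document db.collection.getIndexes() returns). The helper name describeTextIndex is made up for illustration and is not part of any MongoDB API:

```javascript
// Hypothetical helper (not a MongoDB API): summarizes an index document
// such as the ones returned by db.collection.getIndexes().
function describeTextIndex(indexSpec) {
  // A text index has at least one key whose value is the string "text".
  const isText = Object.values(indexSpec.key || {}).includes("text");
  return {
    name: indexSpec.name,
    isText,
    indexFormatVersion: indexSpec.v, // "v": general index format version
    textIndexVersion: isText ? indexSpec.textIndexVersion : undefined,
    defaultLanguage: indexSpec.default_language,
  };
}

// The index document posted above.
const personsIndex = {
  v: 2,
  key: { _fts: "text", _ftsx: 1 },
  name: "personsFullname",
  weights: { familyName: 1, name: 1 },
  default_language: "es",
  language_override: "language",
  textIndexVersion: 3,
};

console.log(describeTextIndex(personsIndex));
```

So the index in the question is already a version 3 text index; "v": 2 is unrelated to text search.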

I made this last year for a Realm customer who needed to route Mandarin; it should “just work” with Spanish. I know this works for diacritic handling with Chinese, and Korean too, but do let me know if it also helps with your Spanish. I literally just changed the items to es in default_language etc. for Spanish.

db.persons.createIndex(
  { name: "text", familyName: "text" },
  { default_language: "es",
    language_override: "language",
    textIndexVersion: 3,
    collation: { locale: "es", strength: 2 }
  }
);

This is what you’d use to query your collection.

db.persons.find(
  { $text: { $search: "SubjectorName" } },
  { score: { $meta: "textScore" } }
).collation({ locale: "es", strength: 2 })

Oh, @Ricardo_Montoya This will help you do this in Realm, too. Just change the language to Spanish.

exports.searchArticles = function(searchTerm) {
  const articlesCollection = context.services.get("mongodb-atlas").db("mydb").collection("articles");
  return articlesCollection.find({$text: {$search: searchTerm, $language: "zh"}});
};

@Ricardo_Montoya

I forgot to also add the Realm aggregation for it, too.

This is what you can use in a Realm app if you’re using one. Again, just change the language to Spanish. It also has error handling, so you will get an error response if something isn’t quite right. (I like my error handlers when I use Realm lol)

exports.searchArticles = function(searchTerm) {
  const articlesCollection = context.services.get("mongodb-atlas").db("mydb").collection("articles");
  const pipeline = [
    {
      $match: {
        $text: {
          $search: searchTerm,
          $language: "zh"
        }
      }
    },
    {
      $project: {
        title: 1,
        author: 1,
        publicationDate: 1
      }
    },
    {
      $sort: {
        publicationDate: -1
      }
    }
  ];
  
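  // toArray() returns a promise, so handle failures with .catch();
  // a synchronous try/catch would not see an asynchronous rejection.
  return articlesCollection.aggregate(pipeline).toArray().catch((error) => {
    console.error("Error executing searchArticles pipeline:", error);
    throw new Error("An error occurred while searching for articles.");
  });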
};

Thank you.

Unfortunately, I get an error when I try to execute the index creation.

MongoServerError: Error in specification { default_language: "es", language_override: "language", textIndexVersion: 3, key: { name: "text", familyName: "text" }, name: "name_text_familyName_text", v: 2, collation: { locale: "es", caseLevel: false, caseFirst: "off", strength: 2, numericOrdering: false, alternate: "non-ignorable", maxVariable: "punct", normalization: false, backwards: false, version: "57.1" } } :: caused by :: Index type 'text' does not support collation: { locale: "es", caseLevel: false, caseFirst: "off", strength: 2, numericOrdering: false, alternate: "non-ignorable", maxVariable: "punct", normalization: false, backwards: false, version: "57.1" }

The summary is Index type 'text' does not support collation

:zipper_mouth_face:

What version of MDB are you using?

Ok, so oddly enough the collation works on Alibaba… I don’t know why, but removing the collation then fixes the error.

This is working on my local 6.0

{
  "name": "name_text_familyName_text",
  "key": {
    "name": "text",
    "familyName": "text"
  },
  "default_language": "es",
  "language_override": "language",
  "textIndexVersion": 3
}

@Ricardo_Montoya

Need to go through and remove the collations.
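
For anyone hitting the same error, here is a rough sketch of stripping the collation from the options before creating a text index. The helper name textSafeIndexOptions is invented for this example, and it only covers the case reported above (a key pattern with "text" keys plus a collation option):

```javascript
// Hypothetical helper: returns a copy of the createIndex options without
// "collation" when the key pattern contains a text key, since the server
// rejected that combination in the error above.
function textSafeIndexOptions(keys, options = {}) {
  const hasTextKey = Object.values(keys).includes("text");
  if (!hasTextKey) return { ...options };
  const { collation, ...rest } = options; // discard collation, keep the rest
  return rest;
}

const indexKeys = { name: "text", familyName: "text" };
const indexOptions = {
  default_language: "es",
  language_override: "language",
  textIndexVersion: 3,
  collation: { locale: "es", strength: 2 },
};

console.log(textSafeIndexOptions(indexKeys, indexOptions));
// In mongosh this could be used as:
// db.persons.createIndex(indexKeys, textSafeIndexOptions(indexKeys, indexOptions));
```

The input object is left untouched, so the same options can still be reused for non-text indexes that do accept a collation.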

Following this thread from StackOverflow

  • If I test locally on mongo version 6.0.5, it works when creating the index this way (not setting default_language: “spanish”):
db.consultas.createIndex(
  { diagnostico: "text" },
);

Is this a known bug in MongoDB, or am I doing something wrong?

It works on the online Mongo Playground:

The playground link: Mongo playground

And if you want to reproduce the example in a local instance (I’m using docker 6.0.5):

use("clinica");

db.consultas.insertMany([
  {
    nombre: "Juan Perez",
    especialidad: "general",
    diagnostico: "Dolor abdominal, Fiebre alta, tos, posible caso de COVID",
  },
  {
    nombre: "María Pelaez",
    especialidad: "general",
    diagnostico: "Tensión alta, posible episodio de ataque de ansiedad",
  },
  {
    nombre: "Javier Garcia",
    especialidad: "cardiología",
    diagnostico: "Arritmias, acompañado de tensión alta, enfermería",
  },
  {
    nombre: "Manuel Gómez",
    especialidad: "general",
    diagnostico: "Fiebre alta, tos y mucosidades, enfermería",
  },
]);

Creating the index

db.consultas.createIndex(
  { diagnostico: "text" },
  { defaultLanguage: "es"}
);

And launching the query (you can try both options, enfermería and enfermeria; you get results either way):

db.consultas.find({ $text: { $search: "enfermeria" } });

I didn’t need to go for the more elaborate version that other posts suggest trying.

THIS SEEMS NOT TO BE NEEDED ON VERSION 6

db.consultas.createIndex(
  { diagnostico: "text" },
  {
    defaultLanguage: "es",
    textIndexVersion: 3,
  }
);

And in the query indicate to ignore diacritics:

db.consultas.find({
  $text: {
    $search: "enfermeria",
    $diacriticSensitive: false,
  },
});
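
To compare the two query shapes side by side, here is a small sketch that builds the $text filter with and without the optional fields. The helper name buildTextFilter is hypothetical. As far as I can tell from the manual, $diacriticSensitive defaults to false, so leaving it out should behave the same as setting it to false explicitly:

```javascript
// Hypothetical helper: builds a $text filter. Only $search is required;
// $language and $diacriticSensitive are added only when explicitly given.
function buildTextFilter(term, { language, diacriticSensitive } = {}) {
  const text = { $search: term };
  if (language !== undefined) text.$language = language;
  if (diacriticSensitive !== undefined) {
    text.$diacriticSensitive = diacriticSensitive;
  }
  return { $text: text };
}

const plainFilter = buildTextFilter("enfermeria");
const explicitFilter = buildTextFilter("enfermeria", {
  language: "es",
  diacriticSensitive: false,
});

console.log(JSON.stringify(plainFilter));
console.log(JSON.stringify(explicitFilter));
// In mongosh, either one could be passed to db.consultas.find(...)
// to check whether both return the same documents.
```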