Diacritics not displaying correctly

Matthew_Andersen · April 26, 2023, 9:20pm

I have an odd issue - characters that would normally have Spanish-language diacritics are displayed as question marks. I need to be able to search the whole collection for incorrect characters and fix them. Just wondering if anyone has run into any similar issues before. I suspect it is actually an issue with the source file, but I haven’t been able to verify that yet.

Aasawari · April 28, 2023, 4:22am

Hi @Matthew_Andersen and welcome to the MongoDB community forums!

Can you confirm whether my understanding is correct: you are unable to search for Spanish diacritics, and the results come back with ? in their place?
Can you also confirm whether this is related to MongoDB Atlas Search language analyzers?

If so, you can take a look at the example code in the MongoDB language analyzer documentation. Since Spanish is a supported language analyzer, the expectation is that you get the correct response.

However, to understand the requirement better, could you provide the information below?

A sample document from the collection.
The search index definition.
The search query you have tried.
The output you are expecting.

Could you also explain what you mean by “incorrect characters” in the statement above?

Regards
Aasawari

Matthew_Andersen · April 30, 2023, 1:31pm

Hi Aasawari! I appreciate the response. The issue isn’t with being able to search Spanish-language characters - the accented characters are displaying as the Unicode replacement character (U+FFFD), a question mark inside a diamond. I was able to find those by searching for "new RegExp('\ufffd')" in the impacted fields, then ran an updateMany with a $replaceAll to correct the character. I believe the problem is with the data ingestion, and I’ll have to track that down separately.
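
For anyone landing here with the same symptom, the find-and-fix approach described above might look roughly like the sketch below. It is not the exact code from this thread: the collection name "items", the field name "title", and the replacement "ñ" are hypothetical placeholders. It assumes a MongoDB version that supports aggregation-pipeline updates and the $replaceAll operator (4.4 or later), and that the affected field holds plain strings. Note that U+FFFD does not record which character was lost, so a single replacement is only right when every damaged character in the matched documents was the same letter.

```javascript
// U+FFFD, the Unicode replacement character (question mark inside a diamond).
const REPLACEMENT_CHAR = "\ufffd";

// Build a query filter matching documents whose `field` contains U+FFFD.
function buildFilter(field) {
  return { [field]: { $regex: REPLACEMENT_CHAR } };
}

// Build an aggregation-pipeline update that swaps every U+FFFD in `field`
// for `replacement`. The field is assumed to contain a string.
function buildUpdate(field, replacement) {
  return [
    {
      $set: {
        [field]: {
          $replaceAll: {
            input: `$${field}`,
            find: REPLACEMENT_CHAR,
            replacement: replacement,
          },
        },
      },
    },
  ];
}

// In mongosh, against a hypothetical collection named "items":
//   db.items.find(buildFilter("title"));
//   db.items.updateMany(buildFilter("title"), buildUpdate("title", "ñ"));
```

Running the find on its own first, and reviewing the matches, is a cheap way to confirm which letter each damaged value should contain before any update is applied.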