UTF-8 characters not rendering in table view

mwiegand · February 13, 2023, 11:42am

Hi,

I am reviving this old topic in the hope of getting an answer to what is presumably the same issue, but on Mac OS X.

I am currently testing the entire Unicode character set and have noticed that StringEscapeUtils.unescapeJava doesn’t convert all characters properly. I doubt it’s a font issue, as the characters extracted from the Unicode namelists, such as this, are displayed correctly. The source is UTF-8 encoded, which likely rules out an encoding mismatch.

In a very old external post, which might not be valid anymore, it was once stated:

With ICU4J you can use com.ibm.icu.impl.Utility.unescape(String s) to convert the literal string to utf8 string. However, java string internally doesn’t use utf8 encoding, instead it uses UTF-16 (Big Endian) to present unicode characters. To fully convert the string from utf8 literal to java unicode representation, you need to decode it with ISO-8859-1 then read the bytes back to string using encoding UTF-8.
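
If I read the quoted post correctly, the round trip it describes could be sketched roughly as below, using only the standard library. The class and method names (Utf8RoundTrip, reinterpretAsUtf8) are my own placeholders, and it assumes the input string holds one raw UTF-8 byte per char (values 0x00 to 0xFF). I have not confirmed that this is what actually happens to my data.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ICU4J or Commons Lang.
class Utf8RoundTrip {

    // Assumes every char in byteString carries one raw UTF-8 byte (0x00-0xFF).
    // Chars above 0xFF cannot be encoded as ISO-8859-1 and would be replaced by '?'.
    static String reinterpretAsUtf8(String byteString) {
        // ISO-8859-1 maps chars 0x00-0xFF one-to-one onto bytes.
        byte[] raw = byteString.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes as UTF-8 to obtain the intended characters.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 is encoded in UTF-8 as the two bytes C3 A9.
        String misread = "\u00C3\u00A9";
        String fixed = reinterpretAsUtf8(misread);
        System.out.println(fixed.equals("\u00E9"));
    }
}
```

A character outside the Basic Multilingual Plane comes back as a surrogate pair, so its String.length() is 2 while codePointCount reports 1.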

I wonder if you have an idea, @mlauber71, as you are one of the most skilled experts around.

Cheers
Mike