Hi,
I am reviving this old topic in the hope of getting an answer to what is presumably the same issue, but on macOS.
I am currently testing the entire Unicode character set and have noticed that StringEscapeUtils.unescapeJava doesn’t convert all characters properly. I doubt it’s a font issue, as the characters extracted from the Unicode name lists such as this are displayed correctly. The source is UTF-8 encoded, which likely rules out an encoding mismatch.
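One thing that may help narrow this down is comparing the escaped form of each code point against what the unescaper returns. The sketch below is only an assumption about where to look, not a diagnosis: it builds the Java-style escape for a code point using the standard library alone, and shows that characters above U+FFFF need two escapes (a surrogate pair). The class and method names are made up for illustration.

```java
public class EscapeProbe {
    // Builds the Java-style escape for one code point. Code points above
    // U+FFFF are split into two UTF-16 units, so they yield two escapes.
    static String javaEscape(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(javaEscape(0x00E9));   // BMP character: one escape
        System.out.println(javaEscape(0x1F600));  // supplementary: two escapes
    }
}
```

Feeding the output of javaEscape into the unescaper under test and comparing the result with new String(Character.toChars(codePoint)) should show which ranges fail.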
In a very old external post, which might not be valid anymore, it was once stated:
With ICU4J you can use
com.ibm.icu.impl.Utility.unescape(String s)
to convert the literal string to a UTF-8 string. However, a Java string internally doesn’t use UTF-8 encoding; instead it uses UTF-16 (big endian) to represent Unicode characters. To fully convert the string from the UTF-8 literal to the Java Unicode representation, you need to encode it to bytes with ISO-8859-1 and then read the bytes back into a string using UTF-8.
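For reference, the ISO-8859-1 to UTF-8 round trip described in that quote can be sketched with the standard library alone. This is a minimal illustration that assumes the input string holds one raw UTF-8 byte per char (values 0x00 to 0xFF); whether the ICU4J call actually returns such a string is not confirmed here, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    // Reinterprets a string whose chars each carry one raw UTF-8 byte.
    // ISO-8859-1 maps chars 0x00-0xFF to the same byte values, so encoding
    // with it recovers the original bytes, which are then decoded as UTF-8.
    static String reinterpretAsUtf8(String byteString) {
        byte[] raw = byteString.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E9 is the byte pair C3 A9 in UTF-8; each byte sits in its own char.
        String mangled = "\u00C3\u00A9";
        String fixed = reinterpretAsUtf8(mangled);
        System.out.println(fixed.equals("\u00E9")); // prints "true"
    }
}
```

Note that this step only makes sense for strings in that byte-per-char form. Applied to a string that is already correctly decoded, it replaces every char above 0xFF with "?" and corrupts the text.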
I wonder if you have an idea, @mlauber71, as you are one of the most skilled experts around.
Cheers
Mike