KNIME 5.3 fails to properly display column names read from files in a non-default encoding:
load a CSV file in a non-UTF-8 encoding, e.g. UTF-16 or cp1252;
select the encoding in the CSV Reader;
open String Manipulation and double-click a column;
→ you will see weird boxes and characters in place of the column name. You can still manipulate the column, but only if you insert its name by double-clicking. Typing the name won’t work, as the typed name appears to be written in the default locale.
Moreover, under the hood KNIME appears to handle these column names literally as originally encoded, that is, as if they were different from their UTF-8 counterparts. For example, if you convert the encoding of the files outside of KNIME and then reopen them in KNIME, your workflow will no longer work because it can’t find the weird column names from before.
Relevant information: UTF-8 is set as the default encoding in KNIME, running on macOS Ventura.
Expected behaviour would be that KNIME detects the encoding and transforms it under the hood, so that the workflow wouldn’t be affected.
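For illustration only, here is a small Python sketch (outside KNIME) of what happens when bytes are decoded with the wrong charset. The column name `Größe` is a made-up example, and this is not a claim about what KNIME does internally. It only shows why a mis-decoded name can show up as boxes or odd characters and no longer equals its UTF-8 counterpart.

```python
# Hypothetical column name containing non-ASCII characters.
name = "Größe"

# Bytes as they would be stored in a cp1252-encoded file.
raw_cp1252 = name.encode("cp1252")

# Wrong charset: invalid UTF-8 sequences are replaced by U+FFFD,
# which many UIs render as boxes or question marks.
wrong_utf8 = raw_cp1252.decode("utf-8", errors="replace")

# The opposite mistake: UTF-8 bytes read as cp1252 give "mojibake".
wrong_cp1252 = name.encode("utf-8").decode("cp1252")

# Decoding with the charset the file was really written in restores the name.
right = raw_cp1252.decode("cp1252")

print(repr(wrong_utf8), repr(wrong_cp1252), repr(right))
```

Only the correctly decoded string compares equal to the original name, which would match a workflow losing track of its configured column names after the file is re-encoded.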
Can you share a screenshot of the issue and your workflow, please? I recently experienced something similar while processing Arabic text when helping someone in the forum.
I ignored it as I thought it was related to the Arabic characters, but your post made me curious …
Thank you for sharing a similar experience.
Sorry, I can’t share a screenshot anymore because I noticed this in a private gig, not a business-related task, and I’ve already converted the input files to UTF-8 to work around the issue.
It does happen with non-ASCII characters in files encoded in something other than the default encoding. For example, UTF-16 was enough to cause issues with UTF-8 as the default. So, obviously, was cp1252.
What you describe corroborates the fact that there is an issue related to how this encoding is handled in the KNIME interface and back-end.
As a matter of fact, once the input file is read in the source encoding (as specified in the reader node), it should be automatically converted to KNIME’s default encoding to avoid issues of the kind described above. I suspect that this conversion is not currently performed.
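The behaviour I would expect can be sketched in Python as follows. This is not KNIME code: `read_header` and the sample data are hypothetical, and the sketch simply assumes that the reader decodes the bytes once with the user-selected charset and works with Unicode strings from then on.

```python
import csv
import io

def read_header(raw: bytes, encoding: str) -> list:
    """Decode raw CSV bytes with the given charset and return the header row."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text)))

# Made-up sample file content; the same text stored in three encodings.
csv_text = "Größe,Straße\n1,2\n"
as_cp1252 = csv_text.encode("cp1252")
as_utf16 = csv_text.encode("utf-16")  # Python prepends a BOM here
as_utf8 = csv_text.encode("utf-8")

headers = [
    read_header(as_cp1252, "cp1252"),
    read_header(as_utf16, "utf-16"),
    read_header(as_utf8, "utf-8"),
]

# After decoding, the column names are identical regardless of the
# encoding the file was stored in.
print(headers[0] == headers[1] == headers[2])
```

If column names were handled this way, converting the input file to another encoding outside of KNIME should not change what downstream nodes see.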
I tried to replicate what you experienced by creating some test data, reading and writing it with different encodings, but failed to spot any apparent issue.
Can you consolidate your explanation once more in bullets, please?
That Arabic text would be helpful. It is really about the encoding of the input file. So, that Arabic text file would have to be encoded in something other than UTF-8 or, presumably, something other than KNIME’s default locale. It does not matter whether or not there are diacritics or special symbols. In what I have observed, the issue arises with practically every letter.
Column Filter showed the column names properly. However, when I then converted the file encoding manually to UTF-8 (my KNIME’s default), the node no longer recognised the pre-configured column names.
String Manipulation showed the behaviour described in my first post.
Perhaps it is an issue specific to the macOS version. I have not yet been able to test this in a Windows environment.
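In case it helps with reproducing this, below is a minimal Python sketch that writes the same CSV content in several encodings, so each file can be loaded in the CSV Reader with the matching encoding selected. The sample text, the file names and the helper `write_test_files` are all made up for illustration. The characters were chosen so that they also exist in cp1252.

```python
import tempfile
from pathlib import Path

# Made-up sample data; every character here can be encoded in cp1252.
SAMPLE = "Größe,Straße,Prénom\n1,2,3\n"
ENCODINGS = ["utf-8", "utf-16", "cp1252"]

def write_test_files(directory: Path) -> dict:
    """Write SAMPLE once per encoding and return {encoding: path}."""
    paths = {}
    for enc in ENCODINGS:
        path = directory / f"test_{enc}.csv"
        # write_bytes avoids any platform-specific newline translation.
        path.write_bytes(SAMPLE.encode(enc))
        paths[enc] = path
    return paths

out_dir = Path(tempfile.mkdtemp())
files = write_test_files(out_dir)
for enc, path in files.items():
    print(enc, path, path.stat().st_size, "bytes")
```

The printed paths point to a temporary directory; the three files contain the same text but differ at the byte level.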