Hi Folks:
I'm developing a node and I've run into a small issue when reading UTF-8 encoded files with the File Reader node. In my configure() routine I check that the input table from the upstream File Reader has a column called "Location" and a column called "Population". My locations are all Chinese city names, like this:
Location | Population |
---|---|
上海 | 2301.91 |
北京 | 1961.24 |
吉林 | 441.47 |
For the curious, those are the populations of Shanghai, Beijing and Jilin, in units of ten thousand.
While the second, third, fourth, etc. columns always seem to match, the first column is never matched. But when I use an ANSI file like the one below, everything works fine.
Location | Population |
---|---|
Sydney | 300.0 |
Melbourne | 250.0 |
Canberra | 20.0 |
Digging deeper into the code, I found that the KNIME call inSpecs[0].getColumnSpec("Location") fails for column 0 and returns null.
The reason appears to be that, when reading UTF-8 files, the upstream File Reader prefixes a non-visible character to the name of column 0, so that the name becomes " Location" instead of "Location". I've attached a picture of the debugger for reference.
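One likely candidate for the invisible character is the UTF-8 byte order mark (U+FEFF), which some editors write at the start of a file. That is an assumption; the debugger view alone does not confirm it. A minimal plain-Java sketch for checking it, with no KNIME classes involved (the class name `BomCheck` and its helper are hypothetical names for illustration), dumps the code points of a header name:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of KNIME.
final class BomCheck {

    // Renders each code point of s as U+XXXX, so invisible characters show up.
    static String describeCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Bytes of a UTF-8 file that starts with a BOM (EF BB BF) followed by "Location".
        byte[] raw = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                      'L', 'o', 'c', 'a', 't', 'i', 'o', 'n'};

        // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
        String header = new String(raw, StandardCharsets.UTF_8);

        System.out.println(describeCodePoints(header));
        System.out.println(header.equals("Location")); // the name lookup would miss
    }
}
```

If the column name from the File Reader starts with U+FEFF in such a dump, that would explain why an exact-match lookup on "Location" finds nothing.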
My workaround is to ensure that column 0 always contains something harmless like the RowID.
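An alternative workaround, again assuming the stray character is U+FEFF, would be to make the name check tolerant of it instead of reordering columns. The sketch below works on a plain list of names rather than on KNIME's table spec; `ColumnLookup` and `findColumn` are hypothetical names, and in a real node the names would have to be read from the incoming spec first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of KNIME.
final class ColumnLookup {

    // Returns the index of the column called wanted, ignoring one leading
    // U+FEFF (byte order mark) on each candidate name, or -1 if none matches.
    static int findColumn(List<String> names, String wanted) {
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            if (name.startsWith("\uFEFF")) {
                name = name.substring(1);
            }
            if (name.equals(wanted)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Column names as they might arrive from a BOM-prefixed UTF-8 file.
        List<String> names = Arrays.asList("\uFEFFLocation", "Population");
        System.out.println(findColumn(names, "Location"));
        System.out.println(findColumn(names, "Population"));
    }
}
```

This only strips a single leading U+FEFF and leaves ordinary whitespace alone, so a column genuinely named " Location" would still be treated as a different column.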
Just thought you should know.