Bug: Complex File Reader prepends "invisible space" to first column name of UTF-8 input file

Experimenter · September 2, 2022, 10:55am

Topic overlaps with closed BUG: File Reader mishandles UTF-8 BOM - #2 by bjoern.lohrmann

I have been dealing with this for several years. When reading a UTF-8 CSV file with the Complex File Reader, everything looks normal, but the first column name actually has an invisible space prepended. This leads to hard-to-debug consequences, such as [visually] duplicate column names in the same table, aggregations suddenly failing when the data input is switched, etc.
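To make the symptom concrete, here is a minimal plain-Java sketch of why two column names can look identical yet be treated as different. It assumes the invisible character is U+FEFF (the byte order mark, code point 65279); the class and method names are hypothetical and not part of KNIME.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BomHeaderDemo {
    /** Returns the distinct names in input order, using plain string equality. */
    static Set<String> distinctNames(List<String> names) {
        return new LinkedHashSet<>(names);
    }

    public static void main(String[] args) {
        // Assumption: this is what the first header looks like when a BOM is kept.
        String withBom = "\uFEFFccy";
        String clean = "ccy";

        // Both usually render as "ccy", but they are different strings.
        System.out.println(withBom.equals(clean));   // false
        System.out.println(withBom.length());        // 4
        System.out.println(clean.length());          // 3

        // A table built from both inputs ends up with two "ccy" columns.
        System.out.println(distinctNames(List.of(withBom, clean)).size()); // 2
    }
}
```

Any lookup by the name "ccy" (a group-by, a join, a rename) would miss the BOM-prefixed column for the same reason.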

Any hope of getting this handled?

Result of concatenating the same file (3 columns: ccy, rate, dte), read once with the Complex File Reader and once with the CSV Reader:

Experimenter · September 2, 2022, 11:48am

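UPDATE: here is a self-contained workaround:

Java code (65279 is the code point of U+FEFF, the byte order mark; the isEmpty() check avoids an exception on an empty name):
return !$${SColumn 0}$$.isEmpty() && $${SColumn 0}$$.charAt(0) == 65279 ? $${SColumn 0}$$.substring(1) : $${SColumn 0}$$;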

Column rename is controlled by variables.
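For anyone who wants to try the same check outside a workflow, the logic above can be sketched as a standalone Java helper. The class name `BomStrip` and method name `stripBom` are hypothetical; only the comparison against 65279 and the `substring(1)` come from the snippet in this post.

```java
class BomStrip {
    // U+FEFF, decimal 65279: the byte order mark as it appears after decoding.
    static final char BOM = '\uFEFF';

    /** Returns the name without a leading BOM; other input is returned unchanged. */
    static String stripBom(String name) {
        if (name != null && !name.isEmpty() && name.charAt(0) == BOM) {
            return name.substring(1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(stripBom("\uFEFFccy").length()); // 3
        System.out.println(stripBom("ccy"));                // ccy
    }
}
```

Only the first character is inspected, so a U+FEFF elsewhere in the name is left alone.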

denisfi · September 2, 2022, 9:12pm

It occurs because of a mismatch between the encoding type of the file and that of the text content. UTF-8 needs to be the same in both cases. If the file or the content is of another type (ISO, Windows/Mac, ...), this character will be present. Try converting your file and content to the same encoding type first.

For a CSV file, you can choose the encoding as default:

Experimenter · September 5, 2022, 9:31am

Of course, the file encoding matches what is configured in the reader node. The bug appears even when the configuration is correct.
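As an illustration of how this can happen with a correct encoding setting: in plain Java, decoding bytes as UTF-8 does not remove a leading byte order mark, so it survives as U+FEFF at the start of the first header. The sketch below shows only that standard-library behavior. Whether the Complex File Reader decodes input this way internally is an assumption, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class BomDecodeDemo {
    /** Prepends the three UTF-8 BOM bytes (EF BB BF) to the encoded text. */
    static byte[] utf8WithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    /** Decodes the bytes as UTF-8 and returns the first comma-separated header name. */
    static String firstHeader(byte[] fileBytes) {
        String content = new String(fileBytes, StandardCharsets.UTF_8);
        String headerLine = content.split("\n", 2)[0];
        return headerLine.split(",", 2)[0];
    }

    public static void main(String[] args) {
        byte[] file = utf8WithBom("ccy,rate,dte\n");
        String first = firstHeader(file);
        System.out.println(first.length());        // 4, not 3
        System.out.println((int) first.charAt(0)); // 65279
    }
}
```

The encoding here is correct end to end, and the extra character is still present, which matches the behavior described in this reply.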

system · December 4, 2022, 9:32am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.