CSV Reader: special characters (umlauts)

Hello,

I need help with the CSV Reader node. I’m importing a CSV file whose entries are in German and contain letters like Ä, Ö, ö, ä, ß and so on.

With the help of AI, I set up a Python Script node that should prevent these letters from ending up in the table in a different, garbled form each time. Unfortunately, it does not always work.

The standard KNIME CSV Reader shows the same behavior.

Python Script - Importing CSV.docx (17.7 KB)

Does anybody have an idea how I could solve this, either with a standard KNIME solution or by adapting the script so that it works more reliably?

Thank you for the support in advance!

Gerald

Hi @GHoertner

Just to make sure I understand: what exactly is the problem with the CSV Reader node? I don’t have any issues at all with umlauts…

Hello @GHoertner.
Umlauts are standard characters in UTF-8. You could try removing the umlauts with the ‘String Manipulation’ node:
replaceUmlauts(str, omitE)
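For comparison, a minimal Python sketch of the same idea, e.g. for a Python Script node. The function name `replace_umlauts` and the mapping (including ß → ss) are my own assumptions and may not match KNIME's `replaceUmlauts` exactly. Note that transliteration only helps when the file was decoded correctly in the first place; it cannot turn mis-decoded text such as "Ã¤" back into "ae".

```python
import unicodedata

# Assumed mapping; KNIME's replaceUmlauts may treat some characters differently.
_UMLAUT_BASES = {"Ä": "A", "Ö": "O", "Ü": "U", "ä": "a", "ö": "o", "ü": "u"}


def replace_umlauts(text: str, omit_e: bool = False) -> str:
    """Transliterate German umlauts: 'ä' -> 'ae' (or 'a' if omit_e), 'ß' -> 'ss'."""
    # NFC merges a decomposed 'a' + combining diaeresis into a single 'ä' first.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        base = _UMLAUT_BASES.get(ch)
        if base is not None:
            out.append(base if omit_e else base + "e")
        elif ch == "ß":
            out.append("ss")  # assumption: ß is always written as "ss"
        else:
            out.append(ch)
    return "".join(out)


print(replace_umlauts("Äpfel, Öl, Straße"))               # Aepfel, Oel, Strasse
print(replace_umlauts("Äpfel, Öl, Straße", omit_e=True))  # Apfel, Ol, Strasse
```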

BR

Hi.
The new CSV Reader in 5.11 has its settings in a different place, but the UTF-8 file encoding option is still there.
Br.

Hello,

I know that I can choose between several encodings for a CSV file, and if I know the encoding, it works properly. However, the encoding of these CSV files tends to change, for whatever reason, when they are saved after being used outside of KNIME. I can’t tell the other users who work with the same files that I use in KNIME to stop using or saving them, so I need a robust way to read the files and handle the special characters.
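One way to make the script robust against changing encodings is to read the raw bytes and try a short list of candidate encodings in a fixed order. This is a minimal sketch, assuming the files are either UTF-8 (with or without BOM) or Windows-1252, the usual "ANSI" encoding when Excel on a German Windows system saves a CSV file; the function names are hypothetical.

```python
from pathlib import Path

# Assumed candidates, tried in this order. latin-1 accepts any byte sequence,
# so it acts as a last resort that never fails (but may mis-map characters).
CANDIDATE_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_with_fallback(raw: bytes, encodings=CANDIDATE_ENCODINGS) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes without errors."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not this encoding, try the next candidate
    raise ValueError(f"none of {encodings} could decode the data")


def read_text_robust(path: str) -> tuple[str, str]:
    # Read raw bytes so the encoding decision is made here, not by the reader.
    return decode_with_fallback(Path(path).read_bytes())
```

UTF-8 comes first on purpose: strict UTF-8 decoding rejects almost any Windows-1252 text that contains umlauts, whereas Windows-1252 accepts nearly every byte sequence, so the reverse order would silently turn "ä" into "Ã¤". The decoded text could then be parsed, for example, with `pandas.read_csv(io.StringIO(text), sep=";")` inside the Python Script node.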

Thank you

Gerald

Hello,

Thank you for the idea; I will try it and see whether it solves my problem.

Gerald

Hello,

I know, but this doesn’t work all the time; please see my more detailed reply to Awiener.

Thank you

Gerald

@GHoertner you can use the Python package chardet to determine the encoding of the CSV file and build that into a KNIME workflow, so that the encoding, and also the separators, are detected automatically.
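A rough sketch of that idea for a Python Script node: chardet guesses the encoding, and the separator is detected with `csv.Sniffer` from Python's standard library (chardet itself only detects encodings). Assumptions: valid UTF-8 is accepted as UTF-8 without asking chardet; the fallbacks (Windows-1252 if chardet is missing or unsure, `;` if no separator can be found) and all function names are my own choices.

```python
import csv
import io

try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # keep the sketch usable where chardet is not installed
    chardet = None


def detect_encoding(raw: bytes) -> str:
    # Valid UTF-8 is taken as UTF-8 right away; guessing is only needed otherwise.
    try:
        raw.decode("utf-8")
        return "utf-8-sig"  # also strips a BOM if one is present
    except UnicodeDecodeError:
        pass
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with "encoding", "confidence", ...
        if guess.get("encoding"):
            return guess["encoding"]
    return "cp1252"  # assumption: typical for German Windows/Excel exports


def detect_delimiter(text: str, candidates: str = ";,\t|") -> str:
    try:
        return csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    except csv.Error:
        return ";"  # assumption: German CSV files usually use semicolons


def read_csv_auto(raw: bytes) -> tuple[str, str, list[list[str]]]:
    encoding = detect_encoding(raw)
    text = raw.decode(encoding, errors="replace")
    delimiter = detect_delimiter(text)
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    return encoding, delimiter, rows
```

Inside a KNIME Python Script node, the rows could then be turned into a pandas DataFrame `df` and passed on with `knio.output_tables[0] = knio.Table.from_pandas(df)` (assuming the `knime.scripting.io` API of recent KNIME versions). Keep in mind that chardet's guess can be wrong for short files with only a few special characters.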

You could also think about validating the file against a sample file and ensuring that the types and columns are correct.
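A minimal sketch of such a check, assuming the file has already been split into rows (lists of strings). The expected header is hypothetical and would come from a known-good sample file. The mojibake check is a heuristic: U+FFFD marks undecodable bytes, and "Ã" is typical of UTF-8 read as Windows-1252, but it can also occur legitimately (e.g. in Portuguese names).

```python
# Hypothetical expected layout, taken from a known-good sample file.
EXPECTED_HEADER = ("Name", "Stadt")

# Heuristic markers of a wrong encoding (see note above).
MOJIBAKE_MARKERS = ("\ufffd", "Ã")


def validate_rows(rows: list[list[str]], expected_header: tuple[str, ...] = EXPECTED_HEADER) -> list[str]:
    """Return a list of problems; an empty list means the file looks fine."""
    if not rows:
        return ["file is empty"]
    problems = []
    if tuple(rows[0]) != expected_header:
        problems.append(f"unexpected header: {rows[0]!r}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(expected_header):
            problems.append(f"line {line_no}: expected {len(expected_header)} fields, got {len(row)}")
        if any(marker in value for value in row for marker in MOJIBAKE_MARKERS):
            problems.append(f"line {line_no}: possible encoding problem in {row!r}")
    return problems
```

A non-empty result could then be used to stop the workflow or route the file to a manual check instead of loading garbled umlauts.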