string to missing value

arne_g99 · June 18, 2021, 8:17pm

Good evening from here !

I have a csv-file, where missing values are marked with “m”. Knime sees the affected columns (basically all) as a string due to that. I need to do a classification and to be able to do so, my idea was to change the letter into a missing value, so after that I can convert the strings into numbers.
This needs to be done for all except one out of around 150 columns.

It would be amazing if anyone can provide a solution for that problem ! I tried around a bit with string manipulaton and java snippets as some posts suggested, but I don’t really know where to start as I’m new to knime.

Kind regards

takbb · June 18, 2021, 9:46pm

Hi @arne_g99 and welcome to the forum.

With 150 columns, this could be achieved using Column Expressions, if put into a Column List loop. It’s not overly quick but it will do it…

The trick is that in the Column Expressions node, you need to reference the “current column” in the loop, which means the code looks something like this:

if (column(variable("currentColumnName"))=="m")
{
    null;
}
else
{
  column(variable("currentColumnName"))
}

You need to tick the “Replace Column”

and the “currentColumnName” flow variable is used to specify the output column name:

Sample input data:

Sample output:

Example workflow attached

Set specific string value to missing on multiple columns.knwf (15.9 KB)

arne_g99 · June 18, 2021, 9:57pm

Thank you very much !

just after I registered on here I found a short solution that seems to work as well. I used a String Manipulation block (multi.-column) and replaced the placeholder with empty. After that, a String To Number - Block recognized the empty spaces as missing values.

takbb · June 18, 2021, 10:50pm

hi @arne_g99 that’s a useful trick, and better than my solution!

Edit: in an attempt to save face though…

here is a solution using just the String Manipulation (multi-column) node:

$$CURRENTCOLUMN$$.equals("m") ? null:$$CURRENTCOLUMN$$

(Incidentally, thank you for marking my first answer as “solution” but actually, it wasn’t a great solution - just the only one I could think of quickly, so feel free to mark your own as the solution instead, as I don’t think I deserve it this time with yours being much simpler, faster and better! )

ipazin · June 21, 2021, 9:41am

Hello there,

if you use File Reader node might be it will work out of the box and if not there is Missing Value Pattern under Advanced....

Although think this should be addressed in reader nodes one can use regexReplace() function with same outcome in such/similar cases:

toNull( regexReplace( $$CURRENTCOLUMN$$, "m" , "") )

Welcome to KNIME Community @arne_g99!

Br,
Ivan

system · June 28, 2021, 9:41am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.