Nan to Missing Value

Hi,

How can I change the Nan values of my data frame to missing values so I can use the missing value node?
From the old forum https://www.knime.com/forum/knime-general/how-to-deal-with-nan-values,
it says I should use a Java Snippet Row Filter to remove rows with NaN but I don’t want to remove them. I want to process it like how I do it with the Missing Value node.

Thanks,
Claudette

Hello Claudette,

I assume that your column is a Numeric (Double) one.
You can use the Java Snippet Node to achieve that.
Assuming your column is called ‘MyDoubleCol’ you can use this simple line of code to change NaN to missing values:

out_double = c_MyDoubleCol.equals(Double.NaN) ? null : c_MyDoubleCol;

Explanation:
out_double is your output column (in this case a new one, but you can also replace MyDoubleCol).
c_MyDoubleCol is your value of MydoubleCol that shall be checked for NaN
.equals compares this number to a specific one (Double.NaN) is the number we want to compare our input to (in this case a special notation for NaN)
? offers two options, if the previous test is true, i.e. your number is an NaN then null is returned, which will be interpreted as a missing value, otherwise your input number will be returned, i.e. only NaN will be replaced and nothing else will be affected.

I also appended a screenshot. Hope this helps you.

An easier to understand, ‘non-programming’ way would be to convert your column into a String using the Number To String Node and then use the Rule Engine an apply the following rule:

$MyDoubleCol$ = “NaN” => “”
TRUE => $MyDoubleCol$

Again, MyDoubleCol is the String-column containing my numbers as String. In this case, NaN is replaced by an empty String (first line) whereas my other numbers stay the same (second line). Then you can use the String To Number Node to convert it back to your numerical column.

6 Likes

Hi Moritz,

Thank you. Yes, the rule engine is much simpler.
But I’ll take note of both your solution.

Regards,
Claudette

1 Like

Hi all,

not sure if it would be useful also to somebody else, byt in your case (I needed just to replace pandas 'NaN’s to KNIME missing values ‘?’) simple “Math Formula (Multicolumn)” node did the trick for me. Just set it to process all (wild card = *) or only selected columns, set expression to $$CURRENT_COLUMN$$ and set the node to Replace Selected Columns by the resulting values.

This should replace ‘NaN’ by ‘?’. The only change I see is change of column datatypes - I changes to D for some reason…

Take care.
David

7 Likes

Can you precise how you did that ? Maybe a screenshot would be clearer…
Thanks

Instead of the rule engine, one can also use the cell replacer node :
The top input table should be your table with the target column containing the “NaN” as string.
The bottom input is a one line table (generated with the table creator) see screenshot.

2 Likes

Hi,

this is how to select all columns: image

And this is how to make the node to do to what I have wrote:

Hope that helps.

David

8 Likes

Yes thanks, so just by setting output = input the node automatically cast NaN to missing value, cool !
I just tested with the “simple” math formula (not multi column) and it works too

It is great to have this idea by using Math Formula. However, I’d like to point out, one limit of this method is that it can only accept numeric values in Math Formula. So when my column with missing or NaN is in String after reading, it doesn’t work.

Regards
Kefang

Hi @weiclav,

your solution was amazing. it saved me a whole life :joy:
many thanks

1 Like

The String to Number node replaces the “NA” values with “?”.

1 Like

I had the same issue, with multiple columns containing ‘NaN’ after Column Expressions…
Resolution: Math Formula (Multi Column)
‘Include’ all columns where you wish to replace ‘Nan’ with missing value ‘?’
Reference the expression: $$CURRENT_COLUMN$$
Replace Selected Columns

This replaces “NaN” across all columns while preserving other values.
Regards,
Kent

3 Likes

Are the NaN from python? If so thanks a lot. That’s helpful
br

The “NaN” values were a result of using the “Column Expressions” node and not from Python. Either way, the original value was “NaN” and became “?”. It should work the same assuming your column data type is numerical e.g. double.

NaN is not a missing value. It means that the value could not be computed at all.
E.g. division by zero is a typical source for a NaN.

I would try to find out why there are NaNs instead of replacing them afterwards.

1 Like

The way the Math Formula node treats NaNs is a bit odd.

if(isNaN(1/0), 1, 0)

yields 0 (i.e. not a NaN); I guess because it is considered infinite:

if(isInfinite(1/0), 1, 0)

yields 1. However directly returning 1/0 from the Math Formula node yields a missing value, not an Infinite value.

However…

if(isNaN(0/0), 1, 0)

yields 1, so in this case a NaN is produced. But… only internally! Directly returning 0/0 from the Math Formula node results in a column with missing values.

Are missing values the same as NaNs as far as the Math formula node is concerned? No:

if(isNaN($ColumnWithMissings$), 1, 0)

yields a missing value. (By the way, there is no “isMissing” function).

Also interesting:

if(isNaN(sqrt(-1)), 1, 0)

yields a missing value (i.e. the “if” is not even executed). Strange, because the Java sqrt function yields a NaN for negative arguments.

Also raising 0 to the power of 0 does not yield a NaN:

if(isNaN(0^0), 1, 0)

yields 0 (0^0 yields 1). This seems to be the Java standard.

The Math Formula node seems be a black hole for NaNs (and also Infinites), and it cannot return NaN and Infinites itself. I suggest a “Allow returning NaNs and Infinites” check box feature.

3 Likes

You could try and take a look at this example and the additional links: