How to deal with NaN values?

gcincilla · November 12, 2014, 6:08pm

Hi guys,

The Missing Value node in the standard KNIME repository is very useful because it lets you filter out rows containing missing values or, alternatively, substitute them with a useful value derived from the data (e.g. mean, max, min). However, this node only handles missing values, not the "NaN" (or infinite) values that some nodes generate. Is there a specific way to treat those?

If not, would it be worthwhile to (optionally) handle those in a future version of the Missing Value node?

Thanks for any feedback,

Gio

thor · November 12, 2014, 8:06pm

I agree that it's currently quite hard to deal with NaN in KNIME. You can use the Java Snippet Row Filter node to remove rows with NaN (e.g. "return !Double.isNaN(xxx)"), but that's about it.
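The row-filter expression is just a boolean test on each row's value. As a plain-Java sketch outside KNIME (the class `NanRowFilter`, its methods and the sample values are hypothetical), the same predicate looks like this. Note that `Double.isNaN` lets infinities through, while `Double.isFinite` (Java 8+) rejects NaN and both infinities:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration of the row-filter predicate;
// in KNIME only the "return ..." expression goes into the snippet.
public class NanRowFilter {

    // Same test as "return !Double.isNaN(xxx)": keep the row unless the value is NaN
    static boolean keepNotNaN(double value) {
        return !Double.isNaN(value);
    }

    // Stricter variant: also drops +Infinity and -Infinity
    static boolean keepFinite(double value) {
        return Double.isFinite(value);
    }

    public static void main(String[] args) {
        double[] column = {1.5, Double.NaN, 1.0 / 0.0, -2.0};
        List<Double> notNaN = new ArrayList<>();
        List<Double> finite = new ArrayList<>();
        for (double v : column) {
            if (keepNotNaN(v)) notNaN.add(v);
            if (keepFinite(v)) finite.add(v);
        }
        System.out.println(notNaN); // [1.5, Infinity, -2.0]
        System.out.println(finite); // [1.5, -2.0]
    }
}
```

In a KNIME snippet, the stricter variant would presumably be written as "return Double.isFinite(xxx)", with xxx standing for the column reference.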

However, the Missing Value node is the wrong place for this, because NaN is something very different from a missing value. For example, if you divide 0 by 0 as floating-point numbers you get NaN, but that is certainly not a missing value.
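To make the distinction concrete in plain Java (the language of KNIME's Java snippets): a floating-point 0.0/0.0 yields NaN, 1.0/0.0 yields Infinity rather than NaN, NaN never compares equal to itself, and integer division by zero throws instead of producing NaN. A small stand-alone sketch (the class and method names are hypothetical):

```java
// Stand-alone demo of where NaN and infinities come from in Java arithmetic
public class NonFiniteDemo {

    // Plain floating-point division, no special handling
    static double divide(double numerator, double denominator) {
        return numerator / denominator;
    }

    // Integer division, for comparison
    static int divideInt(int numerator, int denominator) {
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double nan = divide(0.0, 0.0);   // undefined result -> NaN
        double inf = divide(1.0, 0.0);   // nonzero / 0 -> Infinity, not NaN
        System.out.println(nan);         // NaN
        System.out.println(inf);         // Infinity
        // NaN is not equal to anything, not even itself,
        // so Double.isNaN is needed instead of ==
        System.out.println(nan == nan);        // false
        System.out.println(Double.isNaN(nan)); // true
        try {
            divideInt(0, 0);             // ints have no NaN: this throws
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage()); // ArithmeticException: / by zero
        }
    }
}
```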

gcincilla · November 13, 2014, 10:42am

OK Thor, you're completely right: the Missing Value node is not a good place to add functionality for handling NaN. Maybe we could think about a dedicated node to deal with all non-finite quantities (NaN, infinities, etc.).

Thanks for your solution with the Java Snippet Row Filter node. It works well. In the end, I was doing the same thing with a Row Filter node and a regex.
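The exact regex used with the Row Filter node isn't given here. As a hedged sketch, assuming non-finite doubles are matched by their text form and that this text is the same as Java's `Double.toString` output ("NaN", "Infinity", "-Infinity"), a pattern such as `NaN|-?Infinity` would single them out. The class `NonFiniteRegex` and the pattern are illustrative assumptions, not KNIME's API:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: regex test against a value's string form,
// assuming NaN/infinite doubles render as Java's Double.toString does
public class NonFiniteRegex {

    // Matches exactly "NaN", "Infinity" or "-Infinity" (matches() requires a full match)
    static final Pattern NON_FINITE = Pattern.compile("NaN|-?Infinity");

    static boolean isNonFiniteText(String text) {
        return NON_FINITE.matcher(text).matches();
    }

    public static void main(String[] args) {
        double[] values = {3.0, Double.NaN, Double.NEGATIVE_INFINITY};
        for (double v : values) {
            String text = Double.toString(v);
            System.out.println(text + " -> " + (isNonFiniteText(text) ? "exclude" : "include"));
        }
        // 3.0 -> include
        // NaN -> exclude
        // -Infinity -> exclude
    }
}
```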

As I said, these solutions work well when you apply them to a specific, predefined column. Since I want to filter out possible NaN values in all the columns of my table, I tried to put your solution inside a Column List Loop, where each iteration filters out the NaN rows for the current column. However, there is a problem with this approach: the Loop End (Column Append) node uses a full outer join on the RowID column, whereas an inner join is needed here. As a result, what this loop actually does is turn NaN values into missing values, so in the end you can solve the problem by applying the Missing Value node after the loop. Still, I was wondering whether KNIME has a more elegant solution for this (like a Loop End Column Append that performs an inner join).
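The result the loop was meant to produce, the "inner join" of the per-column filters, is simply: drop a row if any of its columns is non-finite. A plain-Java sketch of that target behavior, assuming the table is modeled as a list of `double[]` rows (the class `AllColumnsFilter` and its methods are hypothetical, and unlike the NaN-only filter it also drops infinities via `Double.isFinite`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: drop every row in which any column holds NaN or +/-Infinity.
// This models the "inner join" result the per-column loop was meant to produce.
public class AllColumnsFilter {

    static boolean rowIsFinite(double[] row) {
        for (double v : row) {
            if (!Double.isFinite(v)) {
                return false; // one bad column is enough to drop the row
            }
        }
        return true;
    }

    static List<double[]> filterRows(List<double[]> table) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : table) {
            if (rowIsFinite(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> table = List.of(
                new double[] {1.0, 2.0, 3.0},
                new double[] {4.0, Double.NaN, 6.0},
                new double[] {7.0, 8.0, Double.POSITIVE_INFINITY},
                new double[] {9.0, 10.0, 11.0});
        for (double[] row : filterRows(table)) {
            System.out.println(Arrays.toString(row));
        }
        // [1.0, 2.0, 3.0]
        // [9.0, 10.0, 11.0]
    }
}
```

Checking every column of a row in one pass avoids joining per-column results at all, which is why no outer-join artifacts (missing values) can appear.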

Thanks for your help!

aborg · November 13, 2014, 11:10am

I think recursive loops can perform this kind of filtering. See the nice intro from Iris for details.
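The idea behind a recursive loop, as suggested here, is that each iteration's output table becomes the next iteration's input, so per-column row filters compound instead of being joined afterwards. A plain-Java model of that idea (not KNIME's node API; the class `RecursiveColumnFilter` and its methods are hypothetical, and the table is assumed to be a list of `double[]` rows):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the recursive-loop idea: each iteration filters one column
// and hands the already-filtered table to the next iteration.
public class RecursiveColumnFilter {

    static List<double[]> filterOnColumn(List<double[]> table, int column) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : table) {
            if (!Double.isNaN(row[column])) {
                kept.add(row);
            }
        }
        return kept;
    }

    static List<double[]> filterAllColumns(List<double[]> table, int columnCount) {
        List<double[]> current = table;
        for (int column = 0; column < columnCount; column++) {
            // output of this iteration becomes the input of the next one
            current = filterOnColumn(current, column);
        }
        return current;
    }

    public static void main(String[] args) {
        List<double[]> table = List.of(
                new double[] {1.0, Double.NaN},
                new double[] {Double.NaN, 2.0},
                new double[] {3.0, 4.0});
        System.out.println(filterAllColumns(table, 2).size()); // 1
    }
}
```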

Cheers, gabor

gcincilla · November 17, 2014, 11:03am

Hello Gabor,

Thank you so much for your suggestion. I'll take a look at the video and try to apply the recursive loops.

Gio

ajason08 · September 25, 2020, 6:16am

A similar approach, but using the Java Snippet (simple) node.
This converts the NaN into a missing value, so you can deal with it more easily.

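```java
// $mycolumn$ is the input column; assumes the return type is set to Double so null becomes a missing cell
double doublecol = $mycolumn$;
if (Double.isNaN(doublecol)) { return null; } // NaN -> missing value
else { return doublecol; }
```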
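The same conversion can be extended to infinities by testing `Double.isFinite` instead of `Double.isNaN`. Below is a stand-alone sketch outside KNIME, where returning `null` stands in for a KNIME missing cell (the class `NonFiniteToMissing` and its method are hypothetical):

```java
// Hypothetical stand-alone version of the snippet, extended to infinities:
// returning null plays the role of a KNIME missing cell.
public class NonFiniteToMissing {

    static Double toMissingIfNonFinite(double value) {
        // Double.isFinite is false for NaN, +Infinity and -Infinity
        return Double.isFinite(value) ? Double.valueOf(value) : null;
    }

    public static void main(String[] args) {
        double[] column = {2.5, Double.NaN, Double.POSITIVE_INFINITY};
        for (double v : column) {
            System.out.println(toMissingIfNonFinite(v));
        }
        // 2.5
        // null
        // null
    }
}
```

Both branches of the conditional are `Double` references, so a `null` result is never unboxed.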

sammyhallgren · February 14, 2023, 9:00am

The best way is to address the source of the NaN. It's the result of an undefined operation, such as dividing zero by zero.
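Fixing the root cause usually means guarding the operation before it produces NaN. A minimal Java sketch of that idea, assuming a missing result (`null`) is acceptable when the denominator is zero (the class `SafeRatio` and its method are hypothetical; whether to return missing, 0 or something else depends on the use case):

```java
// Hypothetical example of fixing the root cause: check the denominator
// before dividing instead of cleaning up NaN afterwards.
public class SafeRatio {

    // Returns null (a "missing" result) when the denominator is zero
    static Double ratio(double numerator, double denominator) {
        if (denominator == 0.0) { // also true for -0.0
            return null;
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(ratio(6.0, 3.0)); // 2.0
        System.out.println(ratio(0.0, 0.0)); // null instead of NaN
        System.out.println(ratio(1.0, 0.0)); // null instead of Infinity
    }
}
```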

If the use case doesn't allow you to fix the root cause, there are two simple ways to manage NaN:

1. Feed the table into a Java Snippet (simple) node and execute the node. This replaces NaN with missing values, which can then be handled by a Missing Value node.
2. Do the same with a Python node. In that case you need to load the data into a pandas DataFrame and then convert it back to a KNIME table, like so (this also replaces NaN with missing values that a Missing Value node can handle):

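```python
import knime.scripting.io as knio
import pandas as pd

# Read the first input table into a pandas DataFrame
df = knio.input_tables[0].to_pandas()

# Write it back as a KNIME table; per this post, NaN values come out as missing values
knio.output_tables[0] = knio.Table.from_pandas(df)
```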

mlauber71 · February 14, 2023, 9:56am

@sammyhallgren you can find several ways to deal with NaN in this workflow, which in some ways also refers back to this very old thread.