How to replace integer 0 (zero) with null?

Hi,

I just started to play around with Knime and so far it looks amazing. I have one problem with my dataset that I need to handle. I have some 200 Integer columns for which I want to calculate the average. The problem is that in the data I receive ‘null’ is represented by 0. (All valid data are between 1-6.) This means that the average will be incorrect.

Previously, I manipulated this data in Excel and the first step was always to replace all ‘0’ with ‘’ (empty string). I hope I can skip any manipulations with Excel by using Knime instead but it fails on this simple issue. Is there some workaround? I suppose I could convert these columns to String and then use the String Replacer and then convert back to Integer again but I think there should be a simpler way?

Thanks!

/Krister.

An easy solution would be to filter out those rows which contain invalid data. This have to be done for each column, so I would create a meta node with a filter and the statistics calculator node, and iterate over the columns. Others might have better solution. :slight_smile:

Thanks but maybe I was not entirely clear. The value of 0 (zero) is invalid, this is actually ‘null’. So valid data are 1-6 or null. I need to replace all 0 with null to make the calculations correctly. If I calculate the average and have 0,1,3 => 1.3333 but this should actually be null,1,3 => 2

Well I guess if there are only the values: 1;3 (0 is filtered out), the average will be 2. :wink: So in my opinion the Row Filter would do the trick for each column. Actually it seems I was not clear. So. You have a lot of columns. Those columns have content which is not suitable for your statistics. Those values should be filtered out. The problem might be that the node creating the statistics is not generating you the results? Are we talking about the Statistics View? If it is that node, than your solution is much easier, and I know no better. I guess in next versions there will be a node giving, not just showing the statistics. If you are using the GroupBy, or the Conditional Box Plot you get the proper results, but might have to use the looping nodes. Hope I got closer. :slight_smile:

Thanks, now I see what you mean. So far I haven’t played with the statistics node(s). I have a file reader and then I added two GroupBy nodes, one to get the mean values and one to get the standard deviation. So, I wanted to remove/replace the 0’s before the GroupBy nodes.

The drawback if I get it correctly, is that I need to add one Row Filter for each column (abt. 200). Maybe I can work something out here with the Loop node.

I should probably also explore the Java snippet…

Later I also need to add some other validations, renaming of headers, upper casing etc. but I think I now have a pretty good idea about most of this.

Hey Krister,

I’m not sure how the rest of your file is setup but in the file reader node for example under advanced settings you can specify a missing value string.

Perhaps this would simplify things and avoid the need for writing a whole bunch of nodes to effect changes on each individual column or having to loop over anything.

Anyways hopefully this help sand simplifies things. If you have any other questions if you post and a few lines of example for the input data and the desired output I’d be happy to help if I can.

Working with (transforming, etc…) many columns is an area, which could use some development. A great deal of nodes are designed to operated on a single column at a time but having the ability to either apply a single operation or set of operation to a number of columns or the ability to specify an operation individually for columns would save time and simplify (reduce the number of nodes) in workflows. Perhaps there are some java developers out there for whom this task of a new “interface” for some of the data manipulation nodes would be a simple task. The missing value node is one node where some of this is possible.

All the best,

Jay

Thanks Jay,
I got quite optimistic when I first found this but the problem is that this seem to be working only on String columns and all my data are set to be Integer columns. These values are not affected by this.
Hmmm…maybe I can convert the columns to strings, change the values and convert back again (I assume it has to be numeric columns in order to calculate averages, stdev etc.) Thanks for planting this idea anyway.

Hi krister, Currently KNIME does not have node for looping columns and does not allow setting a missing value pattern other than for strings inside the File Reader node. We are planning to have these looping column nodes in the future. However, to overcome the problem, I found a way to create double-type columns containing null=missing values for “0” values. For this workaround, you need the following nodes lined up: Column Combiner, Java Snippet, Column Splitter, String To Number, Column Filter, see Demo Flow. Best, Thomas

Thanks a lot for the example!
This looks like a working solution. It also provided some good example how to use these other nodes so it’s a great thing to learn from.
Best Regards,
Krister.

Well, in my humble opinion it is a bit easy to do it wrong, at least in this form. If you have data like 10, 104, … in you columns you will get 1?, 1?4, … results, which is clearly not what you want. But the general idea is not bad if we are careful. :slight_smile:

You can drag a control from: Misc / Java Snippet / Java Snippet

and write a code like this:

if ($colname$ == 0)
  return null;
else
  return $colname$;
 

With this, you can insert replace null values where you find zeros.

I hope this is usefull to someone.

1 Like

If you want to avoid the Java Snippet :-) simply use the Maths expression in the Maths node below;

if($column1$<=0, "?", $column1$)

Basically all it does it looks to see if value is 0 or lower, if so it replaces it with an empty cell. If it greater than 0, it retains the original value.

Simon.

5 Likes

I don’t know if there is a node (has been developed by now), but this solution has solved a similar issue in my case. Brilliant idea! Thank you, @richards99!

1 Like

Hi @Odko,

don’t think it had but there is Math Formula (Multi Column) that can handle multiple columns at once.

Br,
Ivan

2 Likes

Hi @ipazin,

Thank you! That is exactly what I used :slight_smile:

Odko

1 Like