Interpolation

Hello!

I would need some help with some kind of extrapolation problem.
I wish to change my table columns' by replacing the missing values as following:

Table:

Col1
1
0
?
?
?
0
?
?
1
1
?
?
1
?
0

I would like to replace the missing values by the values of the preceeding row, which gives in this case:

Col1
1
0
0
0
0
0
0
1
1
1
1
1
1
0

and if the first row is "?" then it should take the value of the closest row.
example:
Col2
?
?
0
1

would become:
Col2
0
0
0
1

Then, I wish to extend this process to all the columns.

 

Of course I tried the java snippet row and simliar others, but I always get some error, because I'm not sure how to use them. I could post my code later on if you want, but I prefer not to influence at first.

Thank you for helping!

Hi Heej,

nice problem, I actually had a similar one a while ago, and we have a node for this in the backhand, which will be in one of the next releases. Sorry but I cannot promise it for the next one.

 

First problem : replace with the last real value : Java Snippet

Global Variable Declaration :

double d = 0;
boolean isInit = false;

Method Body :

Double val = $newvalue$;
if(!isInit){
if(!(val == null)){
d = val;
isInit = true;
}
return val;
}
if(!(val == null)){
d = val;
}
return d;

 

The second problem is a bit more complicated, the main problem is, that all our snippets work row by row. So they are not able to look into the future.

So, what you can do is the following:

1. apply a JSnippet with the above code to remove the first problem

2. Add an artificial column containing a counter (e.g. JSnippet with return $$ROWINDEX$$;)

3. sort the data table in opposite order on this column

4. Apply a JSnippet with the above code to the column (the column now only has missings at the very end, which was the beginning before the sorting)

5. Sort the table again on the counter column (this time to recreate the ordering)

6. Filter the counter column

7. Enjoy the result ;)

 

Iris

I managed to do something similar to this, and it is not straight forward. I used looping, in particular the Delegating Loop Start and End nodes. You can get it done this way. You need to pass that column value to the bottom port of the Delegating Loop End node.

 

Another way is to take those columns, strip out the first row using row filter, reset the RowID and Join back together with the original columnds and then use the Column Merger node on the two columns, so that if there is a missing value, it gets replaced with the value from the second column. You can then put this into a loop to capture those were there are multiple missing values in a row.

 

Simon.

Thanks a lot, to the two of you!^^

To Iris:

The first method worked exactly as I wanted! However, the second method can be applied only if there is no missing values at the end of the column, which is rarely the case in my tables.

For example, I could have:

Col1
1
0
?
?
?
0
?
?
1
1
?
?
1
?
0

?

?

?

...if you see what I mean.

So I think I need some method that could catch the first non-missing value of the table, and apply it to the whole first rows, which originally had missing values. 

 

To Richards:

I don't have the Delegating Loop nodes.

So I tried the second method. But I didn't quite manage to do that.. even though that is one of my initial ideas. because I don't get how you reset the RowId..

(I tried with a RowId node connected to the original table + a Joiner node joining the original id column, that starts at "Row0", and the filtered column, that starts at "Row1") maybe I am mistaking in the Joiner's settings..

Then, when you join the original columns with the filtered ones, before merging the columns, which are the settings of the joiner node?

 

NB:

I want to expand this method to all the columns of the table. I tried a loop..in vain.

If there is any possibility to do so, please tell me.

 

Thank you so much again.

Heej.

Hi Heej,

 

hm, I thought you would always first apply the "normal sorted JS". Because then the missings at the end would be replaced by the first step. And afterwards there can only be missings at the start.

 

I attach you a workflow which contains a loop solving your NB :-)

 

And there you also see how i combined the JS, Sorter etc.

 

Iris

Hi,

You need to use the RowID node (Data Manipulation/Row/Other) on both the original column, and the column-1 row. Then after this point use the Joiner node, selecting to Join by RowID, and choosing the method "Full Outer Join".

If you havent got the Delegating Loop nodes, you need to add these by going to the Install New Software option in the Help menu and making sure the KNIME update site is present in the list.

 

Simon.

waouh..

thank you so much Iris! I am safe now! :D

Heej

And thank you to you Richards as well!^^