Java Node Loop over Rows

petermeissner · May 26, 2020, 12:46pm

I have a class of tricky data management problems where the current value of a cell depends on the values of previous cells in the same column as well as on previous values in other columns.

I could do this by first using a “Lag Column” node followed by a combination of “Rule Engine”/“Math Formula” nodes, or I could pipe the whole thing to R, do the transformation there, and then pipe it back to KNIME.

While the KNIME solution might work, it is not very flexible. The R solution, on the other hand, is very flexible and expressive but super slow, because the data gets copied to R and then back again to KNIME, and developing R for KNIME or R in KNIME is not too much fun.

So, my intuition is that Java Snippets would be the right place to implement this. But while it's very easy to write things that work on columns (it seems to work like R’s vectorization, i.e. col A + col B gives a row-wise sum for each row), I do not find any way to loop over rows and write something like this:

for (int i = 0; i < 5; i++) {
  A[i] = B[i] * B[i-1] + A[i-1];
}

Is it true that I cannot write those kinds of expressions within a Java Snippet?
What are alternatives to my two already proposed solutions?

qqilihq · May 26, 2020, 12:55pm

A “random access” doesn’t seem possible with the Java snippets, but:

You could define some state, holders, stacks, however you want to call it, in the Java snippet’s “system variables” section. They would be available in subsequent rows. You’d just have to maintain these manually (i.e. add in the previous iteration, get/remove in the next):
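A minimal sketch of that idea in plain Java, outside of KNIME: a field keeps the previous row's values, and a method that is called once per row reads and then updates it. The class name `PreviousRowState`, the method `processRow`, and the rule for the first row (keep A unchanged) are assumptions for illustration; in a Java Snippet the fields would be declared in the variables section and the method body would be the per-row expression.

```java
class PreviousRowState {
    // State that survives from one row to the next (null before the first row).
    private Double prevA = null;
    private Double prevB = null;

    // Called once per row, in table order; returns the new value of A.
    double processRow(double a, double b) {
        double result;
        if (prevA == null) {
            // First row: there is no previous row, so A is kept as is (assumed rule).
            result = a;
        } else {
            // A[i] = B[i] * B[i-1] + A[i-1], using the stored previous values.
            result = b * prevB + prevA;
        }
        // Remember this row for the next call.
        prevA = result;
        prevB = b;
        return result;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0, 0.0};
        double[] b = {2.0, 3.0, 4.0};
        PreviousRowState state = new PreviousRowState();
        for (int i = 0; i < a.length; i++) {
            System.out.println(state.processRow(a[i], b[i]));
        }
    }
}
```

With the sample values in `main`, the three rows yield 1.0, 7.0 and 19.0. The approach relies on rows being processed strictly in order.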

Obviously you cannot access subsequent rows this way though.

Does this make sense?

AnotherFraudUser · May 27, 2020, 1:17am

Hi @petermeissner,

not sure if I misunderstood something.

I think you have to handle the case with i=0?
Or what do you expect to happen in the first iteration?

I think you could try the following:
Get the relevant columns as array:

Then use the java snippet node to execute your code from above more or less:

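// c_A and c_B are the input columns collected as arrays.
// Note: out_A refers to the same array as c_A, so c_A is overwritten too.
out_A = c_A;

for (int i = 0; i < out_A.length; i++) {
    if (i == 0) {
        // First row: there is no previous row, so use the current values.
        out_A[i] = c_B[i] * c_B[i] + out_A[i];
    } else {
        // Later rows: previous B and the already updated previous A.
        out_A[i] = c_B[i] * c_B[i - 1] + out_A[i - 1];
    }
}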

Input: [image]

Output (A_new is only there for easier checking; you can also just replace the previous column): [image]

[image]
KNIME_project14.knwf (9.6 KB)

Not a perfect solution, and I'm actually not sure how the performance is for this.
But if you do not have too many columns, then this could work.

But like @qqilihq said, I think there is no direct Java snippet function to get a cell value from previous rows, e.g. getcell(1,1) or A.getrow(1).

ipazin · May 28, 2020, 1:56pm

Hi there @petermeissner,

I think both of your solutions are good and valid. Personally, I would go for the KNIME solution, with the intention of making it as flexible as possible.

If you share data and a calculation example, maybe you’ll get some ideas/suggestions.

Br,
Ivan

beginner · May 29, 2020, 4:20am

That is true, and I have for a long time asked for a Java snippet that works like the Python script: full control over the whole table, but without the need to serialize back and forth, which often takes longer than the actual calculation.

ipazin · June 1, 2020, 12:16pm

Hi there,

if I'm not mistaken, @beginner is talking about this one:

I have given it a +1 for this topic.

Br,
Ivan

AnotherFraudUser · June 1, 2020, 1:02pm

@ipazin could you put a +1 from me as well?

ipazin · June 1, 2020, 1:21pm

Done @AnotherFraudUser!
(Internal reference: AP-12550)
Ivan

petermeissner · June 2, 2020, 1:37pm

Ha! Cool, that will not go all the way, but certainly a long way.
I did not know that something like a ‘state holder’ has to go into the “system variables” section.

Thanks.

PS.:

I think the idea of getting full access to the whole table with all columns, rows, and cells could really solve a lot of problems that need somewhat more flexibility than the normal data management nodes provide: filter, groupBy, join, …

Another approach would be to develop my own node from scratch with the SDK, but even my Java-GoTo-Bro says that it's no small project to write nodes this way.

petermeissner · June 8, 2020, 11:51am

I have looked into your proposal some more … I thought it would solve my problems, but there are some weird things going on (thanks, nonetheless!).

If I only keep one element in the linked list I either get a constant value or the current value.

Also, I cannot compare the String value from String.join(...) with a string from a column, c_id.toString().
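One possible cause, which is only an assumption because the snippet itself is not shown here, is that the two strings are compared with `==`. In Java, `==` on `String` objects compares references, while `equals` compares the text. The class name `StringCompare` and the sample values below are hypothetical and only illustrate the difference:

```java
import java.util.Objects;

class StringCompare {
    // Null-safe comparison of the text of two strings.
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        // Both strings are built at runtime and hold the text "12".
        String joined = String.join("", "1", "2");
        String fromColumn = Integer.valueOf(12).toString();

        // Reference comparison: the two variables point to different objects.
        System.out.println(joined == fromColumn);
        // Content comparison: the text is the same.
        System.out.println(sameText(joined, fromColumn));
    }
}
```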

I am giving up on this; it's hard to grasp what is going on, and simple looping over rows is not an intended use. Back to serializing back and forth between R and KNIME.

KNIME_project.knwf (13.3 KB)

qqilihq · June 8, 2020, 1:28pm

Could you explain what the desired table output would look like?

petermeissner · June 9, 2020, 11:35am

Most simple example:

id  value 
 1      a
 1      c
 1      c
 2      a
 2      c
 2      c
 2      a
...

My use case at the moment is to sort out redundant data from a data set. I know I can solve this particular example by using a rule engine, BUT my hope was to use Java nodes to solve this and a bunch of other problems in a more general and generic way, so I can encapsulate and re-use this for different tables with different column names, types, and numbers of columns …

Rules:

If i == 1 | id[i] != id[i-1] then keep = true
If i == 1 | value[i] != value[i-1] then keep = true

id  value    keep
 1      a    true
 1      c    true 
 1      c    false
 2      a    true
 2      c    true
 2      c    false
 2      a    true
...
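The two rules can be sketched in plain Java, outside of KNIME, as a single pass that compares each row with the one before it. The class name `KeepFlag` and the method `keepFlags` are hypothetical, and the index is 0-based here, so the first row is i == 0 rather than i == 1:

```java
import java.util.Arrays;
import java.util.Objects;

class KeepFlag {
    // keep[i] is true for the first row, or when id or value differs
    // from the previous row; otherwise the row is a repeat and is false.
    static boolean[] keepFlags(int[] id, String[] value) {
        boolean[] keep = new boolean[id.length];
        for (int i = 0; i < id.length; i++) {
            keep[i] = i == 0
                    || id[i] != id[i - 1]
                    || !Objects.equals(value[i], value[i - 1]);
        }
        return keep;
    }

    public static void main(String[] args) {
        // The example table from above.
        int[] id = {1, 1, 1, 2, 2, 2, 2};
        String[] value = {"a", "c", "c", "a", "c", "c", "a"};
        System.out.println(Arrays.toString(keepFlags(id, value)));
        // prints [true, true, false, true, true, false, true]
    }
}
```

Because only the previous row is needed, the same logic would also fit the state-holder approach: remember the last id and value, and compare them with the current row.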

beginner · June 9, 2020, 12:08pm

Yeah, from experience the big issues with writing your own nodes are:

- Setting up the dev environment, which is Eclipse, which is masochism in its purest form (subjective)
- Reading the docs on how to create nodes
- Implementing the nodes
- Then figuring out how to deploy and update them (via update site?)
- Maintenance: keeping them updated, also with new KNIME versions

Maintenance is cumbersome because you probably won’t have to do it that often, so you will always more or less learn again how to deploy the nodes. And since it’s Eclipse… enough said.

system · December 9, 2020, 12:15am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.