ROW REFERENCE IN TABLE (CELL LIKE OPERATIONS)

zebov · September 15, 2021, 7:22pm

Hello KNIME SWARM, I have a question:
KNIME is perfect for column like operations - for example imagine table like that:
COL1, COL2, COL3
VAL1,VAL2,VAL3
VAL4,VAL5,VAL6
VAL7,VAL8,VAL9
Imagine you are creating COL4: with values of row 0 from column 1,2,3 being multiplied - its quite easy to do - you will have COL4 LIKE:
VAL1VAL2VAL3
VAL4VAL5VAL6
ETC
But now - is it possible to do VAL1VAL5, VAL4VAL8 etc - you get it? Something like CELLS in Excel… I was thinking pivotting / row numbers but is there any other easier way?

bruno29a · September 15, 2021, 8:58pm

Hi @zebov , yes manipulating data using multiple columns is very much easy, but not so much when it comes to using multiple rows. That is because Knime processes a table row by row - that is why also you do not need to loop through each row to apply a process to a whole table.

So, because of that, it is much more challenging when having to check other rows for manipulation. However, that does not mean that it is not possible. There are some nodes that can be used for doing this type of operations. You can check the Lag Column node for example, where you can make copies of values from other rows. You can also do this type of operations in Java or in Python via the Java Snippet and the Python Snippet where you can have access to the whole table, and access any rows of the table.

A more complex alternative, in the event that the relative rows are consistent (for example, you need to know what’s in the next row all the time), you can always create dummy columns referencing each row to other rows (a bit like a relation table), and then join based on that reference (and essentially you’re implementing what the Lag Column does)

zebov · September 15, 2021, 9:48pm

Thanks for answer. I have been using java / python snippet - but what do you mean by referencing rows in a table? From what I see we have row id, row index and nothing there can reference for example row 50 from column 2…

bruno29a · September 15, 2021, 10:55pm

Hi @zebov , this is similar to table structures where the parent and children are stored in the same table, usually the children referencing the parent.

You can create this structure. I’ll use your data as an example.

Here’s my input table:

And here’s my manipulation results where I capture the value of COL2 of the next row:

I can now do manipulation through multiple columns. As you can see, you are now able to work with VAL1 and VAL5, VAL4 and VAL8 as you wanted in your example.

Here’s what the workflow looks like:

And here’s the workflow: Referencing to other rows.knwf (13.0 KB)

zebov · September 15, 2021, 11:16pm

Hey Bruno, this is nice trick, something similar I was thinking about - but is there any other way? Like referencing a cell (row) in the snippet (java/python)? But kudos for trick, creating another table, with “pivotted” values is indeed nice idea! Imagine if KNIME would get this cell operational abilities!

bruno29a · September 16, 2021, 12:45am

Hi @zebov , as I mentioned, this would be the most complex way. There are simpler ways as I mentioned, such as using the Lag Column node, or through programming languages.

A little correction though, it looks like you can’t access different rows in Java Snippet. I did it with Python before, I thought Java would be able to also.

In Python, you have access to the whole input table (that’s literally what the first line of code does in the Snippet). You can access each row using the index ([0], [1], etc).
So in this case, you can use a loop iterating through each row of the table, and adding COL2[i+1], where i is the current iteration and current row, to get the next row’s COL2

Same thing with R Snippet (this also literally comes with a first line that makes a copy of the whole input table).

zebov · September 16, 2021, 4:27pm

Thanks for all your replies!

bruno29a · September 16, 2021, 6:35pm

Hi @zebov , I just set up my Conda and Python environment and installed the Python extension, so I’m now able to run the Python Script.

Here’s a way to accomplish this in Python:

Results:

Going through the code:
This makes a copy of the input table into the variable table

# Copy input to output
table = input_table_1.copy()

Let’s get how many records there are, so we can loop through them eventually:

rowCount = len(input_table_1['COL2'])

We’ve captured the number of row in the rowCount variable. Just a note here, when I’m referencing to “variable” here, they’re variables within the Python script, as used in a programming language. They’re not flow variables, and are not accessible by Knime. They reside only within the Python script.

I then create a new array element called COL2_reference in my variable table which will eventually be transformed into a column by Knime (more on this later below):

table['COL2_reference'] = [None for x in range(rowCount)]

Then I loop through reach row, where the current row’s index is i, and I retrieve the value of the next row, whose index is i+1 (current index + 1), and stores the value in the new array element COL2_reference:

for i in range(rowCount-1):
  table['COL2_reference'][i] = input_table_1['COL2'][i+1]

I then send the table to the output port

output_table_1 = table

FYI, I’m not a Python coder, but this is simple enough to write without profound Python knowledge.

I added this to my workflow:

Here’s the workflow: Referencing to other rows.knwf (15.4 KB)

zebov · September 16, 2021, 7:07pm

One more time- thanks a lot for your help! You are awesome!

system · March 18, 2022, 7:07am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.