Feature request: single row loop start

Aswin · May 21, 2020, 7:18am

Dear Knimers,

When I want to iterate over individual rows of a table, I mostly use the “Chunk Loop Start” with “Rows per Chunk” set to 1. However, there are plenty of situations where it would be nice to have the current Row ID available as a flow variable within the loop. This would be similar to the column name being available as a flow variable when using a “Column List Loop Start”. As a workaround I sometimes use a “Group Loop Start” on the RowIDs, but in that case I first have to use the “RowID” node to copy the row IDs into a normal column and afterwards remove this column again. It is also extra, unnecessary work for the system to look for groups that are not there.

So my feature request is as follows: a “Single Row Loop Start” where the row ID of the current row is available as a flow variable within the loop.

Best,
Aswin

ipazin · May 21, 2020, 2:22pm

Hi @Aswin,

Can you give an example?

And what about using the Table Row to Variable node after the loop start? It is only one node…

Br,
Ivan

Aswin · May 21, 2020, 8:54pm

Dear @ipazin,

The Table Row to Variable node is exactly what I am trying to avoid; it is kind of ugly… Also, I noticed that there are two kinds of Chunk Loops in my workflows: one where I can increase the chunk size and the loop still works, and one where the chunk size can only be 1, or else the loop will fail. I think it would be elegant to have different types of Loop Start nodes for these different constructs.

Example. Suppose I have a table with the body weights of the ostriches born at Berlin Zoo, one row per hatchling and one column per weekly weighing session after hatching (columns “week 1”, “week 2”, etc.). I want to fit these weights with the Polynomial Regression Learner. To use the Polynomial Regression Learner, I need to have all the weights for one hatchling in a column, not in a row… One option would be to transpose the whole table and loop through the different ostriches with a Column List Loop Start. Problem: transposing is slow for large tables. A Column List Loop is also VERY slow for large, wide tables. If I ever want to use that same workflow for a chicken farm, where it has to deal with millions of chickens, the workflow will take hours or days to complete.

It is much more efficient to loop through the individual rows of the table, transpose that one row, and use it with the Polynomial Regression Learner. It could look like this:

With a Single Row Loop Start the lowest two nodes would not be necessary. The Column List Loop Start already works like this, why not have a rowwise loop that works in the same way?
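For readers who think in code rather than nodes, the row-wise approach can be sketched roughly as follows in Python. This is only an illustration, not the KNIME workflow itself: the weights, the row IDs and the helper name fit_rows are made up, and NumPy's polyfit stands in for the Polynomial Regression Learner.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row per hatchling, one column per weekly weighing.
weights = pd.DataFrame(
    {
        "week 1": [1.0, 1.2],
        "week 2": [2.1, 2.3],
        "week 3": [3.9, 4.4],
        "week 4": [7.2, 7.9],
    },
    index=["ostrich_A", "ostrich_B"],
)


def fit_rows(table, degree=2):
    """Fit one polynomial per row; returns {row ID: coefficients}.

    The columns are assumed to be consecutive weeks 1, 2, 3, ...
    """
    weeks = np.arange(1, table.shape[1] + 1)
    fits = {}
    # Each iteration sees exactly one row together with its row ID,
    # so the full table never has to be transposed.
    for row_id, values in zip(table.index, table.to_numpy(dtype=float)):
        fits[row_id] = np.polyfit(weeks, values, degree)
    return fits


coefficients = fit_rows(weights)
```

The point of the sketch is that the row ID arrives in the loop body together with the row, which is what the requested loop start would provide as a flow variable.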

Best,
Aswin

ipazin · May 22, 2020, 2:19pm

Hi @Aswin,

Thanks for the vivid example! I understand better now.

Will check it a bit more and come back to you.

Br,
Ivan

ipazin · June 1, 2020, 9:53am

Hi there @Aswin,

Checked it. Unfortunately, it seems to be a rather special use case, so it is not in the plans for now. But let’s keep the topic open; maybe someone else has the same or a similar request, and then it can be re-evaluated.

Br,
Ivan

Aswin · June 2, 2020, 6:47am

Dear @ipazin,

I would just like to add that the behaviour of the suggested “Single Row Loop Start” would be identical to that of the “Group Loop Start” if the latter could be executed on the RowIDs as groups. Unfortunately, it is not possible to select the Row IDs as groups at the moment. Maybe this feature could be easily added, but in that case it should not waste time looking for groups.

As a side note, the feature would behave like the iterrows() method in Python’s pandas library, where both the index and the row are made available inside the loop.
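A minimal sketch of that pandas pattern, with a made-up two-row table and row IDs, could look like this. The second loop is only a rough analogue of running a Group Loop Start on the RowIDs: it groups on the index, so every group contains exactly one row.

```python
import pandas as pd

# Hypothetical table; "Row0" and "Row1" play the role of the row IDs.
table = pd.DataFrame(
    {"week 1": [1.0, 1.2], "week 2": [2.1, 2.3]},
    index=["Row0", "Row1"],
)

# iterrows() yields (index, row) pairs, so the row ID is available
# inside the loop body without any extra step.
seen = []
for row_id, row in table.iterrows():
    seen.append((row_id, row["week 2"]))

# Rough analogue of grouping on the row IDs: one single-row group per ID.
by_group = []
for row_id, group in table.groupby(level=0, sort=False):
    by_group.append((row_id, group.iloc[0]["week 2"]))
```

Both loops visit the same rows with the same IDs; the first one simply skips the grouping step.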

Best
Aswin

ipazin · June 2, 2020, 9:37am

Hi @Aswin,

I don’t know much about development, so I can’t say anything about the effort needed for such an implementation, but thanks for sharing your ideas.

Br,
Ivan

system · December 1, 2020, 9:47pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.