Beginner's question about loops

I am sure I am missing something trivial.

Consider the following dataset:

ID Key Value
0 A jkahsdiqweqwje
1 B jkljhsd ipasdo
2 C 8001wdmkmasmkd
3 C m08qwdmumjasda

You can see that the Key column is not unique. I want to add a column with an extra sequence number for those keys that occur more than once. For each key I want this sequence to start at 0 again.

I did manage to merge key-counting data into the dataset, so I can tell from the count which rows belong to a key with one occurrence and which to a key with several. I think I should split the dataset after this counting merge into rows whose key occurs once and rows whose key occurs more than once. Probably use the IF nodes there?

But then, how do I proceed from here?

I am pretty sure that the answer will be pretty trivial...

To quickly see which keys occur more than once, use the GroupBy node: choose the Key column as the group column and, in the aggregation section, pick any column such as ID with the aggregation method "Count". This tells you how many times each Key occurs. Is this the kind of thing you are after? I am not totally clear from your post.
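For readers who want to check the idea outside KNIME, here is a minimal plain-Java sketch of what the GroupBy count produces for the example data. The class and method names (KeyCounts, countKeys) are made up for illustration and are not part of KNIME.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyCounts {
    // Count how many rows carry each key, preserving first-seen order.
    static Map<String, Integer> countKeys(List<String> keys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Key column of the example dataset from the question.
        List<String> keys = Arrays.asList("A", "B", "C", "C");
        System.out.println(countKeys(keys)); // prints {A=1, B=1, C=2}
    }
}
```

Any key with a count above 1 is one of the non-unique keys.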

If you want the numbers to start afresh at each change of key and count up from there, you could do this:

1. Use a GroupBy node and select the Key column as the group column. Its output lists every distinct Key. Now connect a TableRow to Loop Start node to that output.

2. Connect a RowFilter node to the main data (prior to the GroupBy node) and turn on its variable ports (right-click and choose to show variable ports). Now connect the red variable line between the TableRow to Loop Start node and the RowFilter node.

3. Configure the RowFilter node. Choose to filter on the variable, click the little square box next to the filter box and choose the variable "Key" if that is the column name for the key column. Alternatively, click Flow Variables tab, and from the dropdown next to filterstring choose Key.

4. Connect a Math Formula node to the RowFilter node. In its dialog, choose to append a new column and give it a name for the new count. In the expression box, choose RowIndex from the right-hand window.

5. Connect up a Loop End node.

Is this what you want to achieve?
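The loop above can be sketched in plain Java, purely to show the logic: one pass per distinct key, keep the matching rows, and number them from 0. This is an illustration under my own assumptions, not how KNIME executes the nodes; the names PerKeyLoop and numberWithinKeys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class PerKeyLoop {
    // Returns one "rowPosition:key:sequence" string per row, grouped by key.
    static List<String> numberWithinKeys(List<String> keys) {
        List<String> result = new ArrayList<>();
        // Outer loop: one iteration per distinct key (the GroupBy + loop start steps).
        for (String current : new LinkedHashSet<>(keys)) {
            int seq = 0; // restarts at 0 for every key
            // Inner loop: keep only rows with the current key (the RowFilter step)
            // and give each kept row its index within the group.
            for (int row = 0; row < keys.size(); row++) {
                if (keys.get(row).equals(current)) {
                    result.add(row + ":" + current + ":" + seq);
                    seq++;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("A", "B", "C", "C");
        System.out.println(numberWithinKeys(keys)); // prints [0:A:0, 1:B:0, 2:C:0, 3:C:1]
    }
}
```

Note that this scans the whole table once per distinct key, so it gets slow when there are many keys.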


You could also solve it with the Java Snippet node. Make sure the data is sorted according to the Key column (Sorter node). Connect a Java Snippet node to the output of the sorter and in the global declaration field define a field "lastKey" (String, last seen Key) and "index" (int, index in the group). In the method body field, define a script such as:

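// A new key starts a new group, so the sequence restarts at 0.
if (!$Key$.equals(lastKey)) {
  index = 0;
} else {
  // Same key as the previous row: continue counting.
  index = index + 1;
}
// Remember the key for the next row.
lastKey = $Key$;
return index;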

The return type of the script is an integer.
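If you want to try the logic outside KNIME first, the following is a small standalone sketch. The two fields play the role of the global declarations and next() plays the role of the method body; the class and method names (SequenceSnippetDemo, next, number) are made up for this example. It assumes the keys are already sorted, as described above.

```java
import java.util.Arrays;
import java.util.List;

public class SequenceSnippetDemo {
    // Equivalent of the global declaration field in the Java Snippet node.
    private String lastKey = null;
    private int index = 0;

    // Equivalent of the method body: called once per row, in row order.
    int next(String key) {
        if (!key.equals(lastKey)) {
            index = 0;         // new key: restart the sequence
        } else {
            index = index + 1; // same key as previous row: count up
        }
        lastKey = key;
        return index;
    }

    // Runs the per-row logic over a list of keys that is already sorted.
    static int[] number(List<String> sortedKeys) {
        SequenceSnippetDemo snippet = new SequenceSnippetDemo();
        int[] out = new int[sortedKeys.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = snippet.next(sortedKeys.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] seq = number(Arrays.asList("A", "B", "C", "C"));
        System.out.println(Arrays.toString(seq)); // prints [0, 0, 0, 1]
    }
}
```

This needs only a single pass over the rows, at the cost of requiring the sort beforehand.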

I'm attaching a small workflow that demonstrates it.

Hope you find it useful.


Both methods work. The first is more or less what I was trying to do at first (I must have taken a wrong turn somewhere), but the Java Snippet version seems to be a lot more efficient.

Conclusion: do not underestimate the importance of variables!

Great help, thank you very much.