Appending Unique Identifiers in multiple branches

Hey,
I have multiple data sources and I want to give rows IDs before concatenating them (a String Manipulation node simulates appending an ID).
Rows from the same data source can share an ID, but there shouldn't be duplicate IDs across different data sources after the Concatenate. How can I make sure that the second String Manipulation node (or whatever node is suitable for this) doesn't generate IDs that the first node has already generated, while keeping this as parallel-executable as possible?

Thanks,

Tim

Hi @tbtt

My approach would be to loop over all the data sources and use a Constant Value Column node for the ID.
The ID would be created from a flow variable (e.g. the iteration). Something like this example from the KNIME Hub.
gr. Hans


Hi @tbtt,

The Concatenate node appends a suffix to duplicate row IDs, and you can then use the RowID node to refresh all the IDs. But if you still need to generate unique values in each branch before the Concatenate node, you can use the Counter Generation node and use a flow variable in the second branch to start the numbering from the row count of the top branch.

Here is an example workflow demonstrating both approaches:

22762-1-1.knwf (98.8 KB)
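
For readers without the workflow at hand, the counter-with-offset idea can be sketched in plain Python. This is only an illustration under assumptions: each branch is modelled as a list of dicts, and `add_counter` is a hypothetical helper standing in for the Counter Generation node, with `offset` playing the role of the flow variable.

```python
def add_counter(rows, start):
    """Return copies of the rows with a running 'id' column beginning at `start`."""
    return [{**row, "id": start + i} for i, row in enumerate(rows)]


# Hypothetical sample data standing in for two data sources.
top_branch = [{"value": "a"}, {"value": "b"}, {"value": "c"}]
bottom_branch = [{"value": "x"}, {"value": "y"}]

top_with_ids = add_counter(top_branch, start=0)

# The offset stands in for the flow variable: the row count of the top branch.
offset = len(top_with_ids)
bottom_with_ids = add_counter(bottom_branch, start=offset)

concatenated = top_with_ids + bottom_with_ids
ids = [row["id"] for row in concatenated]
```

In this sketch the bottom branch has to wait for the row count of the top branch, so the two branches are not fully independent of each other.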

:blush:


Hey guys, thanks for your answers!

Both of your answers miss a small part of what I need: I want some rows to have the same ID. I just need to ensure that those duplicates come from the same branch and that there are no duplicates across different branches (and I would like to ensure that by design, not just check for it).

As the logic behind the ID assignment is unique to each branch, looping as suggested by Hans isn't an option.

Armin's approach doesn't fit either, because using the RowID doesn't allow duplicates at all.

I thought about having a global list (or a table with one column) filled with enough unique IDs. Whenever I need one or multiple columns to have an ID, I would take one ID from the list, assign it to the rows I want, e.g. with a Constant Value Column node, and delete the ID from the list.
As long as the same ID doesn't get pulled out of the list twice before it is deleted, this should do exactly what I want. I heard there are array flow variables; can they be used for that? How can they be created?
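
To make the pool idea concrete, here is a minimal Python sketch of the logic only. It says nothing about whether array flow variables can do this in KNIME. The names `IdPool` and `assign_id` and the sample rows are hypothetical, and a lock is assumed as the way to keep an ID from being pulled twice.

```python
import itertools
import threading


class IdPool:
    """Hands out each ID exactly once, even when called from several threads."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def take(self):
        # The lock ensures two callers can never receive the same ID.
        with self._lock:
            return next(self._counter)


def assign_id(rows, predicate, pool):
    """Give all rows matching `predicate` the same freshly drawn ID."""
    new_id = pool.take()
    return [{**row, "id": new_id} if predicate(row) else row for row in rows]


pool = IdPool()

# Hypothetical branches; each one decides with its own logic which rows share an ID.
branch_a = [{"group": "g1"}, {"group": "g1"}, {"group": "g2"}]
branch_b = [{"group": "g1"}, {"group": "g3"}]

branch_a = assign_id(branch_a, lambda r: r["group"] == "g1", pool)
branch_a = assign_id(branch_a, lambda r: r["group"] == "g2", pool)
branch_b = assign_id(branch_b, lambda r: True, pool)

ids_a = {row["id"] for row in branch_a}
ids_b = {row["id"] for row in branch_b}
```

Duplicates can then only appear among rows that received their ID in the same call, so IDs from different branches cannot collide by construction.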


It would be great if you could provide us with a sample data set and your desired output.
But what I gather from your explanation is that you need the Reference Row Filter node.
So you remove rows with the same "artificial" ID and then concatenate the branches. Does it make sense now?

:blush:

Hi @tbtt

As looping is not an option, how about this workflow: appending_unique_id.knwf (99.0 KB). The Rank node creates a row_id, while concatenating the rows generates a file_id.
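
One way to read the row_id plus file_id idea, sketched in Python rather than KNIME. This is an assumption about the general technique, not a description of the attached workflow: `dense_rank` is a hypothetical stand-in for the Rank node, the `name` column and the sample rows are made up, and the two parts are joined into a single `id` string.

```python
def dense_rank(rows, key):
    """Add a 'row_id' so that rows with equal `key` values share the same rank."""
    ordered = sorted({row[key] for row in rows})
    rank_of = {value: i + 1 for i, value in enumerate(ordered)}
    return [{**row, "row_id": rank_of[row[key]]} for row in rows]


# Hypothetical data sources keyed by a file identifier.
sources = {
    "file_1": [{"name": "a"}, {"name": "a"}, {"name": "b"}],
    "file_2": [{"name": "a"}, {"name": "c"}],
}

concatenated = []
for file_id, rows in sources.items():
    for row in dense_rank(rows, key="name"):
        # Prefixing with file_id keeps IDs from different sources apart.
        combined = f"{file_id}_{row['row_id']}"
        concatenated.append({**row, "file_id": file_id, "id": combined})

ids = [row["id"] for row in concatenated]
```

Because each source only needs its own rows and its own file_id, the ranking step can run for every branch independently of the others.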

