Conditional Concatenate/ If Aggregation

How do you join a cell separated by a comma (,) if a certain condition is met. I.e if the "Record ID Cloned = Record ID (from any row in the data) => (then give me) “Record ID, Record ID Chain (Output)”. The Record ID Chain (Output) is the output im trying to create but we can have two columns if needed.

See the table below. Thank you !

Hi @DavidO,

can you copy & paste the data for testing purposes please? The editor should auto-format the table nicely.

Thanks
Mike

Hi @mwiegand,
Here is the data, thanks

Record ID Record ID Cloned Cycle Record ID Chain (Output)
Task A1 Q1 Task A1
Task B1 Q1 Task B1
Task C1 Q1 Task C1
Task A2 Task A1 Q2 Task A2, Task A1
Task B2 Task B1 Q2 Task B2, Task B1
Task C2 Task C1 Q2 Task C2, Task C1
Task A3 Task A2 Q3 Task A3, Task A2, Task A1
Task B3 Task B2 Q3 Task B3, Task B2, Task B1
Task D1 Q3 Task D1
Task C3 Task C2 Q3 Task C3, Task C2, Task C1
Task D2 Task D1 Q4 Task D2, Task D1
1 Like

If you join the 2 columns (if necessary create duplicate columns) and then Groupby record id with concatenate as aggregation type and separator a comma?
br

Hi @DavidO,

thanks for sharing the data. Based on your description:

If the "Record ID Cloned = Record ID (from any row in the data) => (then give me) “Record ID, Record ID Chain (Output)”.

The best approach would be to follow “divide & conquer”. You basically split apart based on the condition, which could be many (Dictionary Row Split), followed by whatever task you want to accomplish … in this case a join with comma as a separator.

But given the fact that there are no results, either the data is incorrect or you wanted to accomplish something else. Please let me know.

Best
Mike

1 Like

Hi @mwiegand ,
The desired result is the 4th Column "Record ID Chain (Output). This column isn’t in my data set so this is the column i’m trying to create. Hope this clarifies, thanks David.

@Daniel_Weikert, please disregard the comment above, still getting used to the forum. How would you process this in Knime?

Ah, So you mean when the task letter matches in the first two column then concatenate. I have updated the workflow accordingly. Worth to note that it uses the Regex Extractor Node form the awesome Palladian extension.

Here is the updated workflow:

4 Likes

Hi Mike,
Is there a way to achieve this without the Regex extractor? I checked in and for this project, we are not allowed to use extension nodes - only standard knime nodes are permitted.

Thanks

Hi @DavidO,

There always is :wink: In fact, I initially thought about using the plain old Regex Split Node:

But that isn‘t close as convenient. Let me check this later this evening please.

Best
Mike

2 Likes

Et voila

I have gone a bit borderline showcasing three different approaches by using:

  1. Rule Engines
  2. Regex Split
  3. String Manipulation (Multi Column)

All have strengths and weaknesses best seen with the bug once raised and still not fixed:

I will raise this once more.

Happy “kniming”
Mike

3 Likes

@mwiegand Thanks! It looks like the Regex extractor perfectly though unfortunately I cant use the additional nodes. The other steps don’t seem to be producing the desired output, can you please walk through your logics, maybe i can recreate

The input values were the same, so should be the output. The results are all identical. can you elaborate on the issue you face please?

PS: Workflow was updated but no change was made to the nodes. I just wrapped them in a component for convenience reasons. Though, may I ask why you are not allowed or cannot install extensions?

Cheers
Mike

2 Likes

Hi Mike @mwiegand ,
I can’t use additional nodes as I need a solution that I can be utilized widely. I have also marked as solution as your method does solve the question as asked. I should have clarified that I was looking for a solution that can be utilized with multiple sets of data applying the same logic. I will create a new post with the clarifications. Thanks for engaging.

Thanks!
David

1 Like