Hi all, I’m new in KNIME world. I’m trying to practice with it using my real work problems.
I have a table with 2 columns and several rows. An example of my input file:
Column1 Column2
region_001 A
region_001 B
region_004 D
region_002 D
region_004 T
region_001 XX
I need to merge in one row the columns 2 values if the string in column 1 is the same.
the output should be:
Column1 Column2
region_001 A,B,XX
region_004 D,T
region_002 D
I’m trying to use the GrouBy node but I can’t set it. Could someone help me please?
Thanks! It works! And if I have to run the groupby node inside a loop? How I can set that the column to aggregate (column2) is controlled by a variable? I mean that I have a lot of colums and I have to do the same aggregation for all the column always taking as group column the same column 1.
Wait, I’m here to help. Could you please update a small dataset, so I can better understand why you need a loop? If it’s sensitive data, some fake data will do the job.
I have a file contains sevaral columns inside of which there are several descriptors. I did a worflow to mange data of this file. In paritcular the first step is a loop in which I assigned a variable to each column containing numerical values (score). For each of these column I did some operation. The problem is that for each iteration i should identified the maximun and minimum values based on the values of a master column. I’ll try to explain better with an example.
I would stick in this discussion as well. You can often avoid the need for loops with the Group node, and it is much more efficient. Keep in mind that Group calcs can be joined back, pivoted, filtered, etc to achieve your goal.
Update for above post. That just takes 2 Group by nodes. Group by column 1 in both, calculate min in one and max in the other. Join them together by category 1, then combine min and max output columns into a comma list.
yes but the problem is that this operation should be done inside a loop because after that the results should be merged with others data. If I run the grupby node outside the loop I guess that I have to use another variable to select the column of the groupby node outputs to merge
Can you upload dummy data in a workflow to the forum? Do you have hundreds of these min /max aggregation columns and need a dynamic solution that adapts to any column name?
It would be easier to show you some of the myriad of options available in KNIME to tackle a problem like this.
I don’t think I understand your explanation as to why the operation has to be done inside a loop. This is a simple process with a simple solution of using the GroupBy node like the others pointed out.
Another easy loop avoiding idea is to just list the unique values in order by each column using Group By, then extract the 1st and last number and recombine them to get min / max.
@tommasopalomba - You can then combine the min and max columns & pivot the column names back. That should give you a dynamic solution for your target result.