Keeping one row for duplicated values in one column based on possible inverted repetition of cells in other two columns

Ok, so I’m getting to the following table in a KNIME workflow. I transformed it to a generic example just to transmit the idea. I’d like to eliminate one of two rows where ‘correlation’ values are the same while entries (in ‘column1’ and “column2”, respectively) are also the same but in inverted order. For example: in first and second rows the correlations values are the same because regards to same entry_A_6 and entry_A_1 in inverted order in respective columns; I wanna keep only the first row. There are also some cases with 2 pairs of repetition (e.g. correlation 7.25), that must be transformed to only one row per pair. Please note that rows with unique correlation values (e.g. correlation of 7.01) must be also kept.

column1 column2 correlation
entry_A_6 entry_A_1 7.55
entry_A_1 entry_A_6 7.55
entry_A_9 entry_A_2 7.25
entry_A_6 entry_A_5 7.25
entry_A_5 entry_A_6 7.25
entry_A_2 entry_A_9 7.25
entry_A_6 entry_A_2 7.22
entry_A_2 entry_A_6 7.22
entry_A_9 entry_A_4 7.12
entry_A_4 entry_A_9 7.12
entry_A_9 entry_A_1 7.10
entry_A_1 entry_A_9 7.10
entry_A_8 entry_A_1 7.03
entry_A_9 entry_A_3 7.02
entry_A_6 entry_A_4 7.02
entry_A_4 entry_A_6 7.02
entry_A_3 entry_A_9 7.02
entry_A_1 entry_A_8 7.01
entry_A_6 entry_A_3 6.88
entry_A_3 entry_A_6 6.88
entry_A_8 entry_A_5 6.82
entry_A_5 entry_A_8 6.82
entry_A_10 entry_A_9 6.71
entry_A_9 entry_A_10 6.71
entry_A_9 entry_A_5 6.70
entry_A_5 entry_A_9 6.70

Hi,
You can use a Rule Engine node to create two new columns c1 and c2, with c1 containing the alphabetically sorted higher value from column1 and column2 and c2 the lower one. You can do that with rules like this:

Rule Engine 1:

$column1$ > $column2$ => $column1$
TRUE => $column2$

Rule Engine 2:

$column1$ > $column2$ => $column2$
TRUE => $column1$

Now you have unified the rows and you can use a Duplicate Row Filter node on c1, c2, and correlation to remove duplicates.
Kind regards,
Alexander

2 Likes

Thank you so very much for your answer AlexanderFillbrunn! This has worked perfectly! All the best for you!

2 Likes

Hi,
I am glad it worked. You can make the workflow even more compact by replacing the two rule engines by a Column Aggregator node, which can calculate the min and max for column 1 and 2 in one step.
Kind regards
Alexander

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.