Merging multiple columns based on a criteria

0nly4phil · October 16, 2021, 6:26am

Hi, I am fairly new to Knime and am struggling to find an efficient way to merge multiple columns based on a criteria.
In the table I have provided a simplistic representation of my data layout. I would like to merge the first set of columns (Col1-4) with a second set of columns (Col1B-4B) and
append the mergers to the table as a third, new set of columns (Col1C-4C).

The criteria is simple: if a row ID contains a missing value in the first set, then the value in the second set of columns is used, otherwise the value in the first set of columns is used.
Column merger does this perfectly. However, the column sets are hundreds and the primary and secondary columns have to be set for each individual merge. I would like to use this workflow for several different initial data sets (same lay-out) so wish to avoid having to configure multiple Column merger nodes each time.
Does anyone have any suggestions?
Thankyou for your time,

Phil

AnotherFraudUser · October 16, 2021, 2:49pm

Hi @0nly4phil,

not quite sure if i managed to solve your exact problem but attached an example workflow for three variations.
One which returns the whole SetB if all columns of SetA are empty else return SetA
The other variation checks Col1 in SetA against SetB, Col2 in SetA against SetB,… and returns for each step the column of setB if setA is missing.
The last variation returns setB is any column in setA is empty

Sets can be defined by a 1or 2 nodes either by manual selecting or wildcards:
grafik

Input:
grafik
Output:

KNIME_project5.knwf (45.4 KB)

0nly4phil · October 17, 2021, 5:54am

Many thanks. I will definitely try your suggestions!

Phil

0nly4phil · October 17, 2021, 6:55am

Hi AnotherFraudUser.
Variation 2 is what I would like to use.
However, the way it is currently configured the output from the loop (collected results) is all set B, without any substitution with values from set A where present which, is the desired result.
I’ll work on it, but perhaps you will be able to see where the problem lies more quickly?
Regards,

Phil

0nly4phil · October 17, 2021, 7:08am

What I mean is the output table you provided in your reply is correct, so it obviously works. However the second variation you provided produces a copy of column set B.
This statement doesnt appear to be respected:
SetACol2 is not missing SetB Col2 is not missing .> return SetA Col2.

Thanks,

0nly4phil · October 17, 2021, 7:43am

That’s all right, got it!
I just switched the order of columns in the column splitter and option 2 works just fine!
Many thanks for sharing. I’ve learnt several important thing today!

AnotherFraudUser · October 17, 2021, 10:13am

Great that it at least got you near the solution

system · April 17, 2022, 10:14pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.