I’m working on a workflow which should:
- Extract sets of columns where each set is made of 2 columns (‘Column Name’ and ‘ Prefix1 Column Name Check’) or 3 columns (‘Column Name’ and (‘Prefix1 Column Name Check’ or ‘Prefix1 Column Name Check’)) where:
- ‘Column Name’ stands for ‘lead’ column header
- ‘Prefix Column Name Check’ stands for check columns associated with ‘lead’ column header. It means that column nae here is made of: prefix (various prefixes, length=5 characters) + ‘lead’ column header + ‘ Check’ string (6 characters).
- Extracts rows with ‘1’ value in ‘Check’ columns for each set of columns.
- Show all extracted sets of columns and all extracted rows in summary Excel sheet.
- Show each extracted set of columns and extracted rows for that set in a separate Excel sheet, 1 sheet per each set of columns.
Sample input dataset looks like this:
Real dataset contains 600+ columns.
Expected results should look like these:
all in a single Excel file.
My current challenge is: how to select columns for sets of columns?
I’m trying to apply:
- ‘Column List Loop Start’ to proceed through all columns – this works ok.
- ‘String Manipulation (Variable)’ and ‘Variable to Table Column’, and other nodes to have ‘Column Name’ and ‘Column Name Check’ column headers – these work ok.
- ‘Reference Column Filter’ node to extract set of columns. This works for ‘Column Name Check’ column, but I don’t know how to move from ‘Column Name Check’ to ‘Prefix Column Name Check’.
At this moment, end of my workflow looks like this:
I’ve found the workflow prepared by @aworker and shared here https://forum.knime.com/uploads/short-url/lKPtCDyLin0koSJyrlop2HabZMc.knwf however either that is not similar enough to my challenge or I don’t know how to modify that to meet my expectations.
Could you please suggest any solution for my current challenge?
Of course any suggestions on the whole solution would be more than welcome.
I might be offline during the weekend, however I am grateful for any help shared in the meantime.