How to identify ‘similar’ columns?

Dear KNIMErs,

I’m working on a workflow which should:

  1. Extract sets of columns where each set is made of 2 columns (‘Column Name’ and ‘ Prefix1 Column Name Check’) or 3 columns (‘Column Name’ and (‘Prefix1 Column Name Check’ or ‘Prefix1 Column Name Check’)) where:
  • ‘Column Name’ stands for ‘lead’ column header
  • ‘Prefix Column Name Check’ stands for check columns associated with ‘lead’ column header. It means that column nae here is made of: prefix (various prefixes, length=5 characters) + ‘lead’ column header + ‘ Check’ string (6 characters).
  1. Extracts rows with ‘1’ value in ‘Check’ columns for each set of columns.
  2. Show all extracted sets of columns and all extracted rows in summary Excel sheet.
  3. Show each extracted set of columns and extracted rows for that set in a separate Excel sheet, 1 sheet per each set of columns.

Sample input dataset looks like this:

Real dataset contains 600+ columns.

Expected results should look like these:

all in a single Excel file.

My current challenge is: how to select columns for sets of columns?

I’m trying to apply:

  • ‘Column List Loop Start’ to proceed through all columns – this works ok.
  • ‘String Manipulation (Variable)’ and ‘Variable to Table Column’, and other nodes to have ‘Column Name’ and ‘Column Name Check’ column headers – these work ok.
  • ‘Reference Column Filter’ node to extract set of columns. This works for ‘Column Name Check’ column, but I don’t know how to move from ‘Column Name Check’ to ‘Prefix Column Name Check’.

At this moment, end of my workflow looks like this:

I’ve found the workflow prepared by @aworker and shared here however either that is not similar enough to my challenge or I don’t know how to modify that to meet my expectations.

Could you please suggest any solution for my current challenge?
Of course any suggestions on the whole solution would be more than welcome.

I might be offline during the weekend, however I am grateful for any help shared in the meantime.
Happy KNIMEing,

This challenge is resolved now. The key points are:

  1. Using ‘Column Name’ to create rules that are used for…
  2. … ‘Rule-based Row Filter (Dictionary)’ node for filterring out expected columns.
    Building the rest of workflow now.

Happy KNIMEing!


Thanks for posting about what worked for you!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.