Using a "Reference Table" to add data in my loaded file

AndreP · May 17, 2021, 6:30pm

Hi Guys,

hopefully you can help me again.

Example Case:

As you can see. I want to add data in my orginal file via a reference table (e.g. via table creator). Search criteria is only a part of a string.

The rule must be something like that.

If you find “*green*” in Column A add “green” in column B and “101” in column C
If you find “*black*” in Column A add “black” in column B and “102” in column C
If you find “*white*” in Column A add “white” in column B and “103” in column C
If you find “*blue*” in Column A add “blue” in column B and “104” in column C
If you find “*yellow*” in Column A add “yellow” in column B and “105” in column C
If you find “*red*” in Column A add “red” in column B and “106” in column C

Hopefully you have I idea how can I handle it.

Thanks in advance!

BR

Andre

takbb · May 17, 2021, 8:00pm

Hi @AndreP , depending on just how much data you’d really have, this is one possible solution. It involves joining every row of your reference data with every row on you sample data table using a cross join.
After that, a rule can be used to say for each row "is the ‘standard’ column contained in the text of ‘Column A’. Filter out those rows that don’t match, and then filter and rename the columns that you have remaining.

KNIME_ref_data_partial_match.knwf (15.5 KB)

There are variations that could be done on this workflow in terms of how “Column B” gets created (I do it in the Column Expressions node, but the Column Expressions could simply return a “Y/N” flag, and then you’d filter on that and rename Standard to Column B. )

hope that helps

takbb · May 17, 2021, 9:59pm

I also found an alternative using somewhat more “exotic” nodes…

I’d imagine this might work be less resource hungry if you have large data sets (cross joins can get very memory hungry if you have large numbers of rows in both tables). I’ve not played with Documents and Dictionaries in Knime before so there might be some shortcuts to this, but it’s a start!

KNIME_ref_data_partial_match 2.knwf (36.5 KB)

ipazin · May 18, 2021, 8:19am

Hello there,

one option is also to construct rules for Rule Engine followed by Cell Replacer. Here is example:
KNIME_ref_data_partial_match_ipazin.knwf (31.8 KB)

Br,
Ivan

takbb · May 18, 2021, 9:03am

@ipazin… Oh I like that !!

G47_2 · May 18, 2021, 9:40am

My proposition would be to just use the Rule Engine (Dictionary) node. Eventually it’ll work just like ipazin’s solution, but might be more intuitive.

KNIME_ref_data_partial_match_g472.knwf (11.1 KB)

ipazin · May 18, 2021, 9:46am

Hello @G47_2,

even better as you don’t have to use GroupBy node and convert rules to flow variable. And additionally you can simplify expression in String Manipulation node and define output column in Rule Engine (Dictionary) node. Nice one

Br,
Ivan

G47_2 · May 18, 2021, 9:58am

One more addition I’d like to propose: the Rule Engine will read the rules from top to bottom, so if more than one rule could apply, only the first rule would be applied. Let’s say I’d add “black & yellow by wiz khalifa” to the sample table, you’d want to see matches for “black” and “yellow”. You can use a loop that feeds one rule after another into the Rule Engine, so each rule will be treated equally, and remove rows without any matches.

If you want to do math with the assigned values (e.g. calculate sum or mean of each group via a GroupBy node), you’d probably want to change the String Manipulation node’s content to join("$Column A$ LIKE \"*", $Standard$, "*\" => ", $Code$), so the result would be an integer. That’s what I’ve done in the workflow below:

KNIME_ref_data_partial_match_g472_loop.knwf (19.6 KB)

takbb · May 18, 2021, 10:01am

@G47_2 , that’s great. I added a slight mod to yours [your first post], for the benefit of the OP to get the full result back as per the Column A,B,C but both your and @ipazin’s workflows have provided new insights, which is what this is all about!

KNIME_ref_data_partial_match_g472_2.knwf (13.3 KB)

AndreP · May 18, 2021, 4:06pm

Hi guys,

thank you all for your answers.

I will check it tomorrow and if you like I will give you feedback.

BR

AndreP · May 19, 2021, 7:36am

Thanks guys!

It’s works perfect for me.

BR

system · November 17, 2021, 7:37pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.