Using a "Reference Table" to add data in my loaded file

Hi Guys,

hopefully you can help me again. :slight_smile:

Example Case:

image

As you can see. I want to add data in my orginal file via a reference table (e.g. via table creator). Search criteria is only a part of a string.

The rule must be something like that.

If you find “*green*” in Column A add “green” in column B and “101” in column C
If you find “*black*” in Column A add “black” in column B and “102” in column C
If you find “*white*” in Column A add “white” in column B and “103” in column C
If you find “*blue*” in Column A add “blue” in column B and “104” in column C
If you find “*yellow*” in Column A add “yellow” in column B and “105” in column C
If you find “*red*” in Column A add “red” in column B and “106” in column C

Hopefully you have I idea how can I handle it.

Thanks in advance!

BR

Andre

Hi @AndreP , depending on just how much data you’d really have, this is one possible solution. It involves joining every row of your reference data with every row on you sample data table using a cross join.
After that, a rule can be used to say for each row "is the ‘standard’ column contained in the text of ‘Column A’. Filter out those rows that don’t match, and then filter and rename the columns that you have remaining.

KNIME_ref_data_partial_match.knwf (15.5 KB)

There are variations that could be done on this workflow in terms of how “Column B” gets created (I do it in the Column Expressions node, but the Column Expressions could simply return a “Y/N” flag, and then you’d filter on that and rename Standard to Column B. )

hope that helps

I also found an alternative using somewhat more “exotic” :wink: nodes…

I’d imagine this might work be less resource hungry if you have large data sets (cross joins can get very memory hungry if you have large numbers of rows in both tables). I’ve not played with Documents and Dictionaries in Knime before so there might be some shortcuts to this, but it’s a start!

KNIME_ref_data_partial_match 2.knwf (36.5 KB)

2 Likes

Hello there,

one option is also to construct rules for Rule Engine followed by Cell Replacer. Here is example:
KNIME_ref_data_partial_match_ipazin.knwf (31.8 KB)

Br,
Ivan

2 Likes

@ipazin… Oh I like that !! :slight_smile:

2 Likes

My proposition would be to just use the Rule Engine (Dictionary) node. Eventually it’ll work just like ipazin’s solution, but might be more intuitive.

KNIME_ref_data_partial_match_g472.knwf (11.1 KB)

3 Likes

Hello @G47_2,

even better as you don’t have to use GroupBy node and convert rules to flow variable. And additionally you can simplify expression in String Manipulation node and define output column in Rule Engine (Dictionary) node. Nice one :+1:

Br,
Ivan

3 Likes

One more addition I’d like to propose: the Rule Engine will read the rules from top to bottom, so if more than one rule could apply, only the first rule would be applied. Let’s say I’d add “black & yellow by wiz khalifa” to the sample table, you’d want to see matches for “black” and “yellow”. You can use a loop that feeds one rule after another into the Rule Engine, so each rule will be treated equally, and remove rows without any matches.

If you want to do math with the assigned values (e.g. calculate sum or mean of each group via a GroupBy node), you’d probably want to change the String Manipulation node’s content to join("$Column A$ LIKE \"*", $Standard$, "*\" => ", $Code$), so the result would be an integer. That’s what I’ve done in the workflow below:

KNIME_ref_data_partial_match_g472_loop.knwf (19.6 KB)

3 Likes

@G47_2 , that’s great. I added a slight mod to yours [your first post], for the benefit of the OP to get the full result back as per the Column A,B,C but both your and @ipazin’s workflows have provided new insights, which is what this is all about! :slight_smile:

image

image

KNIME_ref_data_partial_match_g472_2.knwf (13.3 KB)

1 Like

Hi guys,

thank you all for your answers. :slight_smile:

I will check it tomorrow and if you like I will give you feedback.

BR

1 Like

Thanks guys!

It’s works perfect for me.

BR

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.