Groupby based on substring match

Hello,
I want to search my table for rows that have 2 conditions.

  1. same value in one column
  2. that contains a substring (in 3’s) in another column. (only one instance required)
    both columns could contain long string lengths.
    eg. AOW1231XO and AOW231QWE would be matched
    AOW1231XO and QQQAQW4321XO would be matched

if these are found add them together with a new column (both strings that contain substring). I tried using the Groupby but not getting it right and also a regex could be another way I think.

Hi @Arnold,

For me it is difficult to understand your question (a substring (in 3’s) ??) . But see the this worklfow Groupby based on substring match.knwf (42.7 KB) . This is they way I understand your question. Hope this helps you in some way. If not provide us with some (dummy) input data and your desired output.


gr. Hans

2 Likes

hello hans,
Thanks for the comment…Substring in 3’s just means

AOW1231XO is in one column of a row but is checked like “AOW” “123” “1XO” if any of these can be found in the value of the other row column.
eg. AOW1231XO in one row will be matched to AON1241XO in another row. From your workflow first table creator should have those three rows in one row.
Hope this makes it better

Take a look if


node will work for you.

1 Like

Hello izaychik63,

this would not work since there is only one input table and then matching is done checking rows of one column.

Hi @Arnold,

welcome to KNIME Community!

To use node @izaychik63 you don’t need to different input tables and not even two different columns. See this example:

In your case for further suggestions some input data set and desired output could really help :wink:

Br,
Ivan

1 Like

this is an example


and the final desired output

Hi @Arnold,

get it know. Seems to me what you are trying to do is a bit tricky but you can try following approach: split you column using Cell Splitter By Position into 3’s string and use Create Collection Column to use above mentioned Subset Matcher on same column.

Second approach could be to create 3’s string from your input column (without Cell Splitter By Position) and then create a separate list (column in another table) from it. Then re-use/modify logic from here: row filter

Br,
Ivan