Searching between two tables by a column of regular expressions

I have two tables of very different geometry. Table A has existing data including a column with string data in it that I’m interested in (along with over 100 other columns). Table B has only two columns, one is a Regular Expression that might match with 1 or more strings from the column in Table A that I mentioned, and the other column in Table B has information I want to add to table A when a string from table A is matched by the Regular Expression from table B.

Ordinarily I might use Joiner to bring these two tables together, but Joiner doesn’t have the option to match by Regular Expression. Do I need to instead setup a loop (maybe a chunk loop?) to go through table B and bring the data over that way? Or is there another way to do this?

If it makes a difference Table B has ~400 rows while Table A has ~700 rows. My expectation in this case if a RegEx from Table B matches multiple rows of Table A that we would end up with multiple rows going into the product table (rather than a fancy merge here).

thank you!

Well yes that is possible although I had to employ the help of R. I think it is also possible to create a loop in KNIME.

The whole thing rests on the libraries fuzzyjoin and the example described here: https://rdrr.io/cran/fuzzyjoin/man/regex_join.html

image

Edit: if more than one Regex matches a double entry is created (one would have to deal with that later)

image

kn_example_regex_fuzzyjoin.knwf (1.1 MB)

1 Like

Thank you for your suggestion!

I was looking for a solution that I could deploy using only KNIME tools (without incorporating any additional scripting), and eventually found a solution using String Manipulator and Rule-based Row Filter.

Briefly, as long as the RegEx column had well-formed RegExs, I just needed to use String Manipulation to add “.*” to each end of the value in the column (other cases may benefit from other options to tune the specificity; in my case I knew the RegEx would most likely match somewhere in the middle).

After that, the important options in Rule-Based Row Filter is to use “MATCHES”, rather than “LIKE”. Hence if my merged table has a column “LongName” and a column “RegEx” my Rule-Based Row Filter Expression is now
$LongName$ MATCHES $RegEx$ => TRUE
And this returns the matches I want.

thank you