Filter rows by matching substring

norm762 · February 3, 2022, 3:32pm

I’m new to KNIME; I’m trying to consolidate into one environment so I can stop bouncing back and forth between R and Python. I’m merging data from a dozen different data reports, and I need a node that lets me quickly filter rows by substring to see whether a value exists, so I can figure out why I end up with missing values after a series of joins.

For example, if I have strings that look like “*****ABCD”, I need to filter the table down to rows that match “ABCD”.

How do I set up a node to run a quick match on each data source to see if it’s really missing or if I have some more data cleaning to do?

I can’t seem to make any regex work in Knime, so I’m assuming there’s some syntax that I’m missing.
Is there a regex guide specific to Knime floating around out there?

Thanks for your time!

elsamuel · February 3, 2022, 3:50pm

Welcome to the forum @norm762.

Can you share some actual strings and/or your workflow? It seems to me that any of the row filters should work.

The simplest one would be the Row Filter node with pattern matching selected, the wildcards checkbox ticked, and the expression *ABCD.
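
Since the original poster is coming from Python, here is a rough sketch of what the wildcard expression does, written in plain Python rather than KNIME. It assumes the filter has to match the whole cell value rather than just a part of it, which would explain why a bare ABCD regex finds nothing. The sample values and the helper names `wildcard_filter` and `regex_filter` are made up for illustration.

```python
import fnmatch
import re

# Hypothetical sample column; None stands in for a missing value.
values = ["*****ABCD", "12345ABCD", "ABCDxyz", "abcd", None]


def wildcard_filter(rows, pattern):
    """Keep rows whose whole value matches a wildcard pattern (* = any run of characters)."""
    return [v for v in rows if v is not None and fnmatch.fnmatchcase(v, pattern)]


def regex_filter(rows, pattern):
    """Keep rows whose whole value matches a regex (full match, not a partial search)."""
    return [v for v in rows if v is not None and re.fullmatch(pattern, v)]


# Wildcard form: * stands for "anything before ABCD".
wildcard_hits = wildcard_filter(values, "*ABCD")

# Regex form of the same idea: .* plays the role of the wildcard *.
regex_hits = regex_filter(values, r".*ABCD")

# A bare "ABCD" regex matches nothing here, because no value is exactly "ABCD".
bare_hits = regex_filter(values, r"ABCD")

print(wildcard_hits)
print(regex_hits)
print(bare_hits)
```

In this sketch the wildcard and the `.*ABCD` regex keep the same rows, while the bare pattern keeps none. That difference between a full match and a substring search is the usual stumbling block when a pattern that looks right returns an empty table.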

bruno29a · February 3, 2022, 3:50pm

Hi @norm762 and welcome to the Knime Community.

Knime uses standard regex syntax, so there isn’t a regex guide specific to Knime.

Maybe you can share what you did, and then we can see what the issue is.

Daniel_Weikert · February 3, 2022, 6:25pm

Just be careful with double-escaped backslashes. Python handles this better.
br
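
To illustrate the escaping point from the Python side, a small sketch: matching a literal asterisk in a regex needs a backslash, and when the pattern is typed inside an ordinary quoted string that backslash has to be doubled. Which KNIME nodes need the doubled form is not stated in this thread, so treat that part as an assumption; the sample values are made up.

```python
import re

# Hypothetical sample values.
values = ["*****ABCD", "12345ABCD", "ABCD"]

# Raw string: the backslash reaches the regex engine as typed.
raw_pattern = r"\*+ABCD"

# Ordinary string: the backslash must be doubled to survive string parsing.
escaped_pattern = "\\*+ABCD"

# Both spellings produce the same pattern text.
same_pattern = raw_pattern == escaped_pattern

# \*+ means "one or more literal asterisks", so only the starred value matches.
literal_hits = [v for v in values if re.fullmatch(raw_pattern, v)]

print(same_pattern)
print(literal_hits)
```

The raw-string prefix is what makes this less error-prone in Python: the pattern is written once, exactly as the regex engine will see it.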

norm762 · February 3, 2022, 7:36pm

Got it, thanks! I kept thinking an asterisk would be interpreted literally. It didn’t occur to me that that’s what the checkbox for wildcards was referring to.

system · May 4, 2022, 7:36pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.