RegEx help needed: Extract parameters from URLs

I am new to KNIME as well as to regex, but I am fighting my way through. I need help though, because this drives me crazy and I do not know where my mistake is:

  1. I have 5 million URLs loaded from a CSV file

  2. I want to filter out all URLs that contain parameters, which can be identified by the substring /?

  3. I checked the REGEX and it should work like that:

  4. I am using the Rule-based Row Filter this way:

$URL$ MATCHES "/(./?.)" => FALSE
TRUE => TRUE

(include TRUE matches)

However, the URLs with parameters are NOT identified. I can still find thousands of them; none are filtered out. What am I doing wrong?
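
For anyone who wants to test the behaviour outside KNIME, here is a rough sketch in Python. It rests on two assumptions: that MATCHES has to match the entire cell value (mimicked here with `re.fullmatch`), and that the pattern is exactly as it appears in the post. The sample URLs and the helper name `keep_without_parameters` are made up for illustration. Note that an unescaped `?` makes the preceding `/` optional instead of matching a literal question mark.

```python
import re

# Hypothetical sample URLs, not taken from the original data.
urls = [
    "https://example.com/?id=1",
    "https://example.com/page/?sort=asc",
    "https://example.com/page/",
    "https://example.com/about",
]

# Pattern as it appears in the post: a literal "/", any character,
# an OPTIONAL "/", then any character. The "?" is a quantifier here.
posted = re.compile(r"/(./?.)")

# Candidate alternative (an assumption, not the poster's regex):
# a literal "/?" anywhere in the string, with the "?" escaped.
escaped = re.compile(r".*/\?.*")


def keep_without_parameters(pattern, rows):
    """Keep rows that the pattern does NOT match in full (match => FALSE)."""
    return [u for u in rows if pattern.fullmatch(u) is None]


kept_posted = keep_without_parameters(posted, urls)
kept_escaped = keep_without_parameters(escaped, urls)

print(kept_posted)   # rows that survive with the posted pattern
print(kept_escaped)  # rows that survive with the escaped pattern
```

Under these assumptions the posted pattern never matches a full URL, so no row is dropped, while the escaped variant drops the two URLs that contain /?.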

Can you explain how you derive the regex in step 4 from the one in step 3?

I came up with this idea: split the column, and the rows that do split are the ones you want. I tried it with the RegEx Filter and Documents, but the filter does not work the way I would expect.
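
The splitting idea can be sketched in plain Python as well. This is only an illustration of the logic, not a description of the attached workflow: the marker `/?`, the function name `split_on_parameters`, and the sample URLs are assumptions.

```python
def split_on_parameters(url, marker="/?"):
    """Split a URL at the first occurrence of the marker.

    Returns (base, params); params is None when the marker is absent,
    which means the row did not split.
    """
    base, sep, params = url.partition(marker)
    return base, (params if sep else None)


# Hypothetical sample rows.
rows = ["https://example.com/?id=1", "https://example.com/about"]

# Rows that did split are the ones carrying parameters.
with_parameters = [u for u in rows if split_on_parameters(u)[1] is not None]
print(with_parameters)
```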

You could skip the Strings to Documents part.

kn_example_regex_url_part.knwf (24.9 KB)
