RegEx Help needed: Extract parameters from URLs

Phishman81 · December 19, 2018, 11:01am

I am new to KNIME as well as to regex, but I am fighting my way through. I need help, though, because this drives me crazy and I cannot find my mistake:

1. I have 5 million URLs loaded from a CSV file.
2. I want to filter out all URLs that contain parameters, which can be identified by the substring /?
3. I checked the regex and it should work like this: (screenshot: ruby)
4. I am using the Rule-based Row Filter like this:

$URL$ MATCHES "/(./?.)" => FALSE
TRUE => TRUE

(include TRUE matches)

However, these URLs are NOT identified. I can still find thousands of them in the output; none are filtered out. What am I doing wrong?

Aswin · December 19, 2018, 3:44pm

Can you explain how you derive the regex in step 4 from the one in step 3?
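
One possible reading of the question: a regex tester usually wraps the pattern in `/.../` delimiters and searches anywhere in the string, while MATCHES in a KNIME rule is, as far as I know, compared against the whole cell value. The sketch below illustrates that difference under some assumptions: the sample URLs are made up, the pattern is copied exactly as it appears in the post (it may have lost characters when posted), and Python's `re` only stands in for KNIME's Java-based regex engine.

```python
import re

# Hypothetical sample URLs, not taken from the original CSV.
urls = [
    "https://example.com/shop/?color=red",
    "https://example.com/shop/",
    "https://example.com/about",
]

# The pattern as it appears in the post. The leading "/" is a literal slash here,
# and "/?" means "an optional slash", not "a slash followed by a question mark".
posted = re.compile(r"/(./?.)")

# An alternative (assumption, not the poster's pattern): require a literal "/?"
# somewhere in the string, with ".*" on both sides so the whole string can match.
whole = re.compile(r".*/\?.*")

def has_parameters(url):
    # fullmatch mimics a comparison against the entire cell value
    return whole.fullmatch(url) is not None

# URLs the posted pattern matches as a whole string
posted_hits = [u for u in urls if posted.fullmatch(u)]

# URLs that remain after dropping the ones with parameters
kept = [u for u in urls if not has_parameters(u)]

print(posted_hits)
print(kept)
```

If this reading is right, the equivalent rule would need the question mark escaped and wildcards on both sides of `/\?`; this is untested in KNIME itself.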

mlauber71 · December 19, 2018, 11:24pm

I came up with this idea: split the column, and the rows that do split are the ones you want. I also tried it with the RegEx Filter and Documents, but that filter does not work the way I would expect.

You could skip the Strings to Documents part.
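
The split idea can be sketched outside KNIME as well. This is only an illustration in Python, not the content of the attached workflow; the sample URLs and the helper name `split_on_marker` are hypothetical, and the marker `/?` is taken from the original question.

```python
# Hypothetical sample URLs, not taken from the original CSV.
urls = [
    "https://example.com/shop/?color=red",
    "https://example.com/shop/",
    "https://example.com/about",
]

def split_on_marker(url, marker="/?"):
    # str.partition returns (before, separator, after);
    # the separator is an empty string when the marker is absent.
    before, sep, after = url.partition(marker)
    return before, after, sep != ""

# Rows that actually split are the ones carrying parameters.
with_params = [u for u in urls if split_on_marker(u)[2]]
without_params = [u for u in urls if not split_on_marker(u)[2]]

print(with_params)
print(without_params)
```

Because this uses a plain substring split rather than a regex, the question mark needs no escaping.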

kn_example_regex_url_part.knwf (24.9 KB)