I am new to KNIME as well as to regex, but I am fighting my way through. I need help, though, because this drives me crazy and I do not know where my mistake is:
1. I have uploaded 5 million URLs from a CSV file.
2. I want to filter out all URLs that contain parameters, which can be identified by the substring /?
3. I checked the regex, and it should work like that:
4. I am using the Rule-based Row Filter like this:
$URL$ MATCHES "/(.*/?.*)" => FALSE
TRUE => TRUE
(with "Include TRUE matches" selected)
However, these URLs are NOT identified: I can still find thousands of them, and none are filtered out. What am I doing wrong?
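A likely explanation, assuming KNIME's MATCHES operator behaves like Java's `String.matches` (the whole cell value must match the pattern, not just a part of it): the pattern `/(.*/?.*)` has to match from the very first character, so it only fits values that start with a slash, which no full URL does. In addition, `?` is a quantifier in regex, so `/?` means "an optional slash", not the literal characters `/?`. The Python sketch below mimics the two rules (the first rule that fires decides, and only rows with outcome TRUE are kept) using `re.fullmatch`, which behaves like Java's `matches` for simple patterns like these. The sample URLs and the function name `rule_based_filter` are made up for illustration; `.*/[?].*` is one way to write a literal `/?`, because `?` inside a character class needs no escaping.

```python
import re

def rule_based_filter(urls, pattern):
    """Emulate the rule list
        $URL$ MATCHES pattern => FALSE
        TRUE => TRUE
    with "Include TRUE matches": the first rule that fires decides the
    outcome, and only rows whose outcome is TRUE are kept."""
    kept = []
    for url in urls:
        # re.fullmatch requires the whole string to match, like Java's String.matches
        if re.fullmatch(pattern, url):
            outcome = False  # rule 1 fired
        else:
            outcome = True   # fall through to rule 2: TRUE => TRUE
        if outcome:
            kept.append(url)
    return kept

urls = [
    "https://example.com/shop/",
    "https://example.com/shop/?id=42",
    "https://example.com/blog/post",
]

# Original pattern: must start with "/", and "/?" only means "optional slash",
# so nothing matches and every row is kept
print(rule_based_filter(urls, r"/(.*/?.*)"))

# Literal "/?" anywhere in the value: "[?]" matches a question mark
print(rule_based_filter(urls, r".*/[?].*"))
# ['https://example.com/shop/', 'https://example.com/blog/post']
```

In the Rule-based Row Filter the corresponding rule would presumably be `$URL$ MATCHES ".*/[?].*" => FALSE`; the character class avoids depending on how backslashes are handled inside Rule Engine string literals.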
Can you explain how you derive the regex in step 4 from the one in step 3?
I came up with this idea: split the column, and the rows that actually split are the ones you want. I tried it with the RegEx Filter and documents, but the filter does not work the way I would expect.
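The splitting idea can be sketched in plain Python, assuming the column is split at the literal delimiter `/?`: a URL that yields two parts contained the delimiter, so those are the rows to drop. In KNIME this might correspond to something like a Cell Splitter on that delimiter followed by a filter on whether the second part is empty; the helper names `has_parameters` and `drop_parameter_urls` are hypothetical.

```python
def has_parameters(url, delimiter="/?"):
    """True if splitting at the delimiter actually produces a second part."""
    # str.split with maxsplit=1 returns a single element when the delimiter is absent
    return len(url.split(delimiter, 1)) == 2

def drop_parameter_urls(urls):
    # Keep only the rows where the split did not happen
    return [u for u in urls if not has_parameters(u)]

urls = [
    "https://example.com/shop/",
    "https://example.com/shop/?id=42",
    "https://example.com/blog/post",
]
print(drop_parameter_urls(urls))
# ['https://example.com/shop/', 'https://example.com/blog/post']
```

Logically this is the same as a plain substring test (`"/?" in url`), so for a literal delimiter no regex is needed at all.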
You could skip the Strings to Documents part.
You could convert the string to a document and use the RegEx Filter, as @SimonS suggested. In your case a Rule Engine might also work, but using documents gives you access to more functions.
Comparing the documents is perhaps not the most elegant approach; someone with more experience in text processing might suggest another one.
I have always found https://regex101.com quite useful for testing regular expressions. Sometimes you also have to test a few strange cases that might…
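In the same spirit, a small table of unusual inputs can be checked against a candidate pattern before running it on millions of rows. The cases below are illustrative assumptions, and the expected column encodes "contains the literal characters `/?`". Note that such a pattern does not catch query strings without a preceding slash, such as `https://example.com/page?id=1`; if those should count as well, that case would flip to `True` and a looser pattern such as `.*[?].*` would be needed.

```python
import re

# (url, expected result of "contains the literal characters /?")
CASES = [
    ("https://example.com/?q=1", True),          # parameters right after the domain
    ("https://example.com/shop/?id=42", True),   # parameters after a path
    ("https://example.com/shop/", False),        # trailing slash, no parameters
    ("https://example.com/page?id=1", False),    # "?" without a preceding slash
    ("https://example.com/a%3Fb", False),        # percent-encoded "?" is not a literal "?"
    ("", False),                                 # empty cell
]

def check(pattern):
    """Return the cases where a full match of the pattern disagrees with the expectation."""
    failures = []
    for url, expected in CASES:
        # Full-match semantics, like Java's String.matches
        actual = re.fullmatch(pattern, url) is not None
        if actual != expected:
            failures.append((url, expected, actual))
    return failures

print(check(r".*/[?].*"))   # []
print(check(r"/(.*/?.*)"))  # the first two cases fail
```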
kn_example_regex_url_part.knwf (24.9 KB)