Regex Parsing - How??

Gavin_Attard · May 14, 2020, 9:08pm

Hi all,

Firstly, I'm new to KNIME, and I am coming from the perspective of an Alteryx power user.
On the face of it, KNIME appears to do most of what Alteryx can do, but certain really simple things are very difficult. Whether this is a nomenclature issue (node naming etc.) or simply missing features, I am as yet unsure. I'm hoping that the good people here can point me in the right direction and help me get over my learning curve.

I am trying to do something simple.

I have a file full of URLs and I need to extract a list of all the query keys found.

I am using the Regex Split node to extract the query keys, with a regex tried and tested on regex101.

In Alteryx I simply use a RegEx tool set to Parse, enter the string below, and it automatically parses each capture group into a column.
(?[^=]+)(&[^=]+)

In KNIME I get a mysterious warning message: 1589 input string(s) did not match the pattern or contained more groups than expected

Can anyone decipher or point me in the right direction?

thanks

Gavin

AnotherFraudUser · May 14, 2020, 9:38pm

Hi Gavin,

Which node are you using with your regex?
Could you provide an example workflow with the regex and node configuration you are using, including a URL that results in this error?
The URL does not have to be real.

izaychik63 · May 14, 2020, 10:58pm

You can look here

Also, for switching from Alteryx

qqilihq · May 15, 2020, 6:04am

Also check out this thread:

Gavin_Attard · May 15, 2020, 6:42am

Hi.
Here is an example workflow. Thanks.

Extract Queries.knwf (7.8 KB)

Gavin_Attard · May 15, 2020, 6:47am

Thanks, I'll have a look at the book.
I checked out the other topic link too. The whole point of tools like this is that they should offer uncomplicated ways to carry out this sort of thing. Having to learn Ruby to do a simple regex-to-column parse seems to defeat the point.

qqilihq · May 15, 2020, 7:06am

Using the Regex Extractor from Palladian, this is quite easy (a simple proof of concept with your data).

I used the following regular expression:

[?&](?<key>[^?&=]+)=(?<value>[^?&=]+)
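For anyone who wants to try the expression outside KNIME, here is a minimal sketch in Python. It is not the Palladian node itself, only the same idea: find every match in the string and read the named groups from each one. Python's `re` module writes named groups as `(?P<name>...)`, so the pattern is adapted accordingly; the sample URLs and the `extract_pairs` helper are made up for illustration.

```python
import re

# Python spells named groups (?P<name>...); the pattern above uses the
# (?<name>...) form. The character classes are unchanged.
QUERY_PAIR = re.compile(r"[?&](?P<key>[^?&=]+)=(?P<value>[^?&=]+)")


def extract_pairs(url):
    """Return every (key, value) pair the pattern finds in the URL."""
    return [(m.group("key"), m.group("value")) for m in QUERY_PAIR.finditer(url)]


# Hypothetical sample URLs, not taken from the attached workflow.
urls = [
    "https://example.com/search?q=knime&page=2&sort=asc",
    "https://example.com/item?id=42",
]

# Collect the distinct query keys across all URLs.
all_keys = sorted({key for url in urls for key, _ in extract_pairs(url)})
print(all_keys)  # ['id', 'page', 'q', 'sort']
```

The important part is `finditer`: the pattern describes a single key=value pair and is applied repeatedly, so the number of parameters per URL does not need to be known in advance.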

The modified workflow is on my NodePit Space:

Gavin_Attard · May 15, 2020, 7:09am

Thanks @qqilihq for setting me on the path to success.

OK, so here is what I learnt.
Out of the box, KNIME does not have a node that can adequately carry out this processing.

However, there is an extension you can download which has loads of useful nodes, in particular a regular expression node that does exactly what you would expect and also has a built-in preview function.

The extension is Palladian. I highly recommend you install it.

Gavin

qqilihq · May 15, 2020, 7:11am

Happy to hear. Btw; I am the main author of Palladian. So if you have any questions or feedback – keep it coming!

Gavin_Attard · May 15, 2020, 9:06am

Love this, will certainly have a good look through all the tools.

I need to get over my initial spoilt princess phase as I transfer my habits from Alteryx to KNIME.

AnotherFraudUser · May 15, 2020, 12:47pm

Hi Gavin,

Thanks for your example.

As far as I understand your problem, is this what you want?
Basically, I think the Regex Split node expects you to provide a group for every parameter you expect (if a group returns more than one match you get your error :))
Also, to split successfully, your regex has to match the whole URL.

Attached is an example of how to solve this within the Regex Split node, or with the String Replacer node combined with the Cell Splitter (which I prefer).

KNIME_project56.knwf (21.2 KB)

*However, qqilihq's solution seems to fit your needs quite well, as it actually gets the matches for each group. I'm not quite sure why KNIME does not consider the matches from groups; it would make this easier if it did.
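To illustrate the whole-match point, here is a rough sketch in Python. It uses Python's `re.fullmatch`, not the Regex Split node, and assumes the node behaves as described above: the pattern must cover the entire string and has a fixed number of groups. The `TWO_KEYS` pattern, the `split_keys` helper, and the URLs are hypothetical.

```python
import re

# Hypothetical whole-URL pattern with exactly two key groups,
# i.e. it only fits URLs that carry exactly two query parameters.
TWO_KEYS = re.compile(r"[^?]*\?([^=&]+)=[^&]*&([^=&]+)=[^&]*")


def split_keys(url):
    """Return the two captured keys, or None if the whole URL does not match."""
    m = TWO_KEYS.fullmatch(url)
    return m.groups() if m else None


print(split_keys("https://example.com/a?x=1&y=2"))      # ('x', 'y')
print(split_keys("https://example.com/a?x=1"))          # None
print(split_keys("https://example.com/a?x=1&y=2&z=3"))  # None
```

Any URL with fewer or more parameters than the pattern has groups fails to match as a whole, which is consistent with the large count of non-matching rows in the warning.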
