Regex by RegexGenerator not working in Regex Extractor

Hi, I’m using this tool to generate complex regex patterns: http://regex.inginf.units.it/

One of the rows is the following:

<p class="author"><a href="./viewtopic.php?p=10039785#p10039785"><img alt="Nota" height="9" src="./styles/prosilver/imageset/icon_post_target.gif" title="Nota" width="11"/></a>por <strong><a href="./memberlist.php?mode=viewprofile&amp;u=402553">cat walk</a></strong> el Mié Feb 22, 2012 5:09 pm </p>

And I’m using the following regex:

(?<=\w[^<][^/]\w\w\w\w\w[^;]>)[^<]++
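One way to see how the generated pattern behaves is to run it outside KNIME. This Java sketch applies it to the sample row in two ways. `Matcher.find()` looks for the pattern anywhere in the string. `Matcher.matches()` requires the entire string to match. KNIME runs on Java, but which of these modes the Regex Extractor node uses is an assumption here, not something the thread confirms. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeneratedRegexCheck {
    // The pattern RegexGenerator produced, as posted (backslashes doubled for Java).
    static final Pattern GENERATED =
            Pattern.compile("(?<=\\w[^<][^/]\\w\\w\\w\\w\\w[^;]>)[^<]++");

    // Sample row from the question (the weekday's accented e is written as a Unicode escape).
    static final String SAMPLE_ROW =
            "<p class=\"author\"><a href=\"./viewtopic.php?p=10039785#p10039785\">"
            + "<img alt=\"Nota\" height=\"9\" src=\"./styles/prosilver/imageset/icon_post_target.gif\""
            + " title=\"Nota\" width=\"11\"/></a>por <strong>"
            + "<a href=\"./memberlist.php?mode=viewprofile&amp;u=402553\">cat walk</a></strong>"
            + " el Mi\u00e9 Feb 22, 2012 5:09 pm </p>";

    // find(): look for the pattern anywhere in the row; returns the first hit or null.
    static String findFirst(String row) {
        Matcher m = GENERATED.matcher(row);
        return m.find() ? m.group() : null;
    }

    // matches(): succeed only if the pattern covers the entire row.
    static boolean matchesWholeRow(String row) {
        return GENERATED.matcher(row).matches();
    }

    public static void main(String[] args) {
        System.out.println(findFirst(SAMPLE_ROW));       // cat walk
        System.out.println(matchesWholeRow(SAMPLE_ROW)); // false
    }
}
```

With `find()` the pattern does pick out `cat walk`, because the lookbehind happens to fit the ten characters `u=402553">` right before it. With `matches()` it fails, because the lookbehind cannot match at the start of the string. So if the node expects the whole cell to match, this pattern returns nothing. The lookbehind is also tied to that exact shape: five word characters and a quote before `>`. A profile link with a four-digit user ID, for example, would not be matched.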

It doesn’t work in the Regex Extractor node. Any idea why? I’m trying to extract the username, but my dataset contains many different formats, and regex is hard.

Hi @iagovar,

What is the username in your example? And do the other rows have the same format?

:blush:

The username in my example is “cat walk”, and no, not all rows have the same format.

This dataset comes from scraping a forum, and I later found that my CSS selectors didn’t pick up the username for every row because the HTML was not the same for all messages.

So what do all the strings have in common? Can you provide a few more examples?

Here is a random sample of 100 rows: https://ethercalc.org/7jg1iw0irlf1

Use this expression in the String Manipulation node:

strip(regexReplace(regexReplace($Col0$, "<[^>]+>", "&!111;"), ".*?por\\s*(?:&!111;)+(.*?)&!111;.*", "$1"))

Replace “Col0” with the name of the column that contains the strings.
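To see what the expression does step by step, here is a rough Java equivalent. It anchors on the Spanish word “por” (“by”), which precedes the username in these posts. The sketch assumes that KNIME’s `regexReplace` behaves like Java’s `String.replaceAll` and that `strip` trims surrounding whitespace like `String.strip()` (Java 11+). The class and method names are hypothetical.

```java
public class UsernameExtractor {
    // Placeholder the forum answer substitutes for every HTML tag.
    static final String TAG_MARKER = "&!111;";

    // Sample row from the question (the weekday's accented e is written as a Unicode escape).
    static final String SAMPLE_ROW =
            "<p class=\"author\"><a href=\"./viewtopic.php?p=10039785#p10039785\">"
            + "<img alt=\"Nota\" height=\"9\" src=\"./styles/prosilver/imageset/icon_post_target.gif\""
            + " title=\"Nota\" width=\"11\"/></a>por <strong>"
            + "<a href=\"./memberlist.php?mode=viewprofile&amp;u=402553\">cat walk</a></strong>"
            + " el Mi\u00e9 Feb 22, 2012 5:09 pm </p>";

    // Java version of the String Manipulation expression:
    // strip(regexReplace(regexReplace($Col0$, "<[^>]+>", "&!111;"),
    //       ".*?por\\s*(?:&!111;)+(.*?)&!111;.*", "$1"))
    static String extractUsername(String row) {
        // 1. Replace each tag, such as <strong> or <a href="...">, with the marker.
        String flattened = row.replaceAll("<[^>]+>", TAG_MARKER);
        // 2. Skip everything up to "por", the whitespace and the run of markers after it,
        //    capture the text up to the next marker, and drop the rest of the row.
        String captured = flattened.replaceAll(".*?por\\s*(?:&!111;)+(.*?)&!111;.*", "$1");
        // 3. Trim leading/trailing whitespace, as KNIME's strip() does.
        return captured.strip();
    }

    public static void main(String[] args) {
        System.out.println(extractUsername(SAMPLE_ROW)); // cat walk
    }
}
```

The marker `&!111;` is a string that is unlikely to occur in the posts, so tag boundaries stay visible after the markup is gone. Rows without “por” need a check. In this sketch, the second `replaceAll` finds no match in such a row and returns its input unchanged. Those rows come back as flattened text rather than as an empty value.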

:blush:


It worked! :slight_smile:
Thank you so much.

What tool would you recommend for someone without regex experience? I need to extract dates too, and I will need regex for other datasets as well.
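For the dates, the sample row ends with a timestamp in the form `Mié Feb 22, 2012 5:09 pm`. The Java sketch below pulls it out with capture groups. It assumes every row uses that same layout: a capitalized three-letter month, the day, a four-digit year, `h:mm`, and `am`/`pm`. The pattern and class name are hypothetical and would need adjusting to the formats that actually occur in the dataset. The weekday is skipped, and Spanish month abbreviations such as “Ene” or “Dic” also fit `[A-Z][a-z]{2}`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostDateExtractor {
    // Hypothetical pattern for timestamps shaped like "Feb 22, 2012 5:09 pm".
    // Groups: 1 month, 2 day, 3 year, 4 hour, 5 minutes, 6 am/pm.
    static final Pattern DATE = Pattern.compile(
            "\\b([A-Z][a-z]{2}) (\\d{1,2}), (\\d{4}) (\\d{1,2}):(\\d{2}) ([ap]m)\\b");

    // Returns the first timestamp found in the row, or null if there is none.
    static String extractDate(String row) {
        Matcher m = DATE.matcher(row);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "por cat walk el Mi\u00e9 Feb 22, 2012 5:09 pm";
        Matcher m = DATE.matcher(text);
        if (m.find()) {
            System.out.println(m.group());  // Feb 22, 2012 5:09 pm
            System.out.println(m.group(1)); // Feb
            System.out.println(m.group(3)); // 2012
        }
    }
}
```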

Also, do you know why the regex from RegexGenerator didn’t work?


You can learn about regular expressions here. You can also always ask your questions here on the KNIME forum.

Try a bit on your own and come back to me if you need help. That’s how you can learn.

I think in your case human eyes were needed! :wink:

:blush:


I will. I’ve tried several times, but it’s hard to remember everything.
