Hi, I’m using this tool to generate complex regex patterns: http://regex.inginf.units.it/
One of the rows is the following:
<p class="author"><a href="./viewtopic.php?p=10039785#p10039785"><img alt="Nota" height="9" src="./styles/prosilver/imageset/icon_post_target.gif" title="Nota" width="11"/></a>por <strong><a href="./memberlist.php?mode=viewprofile&u=402553">cat walk</a></strong> el Mié Feb 22, 2012 5:09 pm </p>
And I’m using the following regex:
It doesn’t work. Any idea why? I’m trying to extract the username, but my dataset contains many different formats and I find regex hard.
What is the username in your example? And do the other rows have the same format?
The username would be “cat walk” in my example, and no, not all rows have the same format.
This dataset comes from scraping a forum, and I later found that my CSS selectors didn’t pick up usernames for all rows because the HTML code was not the same for all messages.
So what’s common in all strings? Can you provide a few more examples?
Here is a random sample of 100 rows: https://ethercalc.org/7jg1iw0irlf1
Use this expression in the String Manipulation node:
strip(regexReplace(regexReplace($Col0$, "<[^>]+>", "&!111;"), ".*?por\\s*(?:&!111;)+(.*?)&!111;.*", "$1"))
Replace “Col0” with the name of the column that contains the strings.
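For anyone who wants to see the same idea outside KNIME, here is a rough Python sketch of the two steps in the expression above: first every HTML tag is replaced with a placeholder token, then the text between the placeholders that follow “por” is captured and trimmed. The function name `extract_username` and the constant `PLACEHOLDER` are made up for this illustration and are not part of KNIME. Returning an empty string when nothing matches is also just a choice made for this sketch.

```python
import re

# Token that stands in for every HTML tag (same token as in the expression above).
PLACEHOLDER = "&!111;"


def extract_username(row: str) -> str:
    """Hypothetical helper: return the text that follows "por" and its tags."""
    # Step 1: replace each HTML tag with the placeholder token.
    flattened = re.sub(r"<[^>]+>", PLACEHOLDER, row)
    token = re.escape(PLACEHOLDER)
    # Step 2: look for "por", optional whitespace, one or more placeholders,
    # then lazily capture everything up to the next placeholder.
    match = re.search(r"por\s*(?:" + token + r")+(.*?)" + token, flattened)
    # Assumption for this sketch: an empty string signals "no username found".
    return match.group(1).strip() if match else ""


# The example row from the first post.
sample = (
    '<p class="author"><a href="./viewtopic.php?p=10039785#p10039785">'
    '<img alt="Nota" height="9" src="./styles/prosilver/imageset/icon_post_target.gif" '
    'title="Nota" width="11"/></a>por <strong>'
    '<a href="./memberlist.php?mode=viewprofile&u=402553">cat walk</a></strong>'
    ' el Mié Feb 22, 2012 5:09 pm </p>'
)

print(extract_username(sample))  # prints: cat walk
```

Rows where the username is not preceded by “por” and at least one tag will come back empty here, so it is worth checking a sample of the results by eye.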
Thank you so much.
What tool would you recommend for someone without regex experience? I need to extract dates too, and I will need regex with other datasets.
Also, do you know why the regex from the regex generator didn’t work?
You can learn about regular expressions here. Also, you can always ask your questions here in the KNIME forum.
Try a bit on your own and come back to me if you need help. That’s how you can learn.
I think in your case human eyes were needed!
I will. I tried a bunch of times but it becomes difficult to remember.