I have a text (actually part of an HTML page) that contains several <a> tags, and I need to extract the values of their href attributes (the links).
For example: <ul class="user-results"><li class="user-card"><div class="user-card__content mod-host"><a class="user-card__profile-link" href="/people/ali-irani-98">… (and there are more <li> tags containing links)
I need the value of the href attributes.
How can I do that? Do I need to use a regex filter? If yes, what pattern should I use?
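To make the regex idea concrete outside of KNIME, here is a minimal Python sketch. The pattern is only an assumption for illustration, not necessarily the one suggested in this thread: it expects double-quoted href values on <a> tags, and regular expressions in general are fragile against arbitrary HTML. The input is the truncated fragment from the question, used as-is.

```python
import re

# Truncated fragment from the question, used as-is (the rest is elided there).
text = (
    '<ul class="user-results"><li class="user-card">'
    '<div class="user-card__content mod-host">'
    '<a class="user-card__profile-link" href="/people/ali-irani-98">'
)

# Assumed pattern: an <a ...> tag whose href value is wrapped in double quotes.
# The capture group holds only the value between the quotes.
HREF_PATTERN = re.compile(r'<a\b[^>]*\bhref="([^"]*)"', re.IGNORECASE)

links = HREF_PATTERN.findall(text)
print(links)  # ['/people/ali-irani-98']
```

The same pattern (with one capture group) should carry over to most regex-based extraction tools, but single-quoted or unquoted attributes would need a broader expression.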
Thanks Markus. That worked fine.
But I think there should be a better approach, because in this solution transposing takes too long when there are multiple texts to extract links from.
My first idea was to use XPath, but this text is only part of an HTML document and the XPath node cannot read it. Do you have any idea how to convert it to the correct format so that the XPath node can read it?
In case you want to use the HtmlParser from Palladian, you can apply the following workaround: convert the input column that holds the string to a binary cell, use this as input for the HtmlParser, and then use the XPath nodes as usual.
(The simple reason the input to the HtmlParser needs to be binary is that strings are treated as file paths.)
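As an analogy outside of KNIME (this is not how Palladian works internally), Python's standard library shows the same string-versus-binary distinction: xml.etree.ElementTree.parse() interprets a plain string as a file name, so in-memory content has to be passed as a binary stream. The sketch below assumes a well-formed fragment; the markup is hypothetical, modeled on the example in the question, and the second link is invented for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modeled on the question's example.
# The second <li> and its link are invented for illustration.
fragment = (
    '<ul class="user-results">'
    '<li class="user-card"><div class="user-card__content mod-host">'
    '<a class="user-card__profile-link" href="/people/ali-irani-98">A</a>'
    '</div></li>'
    '<li class="user-card"><div class="user-card__content">'
    '<a class="user-card__profile-link" href="/people/example-user">B</a>'
    '</div></li>'
    '</ul>'
)

# ET.parse() would treat a plain str as a file name, so the content is
# encoded to bytes and wrapped in a binary stream to be parsed as data.
tree = ET.parse(io.BytesIO(fragment.encode("utf-8")))

# ElementTree supports a subset of XPath: select every <a> that has an href.
links = [a.get("href") for a in tree.getroot().findall(".//a[@href]")]
print(links)  # ['/people/ali-irani-98', '/people/example-user']
```

ElementTree only accepts well-formed XML, so real-world HTML usually needs a more tolerant parser first, which is the role the HtmlParser plays in the workflow described above.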