I have a column that contains forum message content. Most messages contain more than one URL, for example because users post images, links to other sites, and so on.
My goal is to extract all of those URLs and put them in a new table.
I’m not sure how to do this. The Palladian URL Extractor node only extracts one of them, and, as I said, most forum comments contain more than one URL.
But there’s more than one URL in each cell, and that’s the problem. Yes, it extracts more than one URL for the whole column, but not more than one per cell. That’s why I want a new table: the number of rows would be different.
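To illustrate the one-to-many shape you are describing, here is a minimal Python sketch (something you could adapt in a Python scripting node, for example). The regex is a simplified assumption, not the pattern Palladian uses, and `explode_urls` and the sample messages are hypothetical. Each URL becomes its own row that keeps the RowID of the message it came from, so the output row count differs from the input.

```python
import re

# Simplified URL pattern (an assumption, not Palladian's preset):
# "http://" or "https://" followed by characters that are not whitespace,
# quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def explode_urls(rows):
    """Return one (source_row_id, url) pair per URL found.

    `rows` is a list of (row_id, text) tuples. A cell may contain 0..n URLs,
    so the result can have more or fewer entries than the input.
    """
    result = []
    for row_id, text in rows:
        for match in URL_PATTERN.finditer(text):
            # Drop trailing punctuation that usually ends the sentence, not the URL.
            # (Simplification: this also strips a closing ")" that belongs to a URL.)
            result.append((row_id, match.group(0).rstrip(".,;:!?)")))
    return result


messages = [
    ("Row0", "See https://example.com/a.png and http://example.org/page."),
    ("Row1", "No links here."),
    ("Row2", "Docs: https://example.net/docs, mirror https://example.net/mirror"),
]

for source_row, url in explode_urls(messages):
    print(source_row, url)
```

Three messages go in and four URL rows come out; `Row1` contributes nothing.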
You could avoid this, e.g., with a Chunk Loop where you process the items row by row and then group the URLs back into a collection cell. This way the row count remains the same.
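For comparison, the "one row per message, URLs grouped into a collection" variant could look like this in plain Python. This is only a sketch: a Python list stands in for a KNIME collection cell, and `urls_per_row` and the regex are illustrative assumptions rather than what the Chunk Loop or Palladian nodes do internally.

```python
import re

# Simplified URL pattern (an assumption, not Palladian's preset).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def urls_per_row(rows):
    """Return one (row_id, [urls]) pair per input row.

    The output has exactly as many entries as the input; rows without
    any URL get an empty list instead of disappearing.
    """
    return [
        # findall returns the full matches because the pattern has no groups.
        (row_id, [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)])
        for row_id, text in rows
    ]


messages = [
    ("Row0", "Pics: https://example.com/1.png and https://example.com/2.png"),
    ("Row1", "No links here."),
]

for row_id, urls in urls_per_row(messages):
    print(row_id, urls)
```

Keeping the empty list for `Row1` is what preserves the row count; a plain group-by on exploded rows would silently drop messages without URLs.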
Palladian 2.0 has a new Regex Extractor node which, among other things, has a preset for extracting URLs. It can output the extraction results in separate rows (with a back-reference to the source RowID), extract only the first occurrence, produce a “Collection Cell”, or even produce a fine-grained JSON object that contains the results, including their offsets within the input string.
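To make the "offsets" and "first occurrence" options concrete: Python's `re.finditer` reports where each match starts and ends, so a JSON result with offsets can be built as below. This is a rough sketch, not the node's output format; the field names `value`/`start`/`end`, the helper names, and the regex are all hypothetical.

```python
import json
import re

# Simplified URL pattern (an assumption, not Palladian's preset).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_with_offsets(text):
    """Return a list of {"value", "start", "end"} dicts (hypothetical field names)."""
    results = []
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;:!?)")
        start = match.start()
        # Recompute the end offset because trailing punctuation may have been stripped.
        results.append({"value": url, "start": start, "end": start + len(url)})
    return results


def first_url(text):
    """Return only the first URL in `text`, or None if there is none."""
    matches = extract_with_offsets(text)
    return matches[0]["value"] if matches else None


text = "Image: https://example.com/a.png, source: http://example.org/page."
print(json.dumps(extract_with_offsets(text), indent=2))
print(first_url(text))
```

The offsets are character positions into the original string, so `text[start:end]` gives back the extracted URL.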
More details about the Palladian 2.0 release are available here: