Content Extractor


I am trying to extract the content of web pages using the Palladian Content Extractor, but whatever I try, I only get the headline.

Does anyone have the same problem, or any suggestion as to what I am doing wrong?

Thanks in advance, Karl.

Hi Karl,

The ContentExtractor node appends a TextDocument cell, which behaves somewhat differently from an ordinary StringCell. Use a Document Data Extractor node (available in the Text Processing section) and configure it to append the document's text as a dedicated string column. I'm attaching an example workflow.
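
To illustrate the idea outside of KNIME, here is a rough Python sketch. It is not the KNIME or Palladian API: the `TextDocument` class and the `extract_text_column` helper are hypothetical stand-ins, and the assumption that a document cell displays only its title is based on the behaviour described in this thread. The point is only that the full text is stored in the document and has to be pulled out into a plain string column.

```python
from dataclasses import dataclass


@dataclass
class TextDocument:
    """Hypothetical stand-in for a document cell: a title plus the full text."""
    title: str
    text: str

    def __str__(self) -> str:
        # Assumption: the default rendering of a document shows only its title.
        return self.title


def extract_text_column(rows):
    """Pair each document with its full text as a separate plain string,
    roughly what appending a dedicated string column does."""
    return [(doc, doc.text) for doc in rows]


rows = [
    TextDocument(
        title="Example headline",
        text="Example headline\nFirst paragraph of the page body.",
    )
]
table = extract_text_column(rows)

for doc, text in table:
    print(str(doc))  # the document rendered directly: title only
    print(text)      # the extracted column: full text as an ordinary string
```

In the workflow itself, the same step is done by the Document Data Extractor node rather than by code.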


Hi Philipp,

Thank you very much for your help!
Now I understand how to use the ContentExtractor - my first test worked properly :-)

Best wishes, Karl.