Hi,
I am trying to extract the content of websites with the Content Extractor. The workflow is as follows (also attached):
Table Creator --> httpRetriever --> htmlParser --> Content Extractor --> Document Data Extractor --> Column Filter --> Document Viewer
The Table Creator contains the three websites:
http://cordis.europa.eu/project/rcn/110738_en.html
http://cordis.europa.eu/project/rcn/191258_en.html
http://cordis.europa.eu/project/rcn/106271_en.html
The three websites are very similar to each other. Nevertheless, the content of one of them (http://cordis.europa.eu/project/rcn/106271_en.html) cannot be fully extracted by the Content Extractor (see attached workflow).
Are there any solutions?
Many thanks in advance!
Best
Simon