Extract table from HTML without Selenium nodes

vincent_m · December 16, 2020, 3:57pm

Hi everyone,

I’m currently trying to extract tables from this website: Etablissement ABATTOIRS DU GEVAUDAN à ANTRENAS (48100) sur SOCIETE.COM (50760892500032)

I’m trying to build a workflow with Table Creator → HTTP Retriever → HTML Parser, and I’m lost.

The problem is that I don’t think I can use the XPath node directly, because the DOM structure is not exactly the same from one company to another on the website. Each page does always contain tables with the company information I want to get, though.

I also know it would be really easy with the Selenium nodes, but I can’t use them.

So I’m looking for a way of doing this task only with Palladian.

Do you have any ideas about this?

Thank you very much,

Vincent

qqilihq · December 17, 2020, 5:41pm

Hi Vincent,

as you stated, the Table Extractor from the Selenium nodes can do this very easily.

If that is not an option, you could build the extraction yourself with appropriate XPath queries. You will need to chain them to process the table row by row, then column by column within each row, and then build the table structure in KNIME.
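To make the chaining concrete, here is a rough sketch of the same logic in Python instead of KNIME nodes. It is only an illustration under a few assumptions: the `SAMPLE` markup and the `extract_tables` helper are made up for this example and are not taken from the real page, and the standard-library parser used here only accepts well-formed markup, whereas a real page would first need a lenient HTML parser (the role the HTML Parser node plays in the workflow). The queries are relative (`.//table`, then `.//tr` inside each table), so they do not depend on where the tables sit in the DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (an assumption for this sketch).
SAMPLE = """
<html><body>
  <div><table id="etab">
    <tr><th>Field</th><th>Value</th></tr>
    <tr><td>Name</td><td>Example Company</td></tr>
    <tr><td>Postal code</td><td>00000</td></tr>
  </table></div>
</body></html>
"""


def extract_tables(markup):
    """Return every table in the markup as a list of rows of cell texts."""
    root = ET.fromstring(markup)
    tables = []
    # Step 1: find every table, wherever it is nested in the document.
    for table in root.iterfind(".//table"):
        rows = []
        # Step 2: query the rows relative to the current table.
        for tr in table.iterfind(".//tr"):
            # Step 3: take the header and data cells of the current row.
            cells = [c for c in tr if c.tag in ("th", "td")]
            # Collapse whitespace inside each cell's text.
            rows.append([" ".join("".join(c.itertext()).split()) for c in cells])
        tables.append(rows)
    return tables


if __name__ == "__main__":
    for row in extract_tables(SAMPLE)[0]:
        print(row)
```

In a workflow, each step would roughly correspond to one XPath node: the first returns the tables, the second the rows of each table, the third the cells of each row. Note that `.//tr` also picks up rows of nested tables, so a page with tables inside tables would need a stricter query.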

Bonne chance!
Philipp
