Read table from website

Hello,

I am new to KNIME and have run into an issue. I am trying to extract a table from this website: Audi Europe Sales Figures

Unfortunately, the whole table ends up in one single cell. I also want to scrape other tables (the number of rows can vary).

I would appreciate your ideas! Thank you in advance!


I tried something like this…
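For comparison outside KNIME: the usual remedy for "the whole table lands in one cell" is to read a table cell by cell instead of taking the text of the entire `<table>` element. Below is a minimal sketch of that idea using Python's standard-library `html.parser`. It is not how the Selenium Nodes work internally. It assumes simple, non-nested tables and ignores colspan/rowspan. The names `TableExtractor` and `extract_tables` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

Table = List[List[str]]  # one list of cell strings per row


class TableExtractor(HTMLParser):
    """Collect every <table> as (id, rows). Sketch only: no nested tables, no colspan/rowspan."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: List[Tuple[Optional[str], Table]] = []
        self._rows: Optional[Table] = None
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None
        self._table_id: Optional[str] = None

    def _close_cell(self) -> None:
        # Store the finished cell text with whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row is not None and self._rows is not None:
            self._rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows, self._table_id = [], dict(attrs).get("id")
        elif tag == "tr" and self._rows is not None:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._rows is not None:
            self._close_row()
            self.tables.append((self._table_id, self._rows))
            self._rows = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str) -> List[Tuple[Optional[str], Table]]:
    """Return all tables on the page in document order, each split into rows and cells."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Because each table comes back as a list of rows, tables with a different number of rows need no special handling.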

Hi Max98,

I suggest having a look here at how to extract tables with the Selenium Nodes:

–Philipp


Thanks a lot! I have changed my approach now. :smile:
At the moment I am reading two titles and two tables from this website.

How can I loop this workflow so that I can read the other pages (same structure) whose URLs are listed in my Excel Reader?

Kind regards,
Max


Hi Max,

looks good! :+1:

For accessing multiple pages, the idea with the loop is right. You would typically connect the red flow-variable port of the loop start to a Navigate node (placed after the Start WebDriver node), which then navigates to each page you want to process in turn.

Configure the “Navigate” node so that the URL is taken from the flow variable.

Does this help?

Best,
Philipp
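The loop described above (one iteration per URL row, each iteration navigating to that URL) can be sketched in plain Python for comparison. Assumptions: the URLs sit in a column named `url` (hypothetical) and arrive as CSV lines rather than an Excel file. `fetch_url` uses `urllib`, which, unlike Selenium, does not execute JavaScript. The names `fetch_url` and `iterate_pages` are hypothetical.

```python
import csv
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch_url(url: str) -> str:
    """Plain HTTP download; unlike a Selenium Navigate step it does not run JavaScript."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def iterate_pages(url_lines: Iterable[str], column: str = "url",
                  fetch: Callable[[str], str] = fetch_url) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) once per table row, like a row-wise loop feeding a Navigate step."""
    for row in csv.DictReader(url_lines):
        url = (row.get(column) or "").strip()
        if url:  # skip empty cells
            yield url, fetch(url)
```

Passing `fetch` as a parameter keeps the loop body independent of how pages are loaded, so a browser-driven fetcher could be swapped in without changing the loop.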


Thank you very much! One (hopefully) last question: where should I put the loop end? If I put it directly after the Navigate node, it only lists the URLs (as it should).

I want to create one Excel file per loop iteration, but the Excel Writer has no output port that I could connect to the loop end. So where do I end my loop?

Maybe a stupid question… :roll_eyes:

Kind regards,
Max

Hey Max,

Then you’ll need to put it after the Excel Writer. As you noticed, that node doesn’t have an output port, so here’s what to do:

  1. Add a Variable Loop End node to the workflow

  2. Drag a connection from the upper right corner of the Excel Writer node (there’s an invisible port at that point) to the red circle input of the Variable Loop End node.

  3. If the previous step sounds like black magic, you can right-click the Excel Writer and select the menu item “Show flow variable ports”. This does the same as before, but makes the ports visible, so the connection is more obvious.

Hope this helps!

–Philipp
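The same pattern outside KNIME, where each loop iteration writes its own output file, can be sketched as follows. Assumptions: CSV written with Python's `csv` module stands in for the Excel Writer (writing real `.xlsx` files would need a third-party library such as openpyxl), and each page's tables are lists of row lists. The file-naming scheme in `safe_name`, like the names `page_to_csv` and `write_page`, is hypothetical.

```python
import csv
import io
import re
from pathlib import Path
from typing import List, Sequence

Table = List[List[str]]  # one list of cell strings per row


def safe_name(url: str) -> str:
    """Hypothetical naming scheme: turn a URL into a file-name-friendly string."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") or "page"


def page_to_csv(tables: Sequence[Table]) -> str:
    """Serialize all tables of one page: a label row, the rows, then a blank separator row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for i, table in enumerate(tables, start=1):
        writer.writerow([f"table {i}"])
        writer.writerows(table)
        writer.writerow([])  # blank separator row between tables
    return buf.getvalue()


def write_page(out_dir: Path, url: str, tables: Sequence[Table]) -> Path:
    """Write one file per page; call this once per loop iteration."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{safe_name(url)}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        fh.write(page_to_csv(tables))
    return path
```

Inside a loop this would be called as `write_page(Path("out"), url, tables)` for each page. Nothing needs to flow out of the writer, which mirrors why the Variable Loop End only needs the flow-variable connection.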

Thanks a lot, now it works! The only remaining problem is that the first table on page 2 has id=“table_1” instead of “table_2”, so my Find Elements node no longer works… Still something to figure out… :smile:

Kind regards, Max
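One possible workaround, which is an assumption and is not confirmed in this thread, is to stop relying on one exact id. Instead, you could select tables by position among those whose id starts with `table_`. In Selenium this could be expressed as a CSS attribute-prefix selector such as `table[id^="table_"]` or as the XPath `//table[starts-with(@id, 'table_')]`, assuming the Find Elements node is configured for that selector type. The Python sketch below applies the same idea to already-extracted `(id, rows)` pairs. The helper `pick_tables` and its default pattern are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

Table = List[List[str]]  # one list of cell strings per row


def pick_tables(tables: Sequence[Tuple[Optional[str], Table]],
                pattern: str = r"table_\d+") -> List[Table]:
    """Return the tables whose id fully matches `pattern`, in page order.

    Matching a pattern instead of one exact id means "table_1" on one page
    and "table_2" on another are both accepted; pick by index afterwards.
    """
    rx = re.compile(pattern)
    return [rows for table_id, rows in tables
            if table_id is not None and rx.fullmatch(table_id)]
```

For example, `pick_tables(tables)[0]` would then be "the first sales table on the page", whatever number its id carries.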

