Web scraping data from a table

Hi everyone,

I would like to extract data from a website with KNIME.

Usually, I select the data manually and copy and paste it into an Excel file, but the result isn’t good. Sometimes cells get merged or data ends up in the wrong column.

So I would like to use KNIME to extract data from a table on a website while keeping the columns intact.

It would be very nice if someone could explain to me how to do this.
It’s something I’ve always wanted to do, but I don’t have strong programming skills.

Thank you.

Hi @Grayfox,

Have you tried the Selenium nodes? The Table Extractor node is great for capturing tables in web pages.

:blush:


Hi,
There are many ways to do this, and you can definitely do it in KNIME.

But it depends on a few things:

  • How many different types of websites will you be getting data from?
  • What is the underlying structure of the source website you want to get data from:
    XML, JSON, an API? …
  • The Palladian nodes offer solutions for many cases, and you can build workflows with them.
    But if you are looking for a solution that works for every source, you can definitely do it with the Selenium nodes.

Thank you very much, it’s working. I extracted a table from a website using the Selenium nodes. I specified the class name and got the entire table with the correct data.

On the other hand, it’s too bad that it’s not free. I have a one-month free trial.
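
For readers without a Selenium nodes license, the class-name approach described above can be roughly sketched outside KNIME with Python's standard library. This is only an illustration, not what the Selenium nodes do internally: the sample HTML, the class name `results`, and the helper names `TableByClassParser` and `extract_table` are all made up for the example, and it only works on HTML that is already in the page source (no JavaScript rendering).

```python
from html.parser import HTMLParser


class TableByClassParser(HTMLParser):
    """Collect rows of the first <table> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.rows = []       # finished rows, each a list of cell strings
        self._depth = 0      # 1 inside the target table, >1 inside nested tables
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read
        self._done = False   # True once the target table has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth > 0:
                self._depth += 1          # nested table inside the target
            elif self.target_class in classes:
                self._depth = 1           # entering the target table
        elif self._depth == 1:
            if tag == "tr":
                self._row = []
            elif tag in ("td", "th") and self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if tag == "table":
            self._depth -= 1
            if self._depth == 0:
                self._done = True
        elif self._depth == 1:
            if tag in ("td", "th") and self._cell is not None:
                self._row.append("".join(self._cell).strip())
                self._cell = None
            elif tag == "tr" and self._row is not None:
                self.rows.append(self._row)
                self._row = None


def extract_table(html, target_class):
    """Return the target table as a list of rows (hypothetical helper)."""
    parser = TableByClassParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
<table class="other"><tr><td>ignore</td></tr></table>
<table class="results striped">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>12</td></tr>
  <tr><td>Bob</td><td> 7 </td></tr>
</table>
</body></html>
"""

rows = extract_table(SAMPLE_HTML, "results")
print(rows)
```

Each row comes back as a list of cell strings, so every value stays in its own column, which is the problem with manual copy and paste that started this thread.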

With the Selenium nodes (I use them) you can do almost anything a human can do in the browser and run it programmatically.

  • log in with a username and password
  • open and close the browser
  • open and close windows in the browser
  • upload and download files …

In summary, you can automate and run anything a person can do.


I haven’t had the pleasure of using Selenium yet… but with the Palladian nodes it’s pretty straightforward to get the web page and use XPath to parse the relevant section containing the data (a basic understanding of HTML helps).
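
To give an idea of what such an XPath query looks like, here is a minimal sketch in Python rather than in the Palladian nodes themselves. The page content and the `scores` id are invented for the example, and `xml.etree.ElementTree` only supports a subset of XPath and needs well-formed markup, so real pages usually require a more tolerant HTML parser first.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page standing in for a downloaded document.
PAGE = """<html><body>
<h1>Results</h1>
<table id="scores">
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alice</td><td>12</td></tr>
  <tr><td>Bob</td><td>7</td></tr>
</table>
</body></html>"""

root = ET.fromstring(PAGE)

# XPath: every <tr> that is a direct child of the table with id "scores".
ROWS_XPATH = ".//table[@id='scores']/tr"

table_rows = []
for tr in root.findall(ROWS_XPATH):
    # Each child of <tr> is a <th> or <td>; join its text and trim whitespace.
    cells = ["".join(cell.itertext()).strip() for cell in tr]
    table_rows.append(cells)

print(table_rows)
```

The query selects only the relevant section of the page, which is the same idea as pointing an XPath node at the table that holds the data.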

As umutcankurt points out, if the site uses JSON you can get the data directly.
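
As a rough illustration of why JSON is easier, the sketch below turns a JSON response into table rows in Python. The response body, the `items` key, the field names, and the `json_to_rows` helper are all assumptions made for the example; a real site will have its own structure.

```python
import json

# Made-up response body; real sites will use different keys and nesting.
RESPONSE_BODY = """
{"items": [
  {"name": "Alice", "score": 12},
  {"name": "Bob", "score": 7, "team": "blue"}
]}
"""


def json_to_rows(body, columns):
    """Return one row per record, with values ordered by the given columns.

    Missing fields become None so every row has the same number of columns.
    """
    records = json.loads(body)["items"]
    return [[record.get(col) for col in columns] for record in records]


json_rows = json_to_rows(RESPONSE_BODY, ["name", "score", "team"])
print(json_rows)
```

Because each value is looked up by field name, nothing can end up in the wrong column, and no HTML parsing is needed at all.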

