Directly access data on the web

Hi all,
is there any way to directly access data on the web without necessarily having to download it first?
Here is an example of the data that I would like to access:
https://www.dati.lombardia.it/Sanit-/Strutture-di-ricovero-e-cura/teny-wyv8/data
Thanks!
Best,
Alfredo

Hi @alfroc,

Yes, of course. The easiest way would be to use the Table Extractor node from the Selenium Nodes extension.

Or you can read the page and parse it with the HTTP Retriever, HTML Parser, and XPath nodes.
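
As a rough Python analogue of that retrieve, parse, and XPath chain, here is a minimal sketch that pulls table rows out of markup with an XPath-style query. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup. The sample table and the `extract_rows` name are hypothetical, and a real page would usually need an HTML-tolerant parser instead.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample standing in for the retrieved and parsed page.
SAMPLE_XHTML = """
<html><body>
  <table>
    <tr><th>Name</th><th>City</th></tr>
    <tr><td>Hospital A</td><td>City X</td></tr>
    <tr><td>Hospital B</td><td>City Y</td></tr>
  </table>
</body></html>
"""


def extract_rows(markup: str) -> list[list[str]]:
    """Return every table row as a list of cell texts (header and data cells)."""
    root = ET.fromstring(markup.strip())
    rows = []
    # ".//table/tr" is ElementTree's equivalent of the XPath query //table/tr
    for tr in root.findall(".//table/tr"):
        # "*" matches both <th> and <td> children, in document order
        rows.append([(cell.text or "").strip() for cell in tr.findall("*")])
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE_XHTML):
        print(row)
```

The hard part in practice is the one mentioned in this thread: finding the right query for the real page's structure.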

A similar topic is answered by @umutcankurt here:

Thanks a lot, @armingrudd!
Unfortunately, the Selenium Nodes require a license, and working out the correct XPath query is too complicated in this case…
Best,
Alfredo

@alfroc,

I know this isn’t quite the answer you are looking for, but if you’re comfortable with Python, I recommend using a Python Source node with the Selenium library loaded in it and simply scraping the table.

I ran into the same issue that you did and just went the Python route.

To get the data into your KNIME workflow, you need to scrape the HTML table and convert it into a pandas DataFrame; the Python Source node then lets you work with that data in the rest of your workflow.
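
A minimal sketch of the "scrape the table, convert to a DataFrame" step, assuming the rendered HTML is already available as a string (for example from Selenium's `driver.page_source` after the page has loaded). It uses the standard library's `html.parser` rather than any particular scraping helper, and it treats the first row as the header. The class and function names are hypothetical. The `output_table` variable in the closing comment reflects how the legacy Python Source node picks up its result; check the node description for your KNIME version.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so entities like &amp; are decoded
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row:  # skip rows without cells
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still belongs to the open cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def parse_table(page_source: str) -> list[list[str]]:
    """Return all table rows found in the HTML as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(page_source)
    parser.close()
    return parser.rows


def to_dataframe(rows: list[list[str]]):
    """Build a pandas DataFrame, using the first row as the header."""
    import pandas as pd  # imported lazily so parse_table works without pandas

    return pd.DataFrame(rows[1:], columns=rows[0])


# Inside a KNIME (legacy) Python Source node, assuming a Selenium WebDriver
# named `driver` has already loaded the page:
#   output_table = to_dataframe(parse_table(driver.page_source))
```

Parsing the rendered source is what makes Selenium useful here: if the table is filled in by JavaScript, a plain HTTP request may not contain the rows at all.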

I hope this helps.

Resources:


https://selenium-python.readthedocs.io/
https://pandas.pydata.org/

Hi @TardisPilot,
many thanks, this looks very interesting as I have already installed the nodes for Python!
Does it also need a Selenium license to work? Do you have any example that can help?
Best,
Alfredo

Hi there @alfroc,

To my understanding you don’t need any license to work with it from Python.

Br,
Ivan

Hi @alfroc,

I will try to put something together for you as an example. No, you don’t need a license for Selenium in Python. You just need to install it.

Hi @alfroc,

Here is the workflow I built that scrapes that table and outputs it into your workflow. I tested it and it works great. Based on that webpage, it generates 100 rows with 24 columns. If you have any questions, please let me know!

@alfroc,

Hey, I don’t know if you saw, but that website lets you download the full dataset as a CSV or in other file formats.

Also, since I don’t know Italian, I didn’t realize when I built the scraper that there was more than one page of data, so the scraper only grabs the first 100 rows.

It would be easier to just download the entire dataset the site provides. But if you’re curious, it is also possible to scrape all of the pages. Let me know if you have any questions or concerns.
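
Instead of clicking through the rendered pages, one option is to request the rows page by page from the portal's data API. The sketch below assumes the site is a Socrata-style portal that serves the dataset id from the page URL (`teny-wyv8`) at a `/resource/<id>.json` endpoint with `$limit`/`$offset` paging; that endpoint is an assumption to verify, not something confirmed in this thread. The paging loop takes the fetch function as a parameter so it can be tried without network access.

```python
from __future__ import annotations

import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Socrata-style (SODA) endpoint for the dataset id in the page URL.
RESOURCE_URL = "https://www.dati.lombardia.it/resource/teny-wyv8.json"


def page_url(limit: int, offset: int, base: str = RESOURCE_URL) -> str:
    """Build the URL for one page using $limit/$offset query parameters."""
    return f"{base}?{urlencode({'$limit': limit, '$offset': offset}, safe='$')}"


def fetch_json(url: str) -> list[dict]:
    """Download one page and decode it as a JSON list of records."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_all(fetch: Callable[[str], list[dict]], page_size: int = 100) -> list[dict]:
    """Request consecutive pages until one comes back shorter than page_size."""
    records: list[dict] = []
    offset = 0
    while True:
        page = fetch(page_url(page_size, offset))
        records.extend(page)
        if len(page) < page_size:
            return records
        offset += page_size


# Usage (needs network access and the assumed endpoint):
#   rows = fetch_all(fetch_json, page_size=1000)
```

Stopping on a short page avoids needing the total row count up front; the cost is one extra empty request when the total is an exact multiple of the page size.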

You can grab the file URL and download it in KNIME.
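
For anyone doing the same outside the KNIME nodes, here is a minimal Python sketch of "grab the file URL and download it" without a manual step: fetch the CSV export and parse it in memory. The export URL below assumes the usual Socrata pattern (`/api/views/<id>/rows.csv?accessType=DOWNLOAD`) for the dataset id in the page URL; copy the real link from the site's export menu if it differs. Function names are hypothetical.

```python
from __future__ import annotations

import csv
import io
from urllib.request import urlopen

# Assumption: Socrata-style export URL for the dataset id in the page URL.
CSV_URL = "https://www.dati.lombardia.it/api/views/teny-wyv8/rows.csv?accessType=DOWNLOAD"


def parse_csv_text(text: str) -> list[dict[str, str]]:
    """Turn CSV text (header row first) into one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def download_csv(url: str = CSV_URL, encoding: str = "utf-8-sig") -> list[dict[str, str]]:
    """Fetch the export and parse it in memory; utf-8-sig also strips a BOM if present."""
    with urlopen(url, timeout=60) as resp:
        return parse_csv_text(resp.read().decode(encoding))


# Usage (needs network access):
#   rows = download_csv()
#   print(len(rows), list(rows[0]))
```

With pandas available, `pandas.read_csv` can also be given the URL directly.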

Hi @TardisPilot,
thank you very much for your workflow! I’ll definitely try it, just out of curiosity…
@armingrudd of course, I know I can download the dataset, that’s what I usually do!
My question was whether this manual step can be skipped…
Best,
Alfredo

I mean downloading the file automatically in KNIME.

@armingrudd is right, you can skip the manual step and do it automatically in KNIME.

You can look at the following example:

You can find the workflow here:

Hi @oole, that’s great, thank you very much! This is exactly what I wanted. I owe you a pint…
Best,
Alfredo