How to insert data tables from web pages?

Hello all,
I haven’t found a convenient way to access and insert data tables from web pages.
Here are two simple examples:

Table with Nutrition Facts
Historical Nasdaq data

It is quite easy to import such tables into Excel, but I would like to have them in KNIME so that I can transform and combine several tables.
I’m sure there must be an easy way!
I’d highly appreciate any suggestions and support.
Many thanks in advance!

You can use the Table Extractor from the Selenium Nodes for this:

Some more background and example:

Hi @qqilihq,
thanks a lot for your quick reply!
This totally looks like the solution I was looking for.
However, since I’m using KNIME for private rather than commercial purposes, I don’t have access to the Selenium Nodes.
And the annual licence fee is quite high, so I’m afraid this wouldn’t be a realistic option for me.
Are there any alternative nodes / workflows free of charge for this task?
Thank you!

Hi MoBa,

if it’s a one-time project, there’s a free 30-day trial of the Selenium Nodes which you can use. It gives you access to all functionality, and there are no obligations or subscriptions involved. Please feel invited to give it a try 🙂

Alternatively, you can of course also do this “by hand”, which means replicating the convenience of the Table Extractor node. The web page you linked could also be scraped using the simple HTTP Retriever and HTML Parser nodes from Palladian, which is entirely free of charge for free KNIME versions.

Use the HTTP Retriever to download the web page in question, and the HTML Parser to build a clean DOM model of the HTML page. To extract the table structure, you’d then need to employ some XPath nodes to transform the <table> structure into a KNIME table step by step.

This is totally doable, especially if you tailor it to one specific type of table / website. What the Table Extractor from the commercial Selenium Nodes does is provide all this as a convenient, ready-to-use node to save you this manual labor.
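To make the XPath steps above a bit more concrete, here is a rough sketch of the same idea in plain Python rather than KNIME nodes. It is not how the Palladian or Selenium nodes are implemented; it only mirrors the stepwise logic (find the table, then its rows, then the cells of each row). It assumes the markup has already been cleaned into well-formed XML, which is roughly what the HTML Parser step is for. The sample markup, its made-up values, and the helper names `cell_text` and `table_to_rows` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-cleaned (well-formed) markup with made-up values.
CLEAN_HTML = """
<html><body>
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>2020-01-02</td><td>100.5</td></tr>
  <tr><td>2020-01-03</td><td>101.0</td></tr>
</table>
</body></html>
"""


def cell_text(cell):
    """Return the text of a cell, including text inside nested tags."""
    return "".join(cell.itertext()).strip()


def table_to_rows(markup):
    """Turn the first <table> in the markup into (header, records)."""
    root = ET.fromstring(markup.strip())

    # Step 1: locate the first table anywhere in the document.
    table = root.find(".//table")
    if table is None:
        raise ValueError("no <table> element found")

    header, records = [], []
    # Step 2: iterate over all rows of that table.
    for tr in table.findall(".//tr"):
        # Step 3: split each row into header cells and data cells.
        heads = [cell_text(c) for c in tr.findall("th")]
        cells = [cell_text(c) for c in tr.findall("td")]
        if heads and not cells:
            header = heads
        elif cells:
            records.append(cells)
    return header, records


header, records = table_to_rows(CLEAN_HTML)
print(header)
for record in records:
    print(record)
```

In a KNIME workflow, each of the three steps would roughly correspond to one XPath node; real pages usually need extra handling for things like merged cells or nested tables.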

Fingers crossed!

–Philipp

Hi Philipp,
many thanks for all your helpful input!
Since I’d like to update the web data on a regular basis, I’m afraid the 30-day free trial for the Selenium Nodes wouldn’t help in the long term.
Therefore, I will definitely give the proposed option with the Palladian nodes a try.
Fingers crossed 🙂

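Use a Python Source node and read the HTML page directly with pandas:

import pandas as pd
# read_html returns a list with one DataFrame per <table> found on the page
tables = pd.read_html("https://www.fda.gov/food/new-nutrition-facts-label/how-understand-and-use-nutrition-facts-label")
# hand the first table back to KNIME
output_table = tables[0]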
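One thing to watch out for: `pd.read_html` returns a list with one DataFrame per table on the page, so index 0 is not necessarily the table you want. A small sketch of picking a table by its content with the `match` parameter is below. It assumes pandas plus one of its HTML parser backends (lxml, or BeautifulSoup with html5lib) is installed; the sample HTML, its made-up values, and the helper name `pick_table` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical page with two tables and made-up values.
HTML = """
<html><body>
<table>
  <tr><th>Site</th><th>Visits</th></tr>
  <tr><td>A</td><td>10</td></tr>
</table>
<table>
  <tr><th>Nutrient</th><th>Amount (g)</th></tr>
  <tr><td>Total Fat</td><td>8</td></tr>
  <tr><td>Sodium</td><td>0.16</td></tr>
</table>
</body></html>
"""


def pick_table(html, match):
    """Return the first table whose text matches the given string or regex."""
    # Wrapping the string in StringIO lets pandas treat it as a file-like object.
    tables = pd.read_html(io.StringIO(html), match=match)
    return tables[0]


nutrition = pick_table(HTML, "Nutrient")
print(nutrition)
```

Inside a Python Source node you would pass the page URL instead of the inline string and assign the result to `output_table`.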
