Help getting data (table) from a webpage

Hello friends,

I would like some help extracting a table from a website.
I am not familiar with Knime for extracting data from websites.

The only tool I have used is Power BI, which automatically identifies the data.
It makes the task very simple.

I would like to learn how to do this with KNIME.

If possible, could someone create a workflow for extracting the table from this site? Ranking de Fundos Imobiliários | FundsExplorer

Based on the workflow, I can study how it was done.

Table

----------------------Tried--------------------------------
I tried using this component “Table from XHTML,” but it did not return the table.
I believe I would need to customize it.

https://hub.knime.com/alexanderfillbrunn/spaces/Public/Components/Table%20from%20XHTML~CiE7hTN611IQMXBX/current-state

Hi Felipereis50,

You can do it with the Table Extractor node:

Best regards,
Philipp

Great

I’m configuring NodePit to download it.
I’ll come back in no time!!!

Hi friend,

I think those “Selenium nodes” are paid, can you confirm?

Yes. There’s a free trial; afterwards there are paid licenses.

What a pity,

I’ll wait to see if someone can help me using the available nodes.

Try this for some ideas:

Hi friend

Well…
I’m not even close to being a student of HTML.

My steps

  1. I went to Google and used the HTML inspection tool.

  2. I searched for the name “table” to find a clue about where to start, and I found all the values in the HTML (table) :slight_smile:

If I open the “child”, I can see all the code for the first value, and so on…

  3. I tried to copy the XPath or Full XPath and paste it into the XPath node in KNIME.

But no success

Next Step (help)

I’ll need some help with the expression that I will have to use in the XPath node.
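
A minimal sketch of one common reason a copied XPath returns nothing, assuming the retrieved page is XHTML that declares a default namespace: un-prefixed element names then match nothing, and every step in the path needs a namespace prefix. The document below is hand-written for illustration, the `dns` prefix is just a label chosen here, and Python's standard library stands in for the XPath node. Whether this is what failed in this particular workflow is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML with a default namespace, standing in for a retrieved page.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table><tr><td>AAAA11</td></tr></table>
</body></html>"""

root = ET.fromstring(DOC)

# A path without a prefix looks for elements in no namespace, so it finds nothing.
plain = root.findall(".//table")

# Mapping a prefix to the XHTML namespace and using it in the path finds the table.
NS = {"dns": "http://www.w3.org/1999/xhtml"}  # prefix name is an arbitrary choice
prefixed = root.findall(".//dns:table", NS)

print(len(plain), len(prefixed))
```

A path copied from the browser's inspector has no prefixes at all, so it would behave like the first query here if the document is namespaced.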

Can you share the workflow you have?

Of course

here
Funds_state.knwf (76.8 KB)

But it is very, very simple. I have nothing.

---------------My Experience with Xpath-----------------
I have a little experience with Xpath from reading XML.
I’ve created this workflow and achieved the desired result.
But with HTML I’m lost. :frowning:

@Felipereis50 the nodes do work, although there is a problem: the website does not seem to provide the numbers under certain circumstances, so the cells are always empty.

Here is some Python code that tries to deal with that. It is built to extract all tables into Parquet files in a directory you can specify.

Thank you very much, mLauber.

I tried to start the installation of Conda, but I couldn’t do it.
I’m using a corporate computer, and there are restrictions on installing programs.
I won’t be able to complete it.

Based on your analysis, Xpath wouldn’t be ideal, correct?
I searched on YouTube for tutorials on web scraping, specifically for the site I mentioned, and I only found examples using Python.
Many of them use Python libraries such as Beautiful Soup.

Perhaps I could replicate it using the Python node based on the tutorial, but if I need to install any library on my computer, I won’t be able to proceed.

In any case, if I can’t manage it, I’ll have to resort to using Power BI as a source to capture the table.

Thank you in advance, and I’ll consider the thread closed.

@Felipereis50 having the ability to install software (namely Python) might be crucial to actually using analytics tools. In this case the Webpage Retriever and so on do work in principle, but the data itself seems to be dynamically provided by some sort of sub-page which is not in itself accessible.

In principle you can try a cascade of XPath and JSON Path to extract the elements you want.
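
A rough sketch of what such a cascade could look like, written in Python's standard library rather than with KNIME nodes. It assumes the values are embedded as JSON inside a `script` element of the page; the page content, the `id="data"` attribute and the field names are all hypothetical and not taken from the actual site.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical page: the values sit in a JSON blob inside a <script> element.
PAGE = """<html><body>
<script id="data" type="application/json">{"funds": [{"ticker": "AAAA11", "price": 10.5}, {"ticker": "BBBB11", "price": 98.2}]}</script>
</body></html>"""

# Step 1 (XPath-like): locate the element that carries the JSON text.
root = ET.fromstring(PAGE)
script = root.find(".//script[@id='data']")

# Step 2 (JSON Path-like): parse the text and walk to the values,
# roughly what the JSON Path $.funds[*].ticker would select.
payload = json.loads(script.text)
tickers = [fund["ticker"] for fund in payload["funds"]]
prices = [fund["price"] for fund in payload["funds"]]

print(list(zip(tickers, prices)))
```

In a workflow, the first step would correspond to an XPath node and the second to a JSON Path node applied to the extracted string.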

The “Table from XHTML” component searches for the (first) table element //table[1] and then for the data rows and so on, if they are there. The path syntax can be somewhat confusing at first, but with a little trial and error you can manage … provided that the data is actually there in the retrieved HTML/XML document.
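
To illustrate the idea of "first table, then rows, then cells", here is a small sketch in plain Python. It is not the component's actual implementation; the XHTML is hand-written, has no namespace to keep the paths short, and deliberately contains one empty cell to mimic a page where a value was not delivered.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written XHTML standing in for the retrieved document.
XHTML = """<html>
  <body>
    <table>
      <thead><tr><th>Fund</th><th>Price</th></tr></thead>
      <tbody>
        <tr><td>AAAA11</td><td>10.50</td></tr>
        <tr><td>BBBB11</td><td></td></tr>
      </tbody>
    </table>
  </body>
</html>"""


def first_table_rows(xhtml):
    """Return the rows of the first table as lists of cell strings."""
    root = ET.fromstring(xhtml)
    # find() returns the first matching element in document order.
    table = root.find(".//table")
    if table is None:
        return []  # no table in the document at all
    rows = []
    for tr in table.iterfind(".//tr"):
        # Keep header and data cells; join nested text and trim whitespace.
        cells = [cell for cell in tr if cell.tag in ("th", "td")]
        rows.append(["".join(cell.itertext()).strip() for cell in cells])
    return rows


rows = first_table_rows(XHTML)
for row in rows:
    print(row)
```

If the table element exists but the cells come back as empty strings, the path is fine and the values are simply missing from the retrieved document.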

Hey @Felipereis50,

have you considered using the KNIME Webinteraction Nodes?
They are developed by KNIME and therefore free to use.


I have attached the workflow that I used.

Kind regards Ricci :slight_smile:

webInteraction_example.knwf (54.4 KB)

Hi friend

I’m trying, but now I’m getting an error from Web Interaction Start.

I’m looking for some help in the forum.

I entered the path, but with no success.

Execute failed: Message: Could not locate chromedriver at path: C:\Program Files\Google\Chrome\Application\chrome.exe

I managed it using Firefox.

Result: Thanks, @ricciV1.
It worked perfectly.

Thank you, @mlauber71, for the support, but the tip from @ricciv1 was simpler.

@Felipereis50 I have never used the KNIME web interaction nodes but will keep them in mind. They seem to be more elegant than Python code :blush:
