Import table from www

alo · October 31, 2022, 9:46am

Hi,
I have KNIME 4.6.3 and I’m not able to install the Selenium nodes as described on the Selenium Nodes download page.
I need to import tables from GCC Ports.
I did it in Power BI, but I need to do the same in KNIME.

My current result is neither efficient nor elegant. Could you give me any suggestions?

Container GPS.knwf (35.2 KB)

Marcin

qqilihq · October 31, 2022, 9:59am

Hi,

Why not?

–Philipp

ArjenEX · October 31, 2022, 10:11am

It sounds more like a data issue, and perhaps you want to solve it with Selenium?

Getting the actual data is possible with just one XPath node, so technically you don’t need Selenium. The trick here is finding the right path.

In this case, that’s /html/body/div/section/div/div[2]/div/div/div/div[3]/table/tbody/tr[*]/td[1]

where the index in the last td[] determines which column you get (name, code, etc.). Set the output to String(Multiple Rows).
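To illustrate why `tr[*]/td[1]` yields one value per table row, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, which supports a small XPath subset including positional predicates like `td[1]`. It is not the KNIME XPath node: the page below is a hypothetical, heavily simplified, well-formed stand-in (the real gccports.com path has many more `div` levels), and the port names and codes are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the port listing page (well-formed XHTML).
PAGE = """
<html><body><table><tbody>
  <tr><td>Example Port A</td><td>XXAAA</td></tr>
  <tr><td>Example Port B</td><td>XXBBB</td></tr>
</tbody></table></body></html>
"""

def column_values(html: str, col: int) -> list:
    """Return the text of td[col] from every row: one value per table row."""
    root = ET.fromstring(html.strip())
    # Paths are relative to the root <html> element. An unindexed "tr" matches
    # every row (like tr[*]); td[col] picks a single column (1-based).
    return [td.text or "" for td in root.findall(f"./body/table/tbody/tr/td[{col}]")]

names = column_values(PAGE, 1)   # first column: names
codes = column_values(PAGE, 2)   # second column: codes
print(list(zip(names, codes)))
```

Changing only the last index switches the column, which is the same idea as editing the final `td[]` in the XPath above.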

The only thing you have to account for next is the pagination.

alo · October 31, 2022, 11:55am

Hi,
I receive an error.
/html/body/div/section/div/div[2]/div/div/div/div[3]/table/tbody/tr[*]/td[1]

Marcin

alo · October 31, 2022, 11:59am

Hi,
You are great, ArjenEX.
Thank you very much for your help.
Best regards,
Marcin

alo · November 2, 2022, 10:49am

Hi,
I can’t close this topic yet.
The site www.gccports.com contains duplicated information, which I removed.
There is another problem: I am not able to retrieve all the information that I got in Power BI.
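Removing the duplicated entries from www.gccports.com amounts to keeping the first occurrence of each row. A minimal Python sketch of that idea, assuming each row is a hashable tuple such as (name, code); the function name `drop_duplicates` and the sample rows are illustrative only:

```python
def drop_duplicates(rows, key=None):
    """Keep the first occurrence of each row (or of each key), preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Compare whole rows by default, or only the part returned by key().
        k = row if key is None else key(row)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique

rows = [("Port A", "XXAAA"), ("Port B", "XXBBB"), ("Port A", "XXAAA")]
print(drop_duplicates(rows))
```

Passing a `key` (for example only the code column) lets you treat rows as duplicates even if other columns differ slightly.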

Container GPS.knwf (30.0 KB)

Could you please give me some suggestions?

Marcin

ArjenEX · November 2, 2022, 12:23pm

The main overview page is selective in what it shows and is therefore not complete, hence the missing records.

I would suggest an outer loop based on the search parameters that you can pass along in the URL, and then a dynamic inner loop to go through all the pages.

These are the parameters from the search, where c is the country ID and per_page is the page number. For example:

https://www.gccports.com/ports/latitude-longitude/search/?n=&c=231&cd=&per_page=1
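Building these URLs programmatically avoids string-concatenation mistakes. A small Python sketch using `urllib.parse.urlencode`, assuming the parameter order and the empty `n` and `cd` values from the example URL; `search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.gccports.com/ports/latitude-longitude/search/"

def search_url(country_id: int, page: int) -> str:
    # n and cd stay empty as in the example; c = country ID, per_page = page number.
    query = urlencode([("n", ""), ("c", country_id), ("cd", ""), ("per_page", page)])
    return f"{BASE_URL}?{query}"

print(search_url(231, 1))
```

With `country_id=231` and `page=1` this reproduces the example URL above exactly.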

With a bit of URL manipulation, you can see that the highest country ID is 249, so that would be your outer loop range.
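The outer/inner loop structure described above can be sketched in Python as follows. This is an illustration under several assumptions, not the KNIME workflow: `fetch_rows(country_id, page)` is a hypothetical function standing in for the GET request plus XPath step, country IDs are assumed to run from 1 to 249, and the inner loop assumes the end of the results shows up either as an empty page or as a repeat of the previous page. `max_pages` is only a safety cap.

```python
from typing import Callable, List

def crawl(fetch_rows: Callable[[int, int], List[str]],
          max_country_id: int = 249,
          max_pages: int = 50) -> List[str]:
    """Outer loop over country IDs, dynamic inner loop over result pages."""
    results: List[str] = []
    for country_id in range(1, max_country_id + 1):      # outer loop: countries
        previous: List[str] = []
        for page in range(1, max_pages + 1):              # inner loop: pages
            rows = fetch_rows(country_id, page)
            # Stop on an empty page, or if the site repeats the last page (assumption).
            if not rows or rows == previous:
                break
            results.extend(rows)
            previous = rows
    return results

# Usage with a fake fetcher instead of real HTTP requests:
fake_pages = {(1, 1): ["a"], (1, 2): ["b"], (3, 1): ["c"]}
print(crawl(lambda c, p: fake_pages.get((c, p), []), max_country_id=3))
```

Because the inner loop stops as soon as a country runs out of pages, each country is only requested as many times as it actually has pages.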

alo · November 14, 2022, 2:25pm

Thank you, ArjenEX.
Marcin

alo · November 18, 2022, 9:40am

Hi,
I have encountered another problem.
I have created a table containing all country numbers.
While the loop is running, I encounter the error “Execute failed: Input table’s structure differs from reference (first iteration) table: Column 6 [body (Binary object)] vs. [body (String)]”.

Container GPS.knwf (64.8 KB)

Could you please give me a hint on how to deal with this problem?
Marcin

ArjenEX · November 19, 2022, 12:16am

Hi @alo

In the vast majority of cases, this can be solved with this option in the Loop End node. Otherwise, some more manipulation within the loop is needed to normalize the table before it exits the loop.
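The error arises because the `body` column is a String in the first iteration but a Binary object in a later one. "Normalizing within the loop" means converting it to one type before the iterations are collected. A minimal Python analogue, assuming rows are dicts, binary bodies are UTF-8 text, and `normalize_body`/`collect` are hypothetical helper names:

```python
from typing import Any, Dict, List

def normalize_body(row: Dict[str, Any], encoding: str = "utf-8") -> Dict[str, Any]:
    """Return a copy of row whose 'body' is str, so every iteration has the same schema."""
    body = row.get("body")
    if isinstance(body, (bytes, bytearray)):
        # Binary body: decode it to text (undecodable bytes become U+FFFD).
        body = bytes(body).decode(encoding, errors="replace")
    # str bodies pass through unchanged; a missing body becomes None.
    return {**row, "body": body}

def collect(iterations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Concatenate per-iteration rows after normalizing their 'body' type."""
    return [normalize_body(r) for r in iterations]

out = collect([{"c": 1, "body": "<html>1</html>"}, {"c": 2, "body": b"<html>2</html>"}])
print([type(r["body"]).__name__ for r in out])
```

Converting to text (rather than to binary) keeps the body usable by a subsequent XPath or string-processing step.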

I had a look at your workflow and it’s running quite inefficiently, mainly because your grouping and use of loops are a bit off. This makes it extract the page for each port 98 times, while the highest number of pages I found is 9.

Moreover, the GET Request and the Webpage Retriever nodes do the same thing, so you don’t need both.

Here is a small cleaned-up version of the workflow:
Container GPS cleanup.knwf (129.4 KB)

alo · November 22, 2022, 9:52am

Hi ArjenEX
I am really grateful for your help.
Marcin

system · February 20, 2023, 9:52am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.