Import table from www

Hi,
I have KNIME 4.6.3 and I’m not able to install the Selenium nodes as described on the Selenium Nodes download page.
I need to import tables from GCC Ports.
I did it in Power BI, but I need to do the same in KNIME.

My current result is neither efficient nor elegant. Could you give me any suggestions?

Container GPS.knwf (35.2 KB)

Marcin

Hi,

Why not?

–Philipp

It sounds more like a data issue and perhaps you want to solve it with Selenium?

Getting the actual data is possible with just one XPath node, so technically you don’t need Selenium. The trick here is finding the right path.

In this case, that’s /html/body/div/section/div/div[2]/div/div/div/div[3]/table/tbody/tr[*]/td[1]

where the index in the last td[] selects the column (name, code, etc.). Set the output to String(Multiple Rows).
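
To illustrate how the last td[] index picks the column, here is a minimal Python sketch outside KNIME. The sample HTML, the shortened path and the `column` helper are made up for illustration; the real page needs the full path given above. ElementTree implements only a subset of XPath, so the sketch uses a plain `tr` step instead of `tr[*]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the ports table (not the real page).
SAMPLE = """
<html><body><table><tbody>
<tr><td>Port A</td><td>AAA</td></tr>
<tr><td>Port B</td><td>BBB</td></tr>
</tbody></table></body></html>
"""


def column(root, index):
    """Return the text of the index-th <td> (1-based) of every table row."""
    path = f"./body/table/tbody/tr/td[{index}]"
    return [td.text for td in root.findall(path)]


root = ET.fromstring(SAMPLE)
names = column(root, 1)  # td[1]: first column of each row
codes = column(root, 2)  # td[2]: second column of each row
print(names, codes)
```

Each match becomes one list entry, which corresponds to getting one output row per match in the XPath node.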

The only thing you have to account for next is the pagination.

Hi,
I receive an error.
/html/body/div/section/div/div[2]/div/div/div/div[3]/table/tbody/tr[*]/td[1]
image

Marcin

Hi,
You are great, ArjenEX.
Thank you very much for your help.
Best regards,
Marcin

Hi,
I can’t close this topic yet.
The site www.gccports.com contains duplicated information, which I removed.
There is another problem: I am not able to retrieve all the information that I got in Power BI.
[four screenshots]

Container GPS.knwf (30.0 KB)

Could you please give me some suggestions?

Marcin

The main overview page is selective in what it shows and thus not complete. Hence you’re missing records.

I would suggest starting an outer loop based on the search parameters that you can pass along in the url and then a dynamic inner loop to go through all the pages.

These are the parameters from the search, where c is the country id and per_page is the page number. For example:

https://www.gccports.com/ports/latitude-longitude/search/?n=&c=231&cd=&per_page=1

With a bit of URL manipulation I found that the highest country id is 249, so that would be your outer loop range.
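
The outer/inner loop idea can be sketched roughly like this in Python. This is only an illustration of the logic, not the KNIME workflow itself: `build_url`, `crawl`, `fetch_rows` and `max_pages` are hypothetical names, and the sketch assumes that pages start at 1 and that a page past the last one yields no rows.

```python
from urllib.parse import urlencode

BASE = "https://www.gccports.com/ports/latitude-longitude/search/"


def build_url(country_id, page):
    """Build the search URL for one country id and one page number."""
    query = urlencode({"n": "", "c": country_id, "cd": "", "per_page": page})
    return f"{BASE}?{query}"


def crawl(country_ids, fetch_rows, max_pages=50):
    """Outer loop over countries, dynamic inner loop over pages.

    fetch_rows is a caller-supplied function (hypothetical) that takes a URL
    and returns the table rows found on that page, or an empty list.
    """
    for country_id in country_ids:
        for page in range(1, max_pages + 1):
            rows = fetch_rows(build_url(country_id, page))
            if not rows:  # assumption: an empty page means no more pages
                break
            for row in rows:
                yield country_id, page, row


print(build_url(231, 1))
```

With a real fetch function, `crawl(range(1, 250), fetch_rows)` would cover country ids up to 249; whether the ids really start at 1 is an assumption.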

Thank you ArjenEX
Marcin

Hi,
I have run into another problem.
I have created a table containing all country numbers.
While the loop is running, I get the error “Execute failed: Input table’s structure differs from reference (first iteration) table: Column 6 [body (Binary object)] vs. [body (String)]”.

Container GPS.knwf (64.8 KB)

Could you please give me a hint on how to deal with this problem?
Marcin

Hi @alo

In the vast majority of cases this can be solved with this option in the Loop End node. Otherwise, it takes some more manipulation within the loop to normalize the table before it exits the loop.

image
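
As an illustration of what "normalizing within the loop" means for this particular error, here is a small Python sketch (not KNIME code). The `normalize_body` helper and the UTF-8 encoding are assumptions; the point is only that every iteration should hand over the body column with the same type.

```python
def normalize_body(body, encoding="utf-8"):
    """Return the response body as text, whether it arrived as bytes or str."""
    if isinstance(body, (bytes, bytearray)):
        # Binary body: decode it so it matches the String iterations.
        return bytes(body).decode(encoding, errors="replace")
    return body


# Two iterations that came back with different types, as in the error message.
bodies = [b"<html>page 1</html>", "<html>page 2</html>"]
normalized = [normalize_body(b) for b in bodies]
print(normalized)
```

After this step both entries are plain strings, so collecting them no longer mixes Binary object and String.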

I had a look at your workflow and it’s running quite inefficiently, mainly because your grouping and use of loops is a bit off: it extracts the page for each port 98 times, while the highest number of pages I found is 9.

Moreover, the GET Request and the Webpage Retriever nodes both do the same thing, so you don’t need both.

This is a small cleaned-up version of the workflow:
Container GPS cleanup.knwf (129.4 KB)

Hi ArjenEX
I am really grateful for your help.
Marcin