Problems with web retriever

VAGR_ISK · August 31, 2021, 8:22pm

Hi Guys,

I am trying to test the “Query Google For Address” workflow linked below, but it fails on execution. I noticed that other nodes that retrieve information from web pages fail as well. The error is “Execute failed”, and I believe it is caused by a connection problem: when I select “Output missing values” in the error handling options, the node outputs a missing value instead of failing. Can someone shed some light on this?

G-Address

Thanks in advance.

bruno29a · August 31, 2021, 8:45pm

Hi @VAGR_ISK , I tried the workflow from the link, and it works without any problem for me, I’m not getting any error.

However, Google’s response has changed since that workflow was built, and so the XPath to retrieve the Address is no longer valid and hence returns an empty address. (Tagging @ScottF to let him know as it’s his workflow)

But the Webpage Retriever works without any issue.

bruno29a · August 31, 2021, 9:00pm

Hi @VAGR_ISK , it’s definitely a connection issue, and I don’t think it’s caused by Knime. Are you behind a firewall or something?

I was able to reproduce the error by running the node offline:
ERROR Webpage Retriever 0:243 Execute failed: www.google.com

(Note: My node number is different from the one in the workflow because I made a copy of the nodes, as I did not want to lose the original data of the workflow.)

Can you ping google.com from the machine where this workflow is running? (Open a command line, type the command ping google.com and press ENTER.)

Please share the IP address that it tries to ping, in case your hosts file has been tampered with and your DNS is being redirected.
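
For anyone who prefers a script to the ping command, the same two questions (which IP does the name resolve to, and can a connection be opened at all) can be sketched in Python. This is only an illustration and not part of the workflow: the helper names `resolve_host` and `can_connect` are made up here, and the second check opens a TCP connection on port 443 rather than sending an ICMP ping, so a firewall may treat the two differently.

```python
import socket


def resolve_host(hostname):
    """Return the IPv4 address the name resolves to, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def can_connect(hostname, port=443, timeout=5.0):
    """Return True if a TCP connection to hostname:port can be opened."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    host = "www.google.com"
    print("resolved IP:", resolve_host(host))
    print("HTTPS reachable:", can_connect(host))
```

If the resolved IP is missing or looks unexpected, the hosts file or DNS settings are the first thing to check; if the name resolves but the connection fails, a firewall or proxy is the more likely cause.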

ScottF · August 31, 2021, 9:06pm

Good catch. Something has definitely changed. After a bit of trial and error, when I update the XPath query to

/html/body/div/div[4]/div/div[3]/div/div/div/div

it now produces the expected response.

VAGR_ISK · August 31, 2021, 9:50pm

Hi @bruno29a,

The connection did work after I adjusted my firewall.

Thanks.

VAGR_ISK · August 31, 2021, 9:56pm

Hi @ScottF,

Thanks for the response. I have to admit that I am not familiar with the /html/body… syntax, since I am a beginner in KNIME. What should I do to make it work? My actual goal is to get the number of results that Google Scholar reports for a search, using XPath after the Webpage Retriever. Right now I am struggling to write an XPath that puts that number into a table column.

VAGR_ISK · August 31, 2021, 10:54pm

Hi @ScottF @bruno29a,

I meant that I want to retrieve the number inside the XML that I got from Google (see pic). The number should always be in the same position. The process seems to require a lot of syntax, as you mentioned, @ScottF. Could you tell me how to select this specific text?
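
Once the text containing the count has been selected, pulling the number out of it is a separate, small step. A minimal Python sketch of that step follows, assuming the selected text looks something like "About 1,230 results"; that sample string and the helper name `extract_result_count` are assumptions for illustration, not what Google is guaranteed to return. Inside KNIME the same idea would be expressed with a regex or string manipulation node instead.

```python
import re


def extract_result_count(text):
    """Return the first number in text as an int, ignoring thousands separators.

    Returns None when the text contains no digits.
    """
    match = re.search(r"\d[\d,.]*", text)
    if match is None:
        return None
    # Drop separators such as "," or "." before converting.
    digits = re.sub(r"[^\d]", "", match.group())
    return int(digits)


print(extract_result_count("About 1,230 results (0.05 sec)"))  # 1230
print(extract_result_count("No results found"))  # None
```

The function takes the first number it finds, so it only works if the count comes before any other number in the selected text.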

bruno29a · August 31, 2021, 11:42pm

Hi @VAGR_ISK , the path has nothing to do with Knime really; it is about understanding XPath, which is not exclusive to Knime.

Looking at what @ScottF presented:
/html/body/div/div[4]/div/div[3]/div/div/div/div

This means: go into the html element, then to the body element that’s inside html, then to the div element that’s inside body, and so on.

You can think of them as subfolders within other subfolders, within a folder.

The div[3] means the 3rd div at that level: XPath positions start at 1, not 0, so [1] is the first one and [3] is the third. The index is needed because you can have multiple div elements at the same level.

I hope this clarifies what the path means.
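
To make the folder analogy and the numbering concrete, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The page below is a made-up, heavily simplified stand-in; real Google markup is much larger and changes over time, so the path is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page used only to illustrate path steps and indexes.
PAGE = """
<html>
  <body>
    <div>
      <div>header</div>
      <div>navigation</div>
      <div>About 1,230 results</div>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)  # root is the <html> element

# ElementTree paths are relative to the element you search from, so
# "./body/div/div[3]" here corresponds to /html/body/div/div[3] in full XPath.
first = root.find("./body/div/div[1]")
third = root.find("./body/div/div[3]")

print(first.text)  # header
print(third.text)  # About 1,230 results
```

Each step in the path descends one level, like opening a subfolder, and the bracketed number picks one element among siblings with the same name, counting from 1.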

system · September 7, 2021, 11:42pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.