Xpath Can't Create List When Using normalize-space

rfeigel · July 2, 2024, 1:40am

I was trying to help another KNIMER with some web scraping and ran into an XPath problem. The original thread is “How to pull data from a website that does not have an API?” (post #3 by rfeigel).

This problem seems generic so I thought I’d start a new thread.
Here’s the workflow:
Simple Web Scraper.knwf (147.3 KB)
The XML is not very well formed. It consists of monthly data with an extra flag for the current year, and the value field contains a lot of extra whitespace. If I try to remove the whitespace with normalize-space, the XPath query no longer allows a list as the return type, only a single value, which gets stuck on the first data field. Here is a series of screenshots that will hopefully clarify the problem. I’ve done quite a bit of searching and haven’t found any help.

tomljh · July 2, 2024, 4:22am

Hi @rfeigel ,
The behavior you describe also occurs in 5.2.5. The error message comes from the underlying Java package. It is not clear whether the cause is an incorrect call into that package or a problem in the package itself.

I found a path of less resistance: instead of cleaning the string inside the XPath query, extract the raw values as a list and clean them afterwards with the “String Clean” node.

Br
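
For readers without the workflow open, here is a minimal sketch of the same idea outside KNIME. A likely explanation for the symptom, assuming the query is evaluated as XPath 1.0, is that normalize-space() accepts a single string, so a node-set argument is reduced to the string value of its first node. The sketch therefore runs a plain path query that returns every node and collapses the whitespace in a second step. The sample XML and the element names (`row`, `month`, `value`) are hypothetical, since the real page markup is not shown in the thread.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data; not taken from the real page.
SAMPLE = """
<table>
  <row><month>Jan</month><value>
        12.5
  </value></row>
  <row><month>Feb</month><value>   7.0   </value></row>
  <row><month>Mar</month><value>
     3.25</value></row>
</table>
"""

# XPath 1.0 treats only space, tab, CR and LF as whitespace.
_WS = " \t\r\n"


def normalize_space(text):
    """Mimic XPath normalize-space(): trim the ends, collapse inner runs."""
    return re.sub(r"[ \t\r\n]+", " ", text or "").strip(_WS)


def extract_values(xml_text, path=".//value"):
    """Return the cleaned text of every node matched by a simple path."""
    root = ET.fromstring(xml_text.strip())
    # Step 1: a plain path query returns all matching nodes (the "list").
    nodes = root.findall(path)
    # Step 2: clean each string outside the query, one node at a time.
    return [normalize_space("".join(node.itertext())) for node in nodes]


values = extract_values(SAMPLE)
print(values)
```

The design choice mirrors the workflow fix: the query only selects nodes, and the string cleanup happens per item afterwards, so the list is never collapsed to its first entry.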

rfeigel · July 2, 2024, 2:21pm

@tomljh Thanks. Could you post your revised workflow, please? The screenshot is very hard to read.

tomljh · July 2, 2024, 2:35pm

Sorry,
Simple Web Scraper 2.knwf (10.6 KB)

rfeigel · July 2, 2024, 2:44pm

Perfect. Thank you very much.

tone_n_tune · July 8, 2024, 1:10pm

Thanks everyone for the help. The issue arose from my query that @rfeigel was trying to resolve.

tomljh · July 8, 2024, 3:16pm

@rfeigel is an enthusiastic person who pursues perfection. Thank you for taking part in the discussion. Until now, it had never occurred to me that an HTML page could be parsed as an XML document. I have learned a lot as well.

system · July 15, 2024, 3:17pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.