XPath unable to define path

Hi there,
I have a problem retrieving data that I load with the Webpage Retriever node. The data has an XML header, but XPath does not detect any data paths. Because the body contains curly brackets, I also tried JSON Path, but there too only the entire body can be selected as a path, which produces a list.
I tried at different stages to remove the brackets with the String Manipulation node, but that threw errors, so that is possibly a different story.


How can I process this data either as XML or JSON?

Many thanks in advance,
Thomas

XML_reading.knwf (79.6 KB)

Hi there @ThG2020 ,

for the URL you intend to work with, it can be broken down via this initial pathway:

[Screenshot: xml to json]

This will return a List column, which you can then expand and process further.

If you still have trouble, let me know which datapoints you want to extract, and I’ll do it for you. Cheers.

Hi Badger101,
many thanks for your help! I gave it a try and tested different settings for the JSON to Table node. However, the data is somehow split across columns (apparently the node interprets some character as a separator), and further processing is still blocked or complicated by the curly brackets.

Is there no way to use XPath or JSONPath directly to extract a collection of paired labels and values? The body format somehow does not allow the Path nodes to identify single objects; I can only ever select the entire body as a string.

Brgds
Thomas

If your input has no valid structure, the nodes you mentioned won’t work on it.
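
To illustrate why a path query can only return the whole body in a case like this, here is a minimal Python sketch. It assumes the payload is an XML envelope whose only text node is a JSON string; the sample payload, the element name `body`, the key `indicators`, and the helper `extract_pairs` are all made up for illustration and are not taken from the actual URL. XPath sees just one text node, so the JSON has to be parsed in a second step.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payload (NOT the real data): an XML wrapper around a JSON string.
raw = (
    '<?xml version="1.0"?>'
    '<body>{"indicators": [{"label": "GDP", "value": 1.5},'
    ' {"label": "CPI", "value": null}]}</body>'
)

def extract_pairs(payload):
    """Return (label, value) pairs from JSON text embedded in an XML element."""
    root = ET.fromstring(payload)       # step 1: parse the XML envelope
    text = (root.text or "").strip()    # the whole JSON body is one text node
    data = json.loads(text)             # step 2: parse that text as JSON
    return [(item["label"], item["value"]) for item in data["indicators"]]

pairs = extract_pairs(raw)
print(pairs)
```

If `json.loads` raises an error at step 2, the body is not valid JSON either, and a string-based cleanup is needed before any path node can work on it.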

I have tried something, but can you confirm that there are 133 indicators altogether?

The first few datapoints: [screenshot]

The last few datapoints: [screenshot]

Altogether, 133 indicators?

That already looks promising, and I can confirm 133 indicators.

Alrighty. Here’s my approach on this:

My workflow ends with a manual inspection metanode, where the rows are semi-cleaned.

Upon inspection, you will see possible issues that you might want to deal with. For example, it may reveal that you still need to perform a complete regex cleanup, deal with “null” content, and so on.

I shall leave that part to you as you see fit.
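
For that remaining cleanup, a rough Python sketch of the idea could look like the following. It is not what the workflow itself does; the sample cells and the helper `clean_cell` are hypothetical, and the assumption is that leftover cells still carry stray brackets or quotes and that the literal text "null" should become a missing value.

```python
import re

def clean_cell(cell):
    """Strip leftover brackets and quotes; map empty or 'null' cells to None."""
    stripped = re.sub(r'[{}\[\]"]', "", cell).strip()
    if stripped == "" or stripped.lower() == "null":
        return None
    return stripped

# Hypothetical semi-cleaned cells, not taken from the real data.
rows = ['{"Indicator A"', ' null}', '"12.5"}']
cleaned = [clean_cell(c) for c in rows]
print(cleaned)
```

Inside KNIME, a comparable result can be reached with regex-based replacement on the string column followed by missing-value handling.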

Here’s the workflow:
Webscraping Nested Curly Brackets by badger101.knwf (36.2 KB)

Great! The output is exactly what I was looking for; with some housekeeping nodes I now have the desired data format. Amazing what a massive workflow was needed for this.
Thanks again,
brgds
Thomas

You’re welcome! Glad to see it’s been solved.
