XPath problem

acommons · September 16, 2018, 9:52am

I’m trying to use the XPath node (which I have used before without problems) to parse some HTML content (which I have also done before). When I use the point-and-click interface, I see the following behavior:

I then select the next div tag and I get:

I cannot select any valid tags from that point on. Presumably something in the HTML document is upsetting the parser. Any idea what this might be?

I cannot change the source HTML, so does anyone have ideas on how to work around it?

armingrudd · September 16, 2018, 12:14pm

Hi,

I’m not sure whether this will solve your problem, but here is my suggestion:
Open the document in Firefox and select Web Developer -> Inspector from the menu (Ctrl+Shift+C). Find the tag you want, right-click it, and select Copy -> XPath. Paste the path into the XPath node and add the “dns:” prefix to each element in the path (e.g. /dns:html/dns:body/dns:div/dns:div/dns:p). Then configure the other settings and press OK.

The reason for suggesting Firefox is that I sometimes get a wrong XPath from Chrome, whereas the one from Firefox has worked well for me.
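
Adding the prefix by hand gets tedious for long paths, so here is a minimal sketch of the step in Python. The function name `add_prefix` is made up for illustration, and it only handles simple absolute paths like the ones Firefox copies: predicates that contain a "/" and explicit axes such as `child::` are not covered.

```python
def add_prefix(xpath: str, prefix: str = "dns") -> str:
    """Prefix every plain element step of a simple XPath (hypothetical helper)."""
    steps = []
    for step in xpath.split("/"):
        # The element name is whatever comes before an optional [predicate].
        name = step.split("[")[0]
        # Leave alone: empty steps (from a leading "/" or "//"), attributes,
        # wildcards, "." and "..", node tests such as text(), and names that
        # already carry a prefix or an axis.
        if not step or step[0] in "@*." or "(" in name or ":" in name:
            steps.append(step)
        else:
            steps.append(f"{prefix}:{step}")
    return "/".join(steps)


print(add_prefix("/html/body/div/div/p"))
```

For the path from the example above this prints `/dns:html/dns:body/dns:div/dns:div/dns:p`.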

Best,
Armin

acommons · September 16, 2018, 12:54pm

I’ll give that a try. What I want is well past the point I’ve highlighted; that is just where it seems to go off the rails, and no tags work after that point.

acommons · September 17, 2018, 10:20am

Using the path generated by Firefox gets me past the blockage, and I have extracted all the table rows I want. Hopefully everything will be well-behaved as I start to unpack the information in the rows.
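
For anyone who wants to see the same effect outside KNIME, here is a small sketch using Python's standard `xml.etree.ElementTree`. It assumes the document declares the XHTML default namespace, which is the usual reason a prefix such as “dns:” is needed; the sample markup and the cell values are invented for illustration and are not from my actual source.

```python
import xml.etree.ElementTree as ET

# Invented sample document with a default (XHTML) namespace on the root.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div>
      <table>
        <tr><td>first</td></tr>
        <tr><td>second</td></tr>
      </table>
    </div>
  </body>
</html>"""

root = ET.fromstring(XHTML)

# Bind the prefix "dns" to the document's default namespace.
ns = {"dns": "http://www.w3.org/1999/xhtml"}

# Without the prefix the steps do not match the namespaced elements.
unprefixed = root.findall("./body/div/table/tr")

# With the prefix on every step the table rows are found.
prefixed = root.findall("./dns:body/dns:div/dns:table/dns:tr", ns)
cells = [row.find("dns:td", ns).text for row in prefixed]

print(len(unprefixed), cells)
```

The unprefixed query returns no rows, while the prefixed one returns both.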

Thank you once again!
