Failing to scrape with XPath

I want to scrape the news titles and times from https://botanwang.com/top_right_news

My XPath for the title is: //div[@id="block-system-main"]/div/div/div/div/ul/li/span[1]/span/a

I have tested it many times and I believe it is correct, but it returns nothing.

My workflow is simple: Table Creator -> Webpage Retriever -> XPath

Hi @yxlyxl8,
I think the problem is that the document declares a default namespace at the top: xmlns="http://www.w3.org/1999/xhtml". You can see this in the second tab of the XPath node's configuration.

To fix your query, you need to prefix each tag with dns:. I would extract the titles with the XPath query:

//dns:div[@class='content']//dns:div[@class='item-list']/dns:ul/dns:li//dns:a/text()

and you can get the times in a similar fashion. Instead of explicitly descending into every element along the path (/div/div/div/div/ul/li/span[1]/span/) it is usually more robust to use the // operator, which searches in all descendants, and couple it with a stable filter by element attributes, like I did above with the @class attribute.
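For anyone who wants to reproduce the namespace effect outside the workflow, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample markup is a simplified, hypothetical stand-in for the page (the real structure differs), and the dns prefix is mapped to the XHTML namespace by hand. ElementTree only supports a subset of XPath, so the sketch reads each link's .text instead of ending the query with text().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the real page; only the
# default namespace declaration is taken from the actual document.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="content">
      <div class="item-list">
        <ul>
          <li><span><span><a href="/a">First title</a></span></span></li>
          <li><span><span><a href="/b">Second title</a></span></span></li>
        </ul>
      </div>
    </div>
  </body>
</html>"""

root = ET.fromstring(XHTML)

# Map the prefix used in the query to the document's default namespace.
ns = {"dns": "http://www.w3.org/1999/xhtml"}

# Without a prefix, the query looks for elements in no namespace
# and therefore matches nothing.
unprefixed = root.findall(".//div[@class='content']//a")

# With the prefix on every element name, the same idea matches the links.
prefixed = root.findall(
    ".//dns:div[@class='content']//dns:div[@class='item-list']"
    "/dns:ul/dns:li//dns:a",
    ns,
)
titles = [a.text for a in prefixed]

print(len(unprefixed), titles)
```

The unprefixed query comes back empty for the same reason the original one did: every element lives in the XHTML namespace, so a bare div never matches.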
Kind regards,
Alexander

Dear Alexander,

Thanks for your help!!! I have successfully scraped the titles and times!

Thanks again,
lxy
