I was wondering if someone could help. I am trying to scrape an AngularJS website. The HTTPRetriever only returns the raw HTML, with the template placeholders still in place instead of the rendered values. Does anyone know how to work around this? Is there a way to let the page load fully first, or a special trick for retrieving XPaths on Angular websites?
Preferably without having to use a Selenium node, but that might be overly picky.
Thanks in advance!
Hi @mpfeifer14 -
Welcome to the forum! Since both the Palladian nodes (of which HTTPRetriever is a part) and the Selenium nodes are maintained by @qqilihq, I’ll tag him here and see if he has a good suggestion for you.
Thanks for the pointer @ScottF!
With the Selenium Nodes, the page runs in a real browser, so you have the mentioned JS environment and the dynamic content will be rendered. You can then extract it using XPath/CSS and interact with the page as a human would.
As a disclaimer: As you have probably noticed already, the Selenium Nodes are a paid product (in contrast to Palladian, which we – i.e. my colleagues and I, independently of KNIME – provide for free for the regular KNIME platform, despite the considerable maintenance effort for both and the extensive support which we provide for free). The paid licenses are the way to fund these efforts.
You can check whether the Selenium Nodes work for your use case with our free one-month trial licenses, and I can assist with any specific questions (best placed in the Palladian/Selenium sub forum).
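For readers who want to see the idea outside of KNIME, here is a minimal Python sketch of the same approach: load the page in a browser, wait until the templates have been rendered, then query by XPath. It assumes the Selenium Python bindings and a local Chrome installation, and it assumes the site uses AngularJS's default `{{ ... }}` interpolation markers (these can be reconfigured per site). The function names `has_unrendered_placeholders` and `fetch_rendered_text` are made up for illustration; this is not how the Selenium Nodes are implemented.

```python
import re

# Matches default AngularJS interpolation markers such as {{ item.name }}.
PLACEHOLDER = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def has_unrendered_placeholders(html: str) -> bool:
    """Return True if the markup still contains {{ ... }} template markers."""
    return PLACEHOLDER.search(html) is not None


def fetch_rendered_text(url: str, xpath: str, timeout: float = 10.0) -> str:
    """Load url in headless Chrome, wait for rendering, return the text at xpath."""
    # Imported lazily so the helper above also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def rendered(d):
            # Ready once the element exists, has text, and no markers remain.
            elements = d.find_elements(By.XPATH, xpath)
            if not elements:
                return False
            text = elements[0].text
            return bool(text.strip()) and not has_unrendered_placeholders(text)

        WebDriverWait(driver, timeout).until(rendered)
        return driver.find_element(By.XPATH, xpath).text
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

A call might look like `fetch_rendered_text("https://example.com/app", "//h1")`, where both the URL and the XPath are placeholders. The helper `has_unrendered_placeholders` can also be run on the output of a plain HTTP fetch to confirm that a page needs a browser at all.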
This worked well! The Selenium nodes operate a little differently because they approach things from the front end in. This makes them very versatile, and they are simple to operate. This solved all of my issues crawling an AngularJS website in KNIME.
Thanks for the help!