I just realized that I never followed up on this idea. Now that a job change gives me the luxury of spending more time with KNIME and workflows, I wanted to post my result here.
You can find the workflow, based on @qqilihq's great work, here on the Hub:
Some things I have included:
- A (kind of) user interface: just right-click the Select Competitor component and open its view to get a nice drop-down of companies to “scrape”. I then use this selection to build the LinkedIn Job Search URL and the file name (I export to Excel).
- I used the following JS code to overcome the “infinite scroll” of the LinkedIn job results page. I embedded it in a “hard” loop (10 iterations), which works really well for me.
Here’s the code I used:
window.scrollTo(0, document.body.scrollHeight)
- The second-to-last node (the Row Filter node labeled “filter on selected Competitor”) is only needed when a company name is very common. In my internal use case, two companies shared the same name, but I was only interested in one of them. Since this is a hard-coded filter, you will have to either adjust that node or simply delete it.
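To illustrate how a single company selection can drive both the search URL and the Excel file name, here is a minimal JavaScript sketch. It is not what the workflow does internally (that happens in KNIME nodes); the helper names `buildJobSearchUrl` and `buildFileName`, the `keywords` query parameter and the file-name pattern are all my own assumptions.

```javascript
// Hypothetical helper: builds a LinkedIn job search URL for one company.
// ASSUMPTION: LinkedIn reads the search term from the `keywords` parameter.
function buildJobSearchUrl(company) {
  const url = new URL("https://www.linkedin.com/jobs/search/");
  url.searchParams.set("keywords", company); // handles encoding (space -> "+", & -> %26)
  return url.toString();
}

// Hypothetical helper: derives an Excel file name from the same selection.
// The "<company>_jobs_<YYYY-MM-DD>.xlsx" pattern is an assumption for illustration.
function buildFileName(company, date = new Date()) {
  // Replace characters that are not allowed in Windows/macOS file names.
  const safe = company.trim().replace(/[\\/:*?"<>|]+/g, "_");
  const day = date.toISOString().slice(0, 10); // UTC date, YYYY-MM-DD
  return `${safe}_jobs_${day}.xlsx`;
}
```

For example, `buildJobSearchUrl("Acme Corp")` yields `https://www.linkedin.com/jobs/search/?keywords=Acme+Corp`. Letting `URL`/`URLSearchParams` do the encoding avoids broken URLs for names containing spaces or `&`.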
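If you want to experiment with the infinite-scroll trick outside KNIME (e.g. in the browser console), the “hard” 10x loop around `window.scrollTo(0, document.body.scrollHeight)` can be sketched like this. The function name `scrollToBottomRepeatedly`, the 1500 ms wait between scrolls and the early stop when the page stops growing are my assumptions, not part of the workflow.

```javascript
// Pause helper so lazily loaded results have time to render.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch of the 10x scroll loop; `win` is the browser window.
// ASSUMPTIONS: 1500 ms wait per round and stopping early once the height is stable.
async function scrollToBottomRepeatedly(win, times = 10, waitMs = 1500) {
  let lastHeight = -1;
  let scrolls = 0;
  for (let i = 0; i < times; i++) {
    const height = win.document.body.scrollHeight;
    if (height === lastHeight) break; // optional: nothing new loaded last round
    lastHeight = height;
    win.scrollTo(0, height); // same call as in the post
    scrolls++;
    await sleep(waitMs);
  }
  return scrolls; // number of scroll steps actually performed
}
```

In the console you would call `scrollToBottomRepeatedly(window)`. Removing the `if (height === lastHeight) break;` line gives the plain fixed 10x loop; keeping it just saves time once LinkedIn stops loading more results.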
Once again I learned a lot, especially about extracting data with XPath, which I assume is useful to know when scraping the web.
Let me know what you think!