Scraping subcategories from main categories

V-P · December 6, 2019, 8:23pm

Hi community,

I’m a newbie trying to scrape the subcategories of each top-level category from a website (https://dusseldorf.shopdutyfree.com/en/). I want to do this dynamically, because some categories have no subcategories and the number of subcategories varies per category.

The workflow is almost done, but I don’t get correct results, which I think is because I am using the wrong loop node. For instance, the subcategories skincare, make-up and fragrance are listed under all categories (new, beauty, special offers, etc.), although they actually belong to only one category (beauty).

Does someone have an idea as to which loop node I should use or if there is another problem with the workflow? Thanks a lot.

armingrudd · December 7, 2019, 7:25am

Hi @V-P and welcome to the KNIME Forum,

You do not need loops at all.

Here is a workflow that extracts the sub-categories for each category using the new Webpage Retriever node. You can also use the HTTP Retriever and HTML Parser nodes instead, but in that case you need to add the namespace:

webpage_retriever.knwf (486.1 KB)
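
For readers who cannot open the workflow, the idea behind it can be sketched outside KNIME. This is only an illustration in Python, not the content of the attached workflow: the markup, the class names and the function name `subcategories_by_category` are all made up, and the real site's HTML will differ. The point is that the second XPath query is evaluated relative to each category node, so subcategories stay attached to the category they belong to.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified navigation markup (assumption: the real page
# is structured differently and uses other class names).
SAMPLE = """
<ul class="nav">
  <li class="category"><a href="/new">New</a></li>
  <li class="category"><a href="/beauty">Beauty</a>
    <ul class="sub">
      <li><a href="/beauty/skincare">Skincare</a></li>
      <li><a href="/beauty/make-up">Make-up</a></li>
      <li><a href="/beauty/fragrance">Fragrance</a></li>
    </ul>
  </li>
  <li class="category"><a href="/offers">Special offers</a></li>
</ul>
"""


def subcategories_by_category(markup):
    """Map each top-level category name to its list of subcategory names."""
    root = ET.fromstring(markup)
    result = {}
    # First query: one node per top-level category.
    for category in root.findall("./li[@class='category']"):
        name = category.find("./a").text.strip()
        # Second query: relative to the current category node only, so a
        # category without subcategories simply yields an empty list.
        subs = category.findall("./ul[@class='sub']/li/a")
        result[name] = [a.text.strip() for a in subs]
    return result


categories = subcategories_by_category(SAMPLE)
for name, subs in categories.items():
    print(name, "->", subs)
```

The `for` statement here only stands in for row-wise processing of the category nodes; it says nothing about which loop nodes a KNIME workflow needs.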

V-P · December 8, 2019, 11:04am

Hi @armingrudd,

Thank you very much! That is very helpful and works just as I wanted. Amazing how much shorter and more effective this new workflow is.

I have only one question left: could you please explain (maybe with a screenshot) where you found the XPath expressions? I think I found the first one, for all top-level categories (at least I got the same results), but I can’t find the second one. That would help a lot in building similar workflows.

armingrudd · December 8, 2019, 11:30am

https://blog.statinfer.com/how-to-get-the-content-of-a-web-page-in-knime/
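
Besides inspecting the page in the browser, one rough way to discover a usable XPath is to locate an element you already know by its text and print its chain of ancestors. The following is a minimal sketch under assumptions: the markup is invented, and `path_to_text` is a hypothetical helper, not part of KNIME or of the linked article.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration; inspect the real page to see its structure.
SAMPLE = """
<div id="menu">
  <ul class="nav">
    <li class="category"><a>Beauty</a>
      <ul class="sub"><li><a>Skincare</a></li></ul>
    </li>
  </ul>
</div>
"""


def path_to_text(root, text):
    """Return an XPath-like path from the root to the first element whose text equals `text`."""
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    target = next(el for el in root.iter() if (el.text or "").strip() == text)
    steps = []
    node = target
    while node is not None:
        cls = node.get("class")
        # Keep the class attribute as a predicate where one exists.
        steps.append(f"{node.tag}[@class='{cls}']" if cls else node.tag)
        node = parent_of.get(node)
    return "/" + "/".join(reversed(steps))


root = ET.fromstring(SAMPLE)
skincare_path = path_to_text(root, "Skincare")
print(skincare_path)
# prints: /div/ul[@class='nav']/li[@class='category']/ul[@class='sub']/li/a
```

The part of the path up to the category element corresponds to the first query, and the remainder (from the category down to the link) is what a second, relative query would use.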
