Scraping subcategories from main categories

Hi community,

I’m a newbie trying to scrape the subcategories of each top-level category from a website (https://dusseldorf.shopdutyfree.com/en/). I want to do this dynamically, because some categories don’t have any subcategories and the number of subcategories varies per category.

The workflow is almost done, but I don’t get correct results, which I think is because I’m using the wrong loop node. For instance, I get the subcategories skincare, make-up and fragrance listed under all categories (new, beauty, special offers etc.), although they actually belong to only one category (beauty).

Does someone have an idea as to which loop node I should use or if there is another problem with the workflow? Thanks a lot.

Hi @V-P and welcome to the KNIME Forum,

You do not need loops at all.

Here is the workflow to extract the sub-categories for each category using the new Webpage Retriever node (you can also use the HTTP Retriever and HTML Parser nodes instead but in that case you need to add the namespace):

webpage_retriever.knwf (486.1 KB)

:blush:
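
For readers who cannot open the workflow, here is a rough Python sketch of the general idea rather than the workflow's actual configuration: select each category node once, then evaluate the subcategory path relative to that node, so a subcategory can only appear under its own parent. The markup, the `top` class name and the function name are hypothetical and much simpler than the real site. The `NS` mapping shows the kind of namespace prefix you need when the parsed page is XHTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified menu markup; the real site's structure will differ.
SAMPLE = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <ul class="nav">
      <li class="top"><a href="/new">New</a></li>
      <li class="top"><a href="/beauty">Beauty</a>
        <ul>
          <li><a href="/beauty/skincare">Skincare</a></li>
          <li><a href="/beauty/make-up">Make-up</a></li>
          <li><a href="/beauty/fragrance">Fragrance</a></li>
        </ul>
      </li>
      <li class="top"><a href="/offers">Special offers</a></li>
    </ul>
  </body>
</html>
"""

# XHTML elements live in this namespace, so every step needs a prefix.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def subcategories_by_category(xhtml):
    """Map each top-level category name to the list of its own subcategories."""
    root = ET.fromstring(xhtml.strip())
    result = {}
    # First query: one node per top-level category.
    for item in root.findall(".//x:li[@class='top']", NS):
        name = item.find("x:a", NS).text.strip()
        # Second query: relative to this category node only, so subcategories
        # of other categories are never picked up. Empty list if none exist.
        subs = [a.text.strip() for a in item.findall("x:ul/x:li/x:a", NS)]
        result[name] = subs
    return result


categories = subcategories_by_category(SAMPLE)
for name, subs in categories.items():
    print(name, subs)
```

If the second query were evaluated against the whole document instead (for example `.//x:ul/x:li/x:a` from the root), every category would receive the same list of subcategories, which matches the symptom described in the question.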

Hi @armingrudd,

Thank you very much! That is very helpful and works just as I wanted. Amazing how much shorter and more effective this new workflow is. :slight_smile:

I have only one question left: could you please explain (maybe with a screenshot) where you found the XPath? I think I found the first one for all top categories, at least I got the same results, but I can’t find the second one. That would help a lot in building similar workflows.

https://blog.statinfer.com/how-to-get-the-content-of-a-web-page-in-knime/
