There isn’t really a one-size-fits-all answer here. It all depends on your use case: how the website is structured, what kind of data you want to extract from it, and so on.
Below is a snapshot of a workflow I have in place that reads a directory of about 700 .html files, which are similar in structure but contain different data. Nodes like HTML Parser, XPath, and the various string manipulation nodes would be good go-to nodes for what you’re trying to achieve.
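If it helps to see the same idea outside of the node view, here is a rough Python sketch of the pattern: loop over a folder of similarly structured .html files, parse each one, pull out a few elements, and tidy the text with string operations. This is not my actual workflow, just an illustration. The tag names (`h1`, `td`) and the helper names `TagTextCollector`, `extract_fields` and `parse_directory` are made up for the example, and it uses the standard library's `html.parser` in place of a real XPath query.

```python
from html.parser import HTMLParser
from pathlib import Path


class TagTextCollector(HTMLParser):
    """Collects the text found inside a chosen set of tags (hypothetical helper)."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = set(wanted_tags)
        self.open_tag = None   # the wanted tag we are currently inside, if any
        self.found = {}        # tag name -> list of raw text snippets

    def handle_starttag(self, tag, attrs):
        # Start a new snippet when a wanted tag opens (nested wanted tags are ignored).
        if tag in self.wanted_tags and self.open_tag is None:
            self.open_tag = tag
            self.found.setdefault(tag, []).append("")

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.open_tag = None

    def handle_data(self, data):
        # Append text to the snippet that is currently open.
        if self.open_tag is not None:
            self.found[self.open_tag][-1] += data


def extract_fields(html_text, wanted_tags=("h1", "td")):
    """Parse one HTML document and return the cleaned text per wanted tag."""
    collector = TagTextCollector(wanted_tags)
    collector.feed(html_text)
    collector.close()
    # String operation step: collapse runs of whitespace and trim the ends.
    return {
        tag: [" ".join(snippet.split()) for snippet in snippets]
        for tag, snippets in collector.found.items()
    }


def parse_directory(directory, wanted_tags=("h1", "td")):
    """Run extract_fields over every .html file in a directory, one row per file."""
    rows = []
    for path in sorted(Path(directory).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append({"file": path.name, **extract_fields(text, wanted_tags)})
    return rows
```

Calling `parse_directory("some_folder")` gives one dictionary per file, which you could then write out as a table. Which tags or paths you actually need depends entirely on how your pages are structured.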