Public database download

PauloSch · December 11, 2022, 6:04pm

Hello folks!
This is my first post using my personal account. I'd like to retrieve and stack about half a million text files, roughly 1 MB each, from a public dataset repository that is designed to serve one text file at a time.
The usual access is through a webpage where one selects four dropdown options in sequence: state, municipality, zone and section. These four keys are hierarchical, and each dropdown's options depend on the previous selection. State and municipality are categorical, but zones are numbered from one up to the total number of zones in that municipality, and the same happens with sections. So there is no way to go directly through the sections from first to last: the same numbers would repeat many times.
I presume this could be done with the nodes that interact with webpages, but I don't understand how to use HTTP POST and GET.
Are those nodes capable of doing what I need? If not, are there other nodes that can?
FYI this is the site: Resultados – TSE

ArjenEX · December 11, 2022, 7:46pm

Hi @PauloSch

Welcome to the KNIME Community! As you might have seen, the state, municipality, zone and section are part of the URL parameters.

As you have probably found, this is complicated to automate. Technically you can recreate this in KNIME with a series of nested loops, but that can get out of control pretty quickly, because you have to cater for all possible combinations of those four categories. On top of that, the GET Request and Webpage Retriever nodes end up hitting a .js script that has to be executed.
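To give an idea of what those nested loops amount to, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the URL and its parameter names (`state`, `municipality`, `zone`, `section`) are placeholders and not the real TSE endpoint, the `hierarchy` dictionary is made-up sample data, and the files are assumed to be UTF-8 text. It also only works if the file can be fetched with a plain GET, which may not be the case when the page depends on JavaScript.

```python
from typing import Dict, Iterable, Iterator, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; the real one has to be read from the browser.
BASE_URL = "https://example.org/data"

# state -> municipality -> list of section counts, one entry per zone.
# Zone and section numbers restart at 1 inside each parent, so only the
# full four-part key identifies a file.
Hierarchy = Dict[str, Dict[str, List[int]]]
Key = Tuple[str, str, int, int]


def iter_keys(hierarchy: Hierarchy) -> Iterator[Key]:
    """Yield every (state, municipality, zone, section) combination."""
    for state, municipalities in hierarchy.items():
        for municipality, sections_per_zone in municipalities.items():
            for zone, n_sections in enumerate(sections_per_zone, start=1):
                for section in range(1, n_sections + 1):
                    yield state, municipality, zone, section


def build_url(key: Key) -> str:
    """Build the (hypothetical) URL for one four-part key."""
    state, municipality, zone, section = key
    query = urlencode(
        {"state": state, "municipality": municipality,
         "zone": zone, "section": section}
    )
    return f"{BASE_URL}?{query}"


def stack_files(keys: Iterable[Key], out_path: str) -> None:
    """Download each file with a plain GET and append it to one output file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for key in keys:
            with urlopen(build_url(key), timeout=30) as response:
                # Assumes the server returns UTF-8 text.
                out.write(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Made-up sample: one municipality with two zones, another with one.
    hierarchy = {"SP": {"Campinas": [2, 1]}, "RJ": {"Niteroi": [1]}}
    for key in iter_keys(hierarchy):
        print(build_url(key))
```

The hard part is not the loops but filling `hierarchy`: the list of municipalities, zones and section counts has to come from somewhere, which is exactly what the dropdowns on the page provide.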

Your best chance is the Selenium Nodes for KNIME extension. It is made for web automation and tasks like this.

PauloSch · December 11, 2022, 8:30pm

Thank you, @ArjenEX. I'll download this extension and take a look at it. I hope it fits; otherwise I'll have to ask a Python guy for help.
