How to apply a data filter before web scraping?

mmrostami · February 26, 2025, 7:40am

I came to this through searching posts about web scraping:

I noticed that if you check the URL, there is a filter option for the data on that page, while the Webpage Retriever node only sees the default URL.
I inspected the filter field on the page, and it looks something like this:

<input type="text" name="dData1" id="dData1" placeholder="mm/dd/yyyy" class="datepicker hasDatepicker" onkeypress="javascript:mask_data('2','0',true,frmBD);" value="02/25/2025" maxlength="10" spellcheck="false" data-ms-editor="true">

How can I tell the Webpage Retriever node to set this field to a different value on the page first, and then retrieve the page?

badger101 · February 26, 2025, 7:52am

Hi @mmrostami, the URL you provided in the workflow has a .php in it, meaning it is a dynamic page that executes scripts on a web server.

KNIME’s Webpage Retriever can only retrieve data from simple static websites. Read more about the difference between static and dynamic here.
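
To illustrate what the filter amounts to: the date box is a named form field (`dData1`, format mm/dd/yyyy), so its value has to reach the server somehow. Below is a minimal Python sketch, assuming the page would accept the field as a plain query parameter. That is only an assumption: the form may well submit via POST or JavaScript, in which case a browser-driving node is needed instead. The `BASE_URL` and the helper name `build_filtered_url` are hypothetical placeholders, not the real endpoint.

```python
from datetime import date
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical endpoint: the real URL is not shown in this thread.
BASE_URL = "https://example.com/report.php"


def build_filtered_url(base_url: str, day: date, field: str = "dData1") -> str:
    """Return base_url with the date filter appended as a query parameter.

    Assumes the server reads the filter from the query string, which may
    not be true for the actual page.
    """
    # The input's placeholder is mm/dd/yyyy, so format the date the same way.
    query = urlencode({field: day.strftime("%m/%d/%Y")})
    parts = urlsplit(base_url)
    # Keep any query string that is already present on the URL.
    merged = f"{parts.query}&{query}" if parts.query else query
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, merged, parts.fragment)
    )


if __name__ == "__main__":
    print(build_filtered_url(BASE_URL, date(2025, 2, 24)))
```

If the server did honor such a parameter, the resulting URL could be passed to the Webpage Retriever node directly. If it does not, the field has to be filled in through a controlled browser.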

What you can try for the time being is to experiment with KNIME’s alternative nodes from the Web Interaction Extension:

It has some drawbacks, but you’ll have to try - see if it can help you navigate the webpage without issues.

Alternatively, you may also try the Palladian Nodes:

mmrostami · February 26, 2025, 8:05am

wow thank you, going to read about dynamic pages and alternative nodes.

mmrostami · February 26, 2025, 5:56pm

I found this as a solution and want to learn more about it. Thank you, @badger101.

But it’s only free for one month.

MartinDDDD · February 26, 2025, 6:16pm

KNIME Web Interaction does similar stuff to Selenium nodes… it also opens up a browser that is then controlled by the logic you build with nodes in KNIME…

Whenever you want to fill out forms and click buttons on websites it really works well!

mmrostami · February 26, 2025, 7:19pm

@MartinDDDD I’m really looking to learn these nodes. Where can I find a good reference?

MartinDDDD · February 26, 2025, 7:39pm

I did a video on one of the Just KNIME It! challenges a while back that uses this extension. Apologies in advance for the AI voice.

Alex_JW · February 28, 2025, 7:03am

Hey @mmrostami, glad that you got a solution from @MartinDDDD and thank you very much for providing such a good one! I hope you could fix your workflow accordingly. If you have feedback / suggestions for the Web Interaction Nodes, please do not hesitate to leave a message here.

Best,
Alex

mmrostami · February 28, 2025, 7:22am

Thank you, @Alex_JW. After extensive research on web scraping using BeautifulSoup and other tools, I discovered that this KNIME node is highly effective for solving the problem. However, I am still encountering an issue at one step in my workflow. I will create a new topic soon to seek help. I appreciate the support from this wonderful community and express my gratitude to KNIME and everyone involved. Thank you.

system · March 7, 2025, 7:23am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.