Question about using Selenium with Python for web scraping in KNIME

Hello,

Unfortunately, I haven’t received any response regarding the limitations of using the Web Interaction extension in KNIME. Therefore, I have moved on to using Selenium with a Python script. While Selenium with Python works fine, I’m encountering some difficulties with setting up the web scraping process.

Specifically, I would like to know if it’s possible to chain multiple Python scripts together while reusing the same WebDriver instance.

Here’s my situation: when using a single Python script, I have to handle the entire web scraping process within that script. However, to make the workflow easier to maintain and to test my code in smaller steps, I would prefer to break the script into several nodes. For example, one script to log in using Selenium, another to perform the search, and a third to process the data.

Unfortunately, I don’t think this is possible because the output of a Python script is a DataFrame, meaning I can’t reuse the WebDriver that was used to log in.

I’m not sure if this explanation is clear, but currently, I am forced to rerun the entire script from the start, including the login and search steps, for even the smallest change. This is not very practical.

Does anyone have an idea of how I can separate the different steps of the web scraping process without having to restart the entire script each time?

Thank you for your help!

The very fact that you are even considering this task with KNIME has my antenna up in a good way. I am very new and still trying to wrap my head around the stack, but I am curious as to why your business requirement would leverage KNIME.

If I had to solve this problem, I would deploy the crawler outside of KNIME and expose the data via a connection that can be consumed within a workflow.

Having wrestled with Selenium and Playwright via Python, I am curious to learn more myself about how this can be achieved and why.

One thing you could try is to save an intermediate result as a pickle file (or several) and then load the result in another Python node.

In this example the file is actually written to the hard drive. You can also just connect the blue ports directly.
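To make the pickle idea a bit more concrete, here is a minimal sketch. It assumes `driver` is a Selenium WebDriver; the function names and the file path are just placeholders I made up. The WebDriver object itself wraps a live browser process, so it generally can't be pickled in a useful way. What you can pickle is the state it exposes (current URL, cookies, page source) and hand that to the next node. Whether restoring the cookies is enough to stay logged in depends on the site, so treat that part as something to test rather than a guarantee.

```python
import pickle
from pathlib import Path


def save_session(driver, path):
    """Pickle the serializable parts of a browser session (not the driver)."""
    state = {
        "url": driver.current_url,        # page the previous step ended on
        "cookies": driver.get_cookies(),  # list of plain dicts
        "html": driver.page_source,       # lets a later node parse without a browser
    }
    Path(path).write_bytes(pickle.dumps(state))
    return state


def load_session(path):
    """Load the state written by save_session in an earlier node."""
    return pickle.loads(Path(path).read_bytes())


def restore_cookies(driver, state):
    """Replay saved cookies into a fresh driver started in a later node."""
    driver.get(state["url"])       # cookies can only be added for the current domain
    for cookie in state["cookies"]:
        driver.add_cookie(cookie)
    driver.get(state["url"])       # reload so the page sees the cookies
```

The login node would end with `save_session(driver, "session.pkl")`, and the search node would start a new driver and call `restore_cookies(driver, load_session("session.pkl"))`. A node that only processes data can skip the browser entirely and parse `state["html"]`.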

ChatGPT and Claude.ai are able to understand Python code that sits within a KNIME Python node.
