I have a problem and I don’t understand why it happens. When moving from page 15 to page 16, the browser closes on its own without any intervention, and I cannot retrieve the data from the remaining pages that are still in progress.
Why do you think the browser is closing by itself?
I’ll hop in with some ideas, since the user you linked hasn’t responded yet.
It’s hard to tell why that might be happening without more detail on what’s going on inside the component. If it’s a copy-paste of your previously run page miner components, the problem may lie with the actual webpage rather than the component.
You may need to add a wait to give the webpage enough time to populate; if it hasn’t fully loaded, any logic in your component can error out.
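In case you want to try the same idea outside of KNIME, a minimal plain-Selenium (Python) sketch of “pause before reading the page” looks like this. The URL is just a placeholder, not your actual page (a smarter waiting approach is mentioned further down the thread):

```python
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/results?page=16")  # placeholder URL

time.sleep(5)  # crude fixed wait so the page can finish populating

# ... read data from the page here ...
print(driver.title)
driver.quit()
```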
Does this issue happen reproducibly, always with the same URL? Are you running the latest Selenium Nodes version, and/or have you restarted your computer, just to rule out the low-hanging fruit? Which browser are you using?
Hello @thor_landstrom and @qqilihq, thank you both for the replies. Philipp, I sent you the workflow I was having problems with via e-mail. Operating system: Ubuntu; KNIME 5.2.3.
Thanks for sending over the workflow! I had a look (impressive!), and while I cannot say for certain, I think the reason for the problem is that you generate an extreme amount of temporary data during workflow execution, and I suspect this eventually causes the browser to crash.
Here are some suggestions which I recommend implementing, and which will for sure be helpful for other Selenium Nodes users as well. They will make the workflow use fewer resources and run snappier, and hopefully also solve the issue with the crashing browser.
Avoid the combination of Find Elements + Execute JavaScript (you use the JS only for extracting text content). If you need to extract strings, I recommend using the Extract Text node instead: do not use a “Find Elements” node unless you have to; instead, enter the XPath or CSS query directly in the Extract Text node using its “Find Elements” option. This makes things much faster, as you avoid writing lots of intermediate data.
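Just to make the idea concrete for anyone reading along outside of KNIME, here is a rough plain-Selenium (Python) sketch of “read the text directly instead of running JavaScript per element”; the URL and XPath are made up for illustration:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/listing")  # placeholder URL

# Instead of running JavaScript per element, e.g.
#   driver.execute_script("return arguments[0].textContent;", element)
# read the text property directly from the located elements:
titles = [el.text for el in driver.find_elements(By.XPATH, "//h2[@class='title']")]
print(titles)
driver.quit()
```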
If you still need to use Find Elements, avoid ticking the setting “Append additional WebElement information”. It slows things down considerably, and I generally recommend it only for debugging, not for information-extraction tasks.
You can replace most of the Find Elements + Execute JavaScript combinations with a single Table Extractor node. It will automatically extract most of the table information for you, and you’ll only need to do some simple string post-processing.
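Conceptually this saves you the kind of manual row-and-cell looping you would otherwise have to build yourself. For comparison, a rough plain-Selenium (Python) sketch of that manual version, with a placeholder URL and selector:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/report")  # placeholder URL

# Grab the whole table in one pass: one list of cell texts per row.
table = driver.find_element(By.CSS_SELECTOR, "table#results")  # placeholder selector
rows = [
    [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "th, td")]
    for row in table.find_elements(By.TAG_NAME, "tr")
]
print(rows[:3])  # header row plus the first two data rows
driver.quit()
```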
It’s not necessary to have the combination of GET Request + HTTP Retriever. Use one or the other; I recommend HTTP Retriever, as it works best with the HTML Parser node.
Instead of a fixed Wait node, make use of the “smart” waiting options. The “Find Elements” settings have a “Wait up to …” option, which I suggest using instead. It will wait and then continue execution as soon as elements for the entered XPath/CSS query become available on the page.
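For anyone who wants to see the equivalent behaviour in plain Selenium (Python), this is essentially an explicit wait; the URL and XPath below are placeholders:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/results?page=16")  # placeholder URL

# Wait at most 30 seconds, but continue as soon as the elements appear,
# instead of always sleeping for a fixed amount of time.
rows = WebDriverWait(driver, 30).until(
    EC.presence_of_all_elements_located((By.XPATH, "//table//tr"))  # placeholder XPath
)
print(len(rows))
driver.quit()
```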