Retry-On-Fail Loop Feature Request

Edlueze · April 5, 2016, 4:25am

I'd like to formally request that KNIME consider adding a Retry-On-Fail Loop. If a node fails to complete execution as expected because of some (usually external) factor, the loop would try the node again until it succeeds.
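The behaviour being requested can be sketched in Python, roughly as below. This is not a KNIME API. `retry_on_fail`, `max_attempts` and `delay_seconds` are hypothetical names. The sketch assumes that a failure shows up as an exception and that giving up after a fixed number of attempts is acceptable. An unbounded "until it succeeds" loop is just the case of a very large limit.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_on_fail(action: Callable[[], T], max_attempts: int = 5,
                  delay_seconds: float = 10.0) -> T:
    """Run `action` until it succeeds or `max_attempts` runs out.

    Hypothetical helper for illustration: any exception counts as a failure.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()  # success: pass the result downstream
        except Exception as error:  # failure, e.g. caused by an external factor
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # wait before the next attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Usage: a stand-in for a node that fails twice and then succeeds.
calls = {"count": 0}


def flaky_node() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary outage")
    return "ok"


print(retry_on_fail(flaky_node, delay_seconds=0))  # prints "ok"
```

The delay between attempts matters for the external-factor cases below. Retrying immediately against a feed that has disconnected for 15 minutes would just burn through the attempts.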

It's a feature request that I've already discussed several times:

https://tech.knime.org/forum/palladian/handling-httpretriever-concurrentmodificationexception-error

https://tech.knime.org/forum/knime-users/how-to-handle-the-15-minutes-disconnect-from-twitter-with-knime-twitter-workflow

I've identified four use cases:

#1. handling a 15 minute disconnect from a data feed (like Twitter)
#2. writing to an overworked database
#3. handling KNIME nodes that intermittently fail
#4. reliably connecting to a database

Use Cases #1, #2, and #3 are all discussed in the links provided. A good example of Use Case #3 was the Vernalis Wait-for-time node, which occasionally failed if it missed the precise timeslice it was waiting for. An intermittent failure like that makes the node unusable if you are trying to build a highly reliable workflow.

In this thread I'm going to discuss Use Case #4, reliably connecting to a database. While that may appear similar to Use Case #2 (writing to an overworked database), the same solution does not work. This example illustrates the limitations of using existing KNIME nodes to achieve reliable execution.

Attached is a complex loop designed to retry if there is a failure. This workflow had been quite useful until I tried to use it to ensure a connection to a database. My first attempt resulted in the error "Can't merge FlowVariable Stacks! (likely a loop problem)" at the downstream Database Writer node. It appears that Flow Variables also propagate along Database Connections, and the Loop End node does not catch them.

I thought I could avoid this problem by using a wrapped MetaNode to trap all Flow Variables inside it (see attachment). Unfortunately, that results in a second error, "Loop Body wired incorrectly (Branches are not permitted to leave loops!)", caused by the outgoing Database Connection.

I cannot intercept a Database Connection with a Loop End, and I cannot remove the Flow Variables attached to the Database Connection. So I cannot create a reliable Database Connection.
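Outside KNIME, the pattern being asked for looks roughly like the Python sketch below. Only the connection step is retried, and once it succeeds a plain connection object leaves the retry logic with no loop state attached. The downstream write then runs once, outside the loop. This is an illustration under stated assumptions: `connect_with_retries` is a hypothetical helper, the in-memory `sqlite3` database stands in for the real one, and `SELECT 1` is assumed to be an adequate liveness check.

```python
import sqlite3
import time
from typing import Callable


def connect_with_retries(connect: Callable[[], sqlite3.Connection],
                         max_attempts: int = 5,
                         delay_seconds: float = 5.0) -> sqlite3.Connection:
    """Retry only the connection step and return a plain, working connection."""
    for attempt in range(1, max_attempts + 1):
        conn = None
        try:
            conn = connect()
            conn.execute("SELECT 1")  # prove the connection is actually usable
            return conn
        except sqlite3.Error:
            if conn is not None:
                conn.close()  # don't leak a half-working connection
            if attempt == max_attempts:
                raise  # out of attempts: surface the last database error
            time.sleep(delay_seconds)
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")


# Usage: an in-memory sqlite3 database stands in for the real one.
conn = connect_with_retries(lambda: sqlite3.connect(":memory:"), delay_seconds=0)
# The connection now lives outside the retry logic, so the "writer" step
# uses it once, with nothing from the retry loop attached.
conn.execute("CREATE TABLE results (value INTEGER)")
conn.execute("INSERT INTO results VALUES (1)")
conn.commit()
```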

Reliability is my greatest challenge with KNIME right now. A Retry-On-Fail Loop is just one suggestion that I hope will make KNIME more reliable. No doubt there are many other aspects that need to be considered.

(Attached image: retry-on-fail_database_connection_001.png)

Edlueze · April 11, 2016, 1:58pm

I'm adding another Use Case to the Retry-On-Fail Loop Feature Request:

#5. retry loading a web page

The availability of a website cannot be guaranteed, so it is necessary to be able to keep retrying until the URL loads. I've also sent the great folks over at Selenium a similar feature request and posted a picture of a not-so-good KNIME wrapper around the "Start WebDriver" node that attempted to make page loading more reliable:

https://tech.knime.org/forum/palladian-selenium/selenium-feature-request-start-webdriver-retries-after-error
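For a plain page download (not Selenium), the retry can be sketched in Python with exponential backoff, so an unavailable site is not hammered. `load_page_with_retries` and its parameters are hypothetical. The page is fetched with the standard library's `urllib.request.urlopen`. `fetch` can be swapped for any other loader, such as a WebDriver call. Treating every `OSError` as retryable is an assumption: it also retries HTTP errors such as 404, which a real workflow may want to fail on immediately.

```python
import time
import urllib.request
from typing import Callable


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download a page with the standard library; raises OSError subclasses on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_page_with_retries(url: str,
                           fetch: Callable[[str], bytes] = fetch_url,
                           max_attempts: int = 6,
                           base_delay: float = 1.0,
                           max_delay: float = 60.0) -> bytes:
    """Retry a page load, doubling the wait after each failure (exponential backoff)."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError:  # URLError, HTTPError, timeouts and resets all derive from OSError
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # with defaults: 1, 2, 4, 8, ... seconds, capped
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")


# Usage (needs network access):
# page = load_page_with_retries("https://example.com")
```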

I had been going into the office four times over the weekend (Saturday morning, Saturday evening, Sunday morning, and Sunday evening) to keep my KNIME workflow running. I'm now experimenting with what is hopefully a better technique: an AutoHotkey script that repeatedly simulates pressing SHIFT-F7 within KNIME (run!).

For those who are interested, this is the Script I've come up with:

#k::                                                ; Windows-K (K for KNIME) starts this AutoHotkey (AHK) script
    ; Tell the user that the script is running
    MsgBox AutoHotkey KNIME Automatic Retry is running. To pause it right-click the AHK icon in the System Tray and select "Pause Script". Exit AHK to stop the script altogether.

    Loop                                            ; Loop forever; pause the script from the System Tray
    {
        WinActivate, KNIME Analytics Platform        ; Switch to the window whose title starts with "KNIME Analytics Platform"
        WinWaitActive, KNIME Analytics Platform, , 5 ; Wait up to 5 seconds for it to become the active window
        if (ErrorLevel = 0)                          ; Only send keys if KNIME really is in front
            Send, +{F7}                              ; SHIFT-F7 sets the workflow running (again)
        Sleep, 10000                                 ; Sleep for 10 seconds, then do it again
    }
Return                                              ; End of the hotkey routine

ferry.abt · April 24, 2016, 7:13pm

Hello Edlueze,

Thank you for your suggestion. I have opened a feature request for you and will post updates in this thread.

Best,
Ferry

Vernalis · April 25, 2016, 10:21am

Edlueze,

This is an interesting and very useful idea. Also, could you provide any more detail about the Vernalis node failing? It's not something we've ever seen it do.

Thanks

Steve

mwiegand · January 16, 2019, 9:03am

Hi,

Has anyone started working on this really good idea? A use case I run into quite often, related to #1 (disconnects from Twitter), is Google Spreadsheet nodes hitting timeouts.

While I think something could be built with a Recursive Loop Start and variable-controlled IF nodes, having the suggested functionality in a single node would be awesome.
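The control flow of that recursive-loop construction can be sketched in plain Python. Each iteration receives the state produced by the previous one, the way a recursive loop feeds its table back to the loop start. A success flag plays the role of the flow variable an IF node would test. The loop stops on success or after a maximum number of iterations, and it reports the outcome instead of raising. Everything here is hypothetical and only mirrors the idea, not the actual node configuration. `read_sheet` stands in for a Google Spreadsheet read that times out twice.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LoopState:
    """What one iteration hands back to the loop start (like the recursive table)."""
    iteration: int = 0
    success: bool = False
    result: Optional[str] = None
    last_error: Optional[str] = None


def recursive_retry(body: Callable[[], str], max_iterations: int) -> LoopState:
    """Re-run `body` until the success flag is set or the iteration limit is hit."""
    state = LoopState()
    # Loop-end condition: stop on success or when the iteration limit is reached.
    while not state.success and state.iteration < max_iterations:
        try:
            state = LoopState(state.iteration + 1, True, body(), None)
        except Exception as error:
            # Failure branch: record the error and feed the state back to the start.
            state = LoopState(state.iteration + 1, False, None, str(error))
    return state


# Usage: a hypothetical sheet read that times out twice, then returns data.
outcomes = iter([TimeoutError("sheet timed out"), TimeoutError("sheet timed out"), "rows"])


def read_sheet() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


final = recursive_retry(read_sheet, max_iterations=5)
print(final.success, final.iteration, final.result)  # prints "True 3 rows"
```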

Thanks a lot
Mike

ipazin · January 16, 2019, 2:36pm

Hi!

Maybe this topic can help regarding a workaround:

Br,
Ivan