[BUG?] Try & Catch Error

Hi KNIME team,

Unfortunately, I am experiencing a possible bug in KNIME 4.0.1:

I call a workflow that identifies the newest version of other workflows in my workspace based on my naming scheme (identifier and date in the workflow name, e.g. “FI 20190919”). This workflow is called by 10 other workflows, so some of them have to wait a while: identifying the newest workflow takes around 10 seconds because the workspace is scanned for file names.
If the workflow call fails, a retry should be initiated one hour later, as shown in workflow 01:

[Screenshot 01: Workflow Loop]
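To make the naming scheme concrete, here is a minimal Python sketch of the "newest workflow" lookup. It is not the KNIME workflow itself, only an illustration of the logic; the function name `newest_workflow` and the exact pattern (identifier, one space, eight-digit date) are my assumptions based on the “FI 20190919” example.

```python
import re
from datetime import datetime

# Assumed naming scheme: "<identifier> <YYYYMMDD>", e.g. "FI 20190919".
NAME_PATTERN = re.compile(r"^(?P<ident>\S+) (?P<stamp>\d{8})$")


def newest_workflow(names, identifier):
    """Return the name with the latest date for `identifier`, or None."""
    best_name, best_date = None, None
    for name in names:
        match = NAME_PATTERN.match(name)
        if not match or match.group("ident") != identifier:
            continue  # not part of this workflow family
        try:
            stamp = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a valid calendar date
        if best_date is None or stamp > best_date:
            best_name, best_date = name, stamp
    return best_name
```

In my setup the list of names comes from scanning the workspace, which is the slow part (around 10 seconds).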

A failed workflow call should be caught by a Try/Catch node combination, as shown in screenshot 02 (the interior of the metanode “WF EXECUTION V4”).
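Expressed as code, the behaviour I expect from the try-catch plus retry loop is roughly the following. This is only a sketch in Python, not what KNIME does internally; `call_with_retry`, `max_attempts` and the injectable `sleep` are hypothetical names, and the limit of three attempts is an assumption (my loop does not define one here).

```python
import time


def call_with_retry(call, max_attempts=3, wait_seconds=3600, sleep=time.sleep):
    """Run `call`; on any exception wait and try again, up to `max_attempts`."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call()  # the "try" branch: the workflow call itself
        except Exception as error:  # the "catch" branch
            last_error = error
            if attempt < max_attempts:
                sleep(wait_seconds)  # wait one hour before the next attempt
    raise RuntimeError(f"call failed after {max_attempts} attempts") from last_error
```

The important point is that a failure inside `call` is contained by the catch branch and never stops the surrounding loop.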

However, when the workflow call fails, the whole workflow fails instead, because the other nodes switch to idle.

The workflow can be restarted manually; the error catcher then shows this error message:

java.lang.Exception: Failure, workflow was not executed, current state is IDLE.
ROOT : EXECUTING (start)
ROOT (end)

    at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.executeInternal(CallWorkflowTableNodeModel.java:132)
    at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.access$0(CallWorkflowTableNodeModel.java:112)
    at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel$1.call(CallWorkflowTableNodeModel.java:99)
    at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel$1.call(CallWorkflowTableNodeModel.java:1)
    at org.knime.core.util.ThreadPool.runInvisible(ThreadPool.java:615)
    at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.execute(CallWorkflowTableNodeModel.java:96)
    at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
    at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1192)
    at org.knime.core.node.Node.execute(Node.java:979)
    at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
    at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
    at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
    at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
    at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
    at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
    at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
    Suppressed: java.lang.NullPointerException
        at org.knime.core.node.workflow.WorkflowManager.disableNodeForExecution(WorkflowManager.java:2100)
        at org.knime.core.node.workflow.WorkflowManager.disableNodeForExecution(WorkflowManager.java:2038)
        at org.knime.core.node.workflow.WorkflowManager.cancelExecution(WorkflowManager.java:5051)
        at org.knime.explorer.nodes.callworkflow.local.LocalWorkflowBackend.close(LocalWorkflowBackend.java:398)
        at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.executeInternal(CallWorkflowTableNodeModel.java:140)
        ... 18 more

Do you have any idea how to prevent the downstream nodes from going idle when an error occurs? Is this a bug in KNIME 4.0.1?

Thank you for your help in advance!

Hi @linkm

Let’s get started with some questions to understand the problem better.

Are you scheduling this on a KNIME Server or are you running this manually?
What is the reason the other workflow is failing? Are there any errors?

Best, Iris

This workflow might also be helpful for you

Hi Iris,
Thank you for your hints. Let me answer your questions first:

  1. I start the workflow (attached above) manually on the normal Analytics Platform (currently: version 4.0.2).
  2. There are several kinds of errors that may cause a workflow to fail:
    2.1 The sub-workflow is called simultaneously by too many other workflows (my interpretation). Maybe the scheduling mechanism is failing?
    2.2 I assume (I cannot tell for sure because logs are missing) that an Excel export node running in another workflow may cause an internal error due to the file size (e.g. an xlsx export of more than 400 MB). The Excel export is itself wrapped in another try-catch setup, so that the data is exported in chunks if the Excel export node fails. This actually works fine.
    2.3 I assume that the new DB nodes may cause errors in other sub-workflows, as some nodes may have to wait too long for their execution/configuration. My best guess is that this has no impact on the workflow call, as these workflows are not failing; there are just “WARN” entries in the logs while they wait for their configuration.
    2.4 As I have limited time to migrate my workflows to the new DB nodes, I also execute Database (legacy) nodes (with MS SQL Server) simultaneously. Maybe that is something that is not really considered in the memory design.
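To illustrate the chunked fallback from 2.2, here is a minimal Python sketch of the idea. It is not how the KNIME nodes are implemented; `export_with_chunk_fallback`, the `export(rows, part)` writer and the default `chunk_size` are hypothetical and only stand in for my try-catch setup around the Excel export.

```python
def export_with_chunk_fallback(rows, export, chunk_size=100_000):
    """Try one export of all rows; on failure export fixed-size chunks instead.

    `export(rows, part)` is a hypothetical writer: `part` is None for the
    single-file attempt and 0, 1, 2, ... for the chunk files.
    Returns the number of files written.
    """
    try:
        export(rows, None)  # first attempt: everything in one file
        return 1
    except Exception:
        parts = 0
        # fallback: write the same data as several smaller files
        for start in range(0, len(rows), chunk_size):
            export(rows[start:start + chunk_size], parts)
            parts += 1
        return parts
```

If a single chunk fails as well, the exception propagates, which corresponds to the case where my outer try-catch should take over.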

As far as I understand, I am using the same method as in the workflow you provided. The only difference: I do not pre-define the workflows to be executed. They are identified by the (failing) workflow that is called before the main workflow is executed.

Maybe it is of importance that the workflows are executed on a Xeon E5-2640 v4 with 10 Cores / 20 Threads and 32 GB RAM.

I just observed a “near crash” scenario like the one I described, and I think it really is due to the Excel Exporter nodes. Is there any workaround for this?