[BUG?] Try & Catch Error

linkm · September 19, 2019, 11:27am

Hi KNIME team,

Unfortunately, I am experiencing what looks like a bug in KNIME 4.0.1:

I call a workflow that identifies the newest version of other workflows in my workspace based on my naming scheme (identifier and date in the workflow name, e.g. “FI 20190919”). This workflow is called by 10 other workflows, so some of them have to wait a while: identifying the newest workflow takes around 10 seconds because the workspace is scanned for file names.
If the workflow call fails, a retry should be initiated one hour later, as shown in workflow 01:

[Screenshot 01: Workflow Loop]
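Outside of KNIME, the two ideas described above (picking the newest workflow by name, and retrying a failed call after one hour) can be sketched in plain Python. This is only an illustration under assumptions: the function names `newest_workflow` and `call_with_retry`, the `max_attempts` limit, and the exact name pattern (identifier, one space, eight-digit date) are hypothetical and not part of the actual workflows.

```python
import re
import time
from datetime import datetime

# Assumed naming scheme: "<identifier> <YYYYMMDD>", e.g. "FI 20190919".
NAME_PATTERN = re.compile(r"^(?P<ident>\S+) (?P<date>\d{8})$")


def newest_workflow(names, identifier):
    """Return the name with the latest date for the identifier, or None."""
    dated = []
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None or match.group("ident") != identifier:
            continue
        try:
            stamp = datetime.strptime(match.group("date"), "%Y%m%d")
        except ValueError:
            continue  # eight digits that do not form a valid date
        dated.append((stamp, name))
    return max(dated)[1] if dated else None


def call_with_retry(call, max_attempts=3, wait_seconds=3600, sleep=time.sleep):
    """Run call(); on any exception wait and try again, up to max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as error:  # plays the role of the catch branch
            last_error = error
            if attempt < max_attempts:
                sleep(wait_seconds)  # one hour by default
    raise RuntimeError(f"call failed after {max_attempts} attempts") from last_error
```

The `sleep` parameter exists only so the waiting step can be swapped out when trying the sketch without actually pausing for an hour.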

The failure of the workflow call should be detected by a Try-Catch node combination, as shown in screenshot 02 (the interior of the metanode “WF EXECUTION V4”):

However, if the workflow call fails, the whole workflow fails, because the other nodes change to idle instead:

The workflow can be restarted manually; the error catcher then presents this error message:

```
java.lang.Exception: Failure, workflow was not executed, current state is IDLE.
ROOT : EXECUTING (start)
ROOT (end)

at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.executeInternal(CallWorkflowTableNodeModel.java:132)
at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.access$0(CallWorkflowTableNodeModel.java:112)
at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel$1.call(CallWorkflowTableNodeModel.java:99)
at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel$1.call(CallWorkflowTableNodeModel.java:1)
at org.knime.core.util.ThreadPool.runInvisible(ThreadPool.java:615)
at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.execute(CallWorkflowTableNodeModel.java:96)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:567)
at org.knime.core.node.Node.invokeFullyNodeModelExecute(Node.java:1192)
at org.knime.core.node.Node.execute(Node.java:979)
at org.knime.core.node.workflow.NativeNodeContainer.performExecuteNode(NativeNodeContainer.java:559)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:95)
at org.knime.core.node.workflow.NodeExecutionJob.internalRun(NodeExecutionJob.java:179)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:110)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:328)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:204)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
Suppressed: java.lang.NullPointerException
  at org.knime.core.node.workflow.WorkflowManager.disableNodeForExecution(WorkflowManager.java:2100)
  at org.knime.core.node.workflow.WorkflowManager.disableNodeForExecution(WorkflowManager.java:2038)
  at org.knime.core.node.workflow.WorkflowManager.cancelExecution(WorkflowManager.java:5051)
  at org.knime.explorer.nodes.callworkflow.local.LocalWorkflowBackend.close(LocalWorkflowBackend.java:398)
  at org.knime.productivity.callworkflow.table.CallWorkflowTableNodeModel.executeInternal(CallWorkflowTableNodeModel.java:140)
  ... 18 more
```

Do you have any idea how to prevent the downstream nodes from going idle on an error? Is this a bug in KNIME 4.0.1?

Thank you for your help in advance!

Iris · October 15, 2019, 6:57pm

Hi @linkm

Let’s get started with some questions to understand the problem better.

Are you scheduling this on a KNIME Server or are you running this manually?
What is the reason the other workflow is failing? Are there any errors?

Best, Iris

Iris · October 15, 2019, 7:03pm

This workflow might also be helpful for you

linkm · October 17, 2019, 8:24am

Hi Iris,
thank you for your hints. Let me answer your questions first:

1. I start the workflow (attached above) manually on the normal Analytics Platform (currently: version 4.0.2).
2. There are several kinds of errors that may cause a workflow to fail:
2.1 The sub-workflow is called simultaneously by too many other workflows (my interpretation). Maybe the scheduling mechanism is failing?
2.2 I assume (I cannot tell for sure because logs are missing) that an Excel export node running in another workflow may cause an internal error due to the size of the export (e.g. an xlsx export of more than 400 MB). The Excel export is configured with another Try-Catch setup so that the data is exported in chunks if the Excel export node fails. This actually works fine.
2.3 I assume that the new DB nodes may cause errors in other sub-workflows, as some nodes may have to wait too long for their execution/configuration. My best guess is that this has no impact on the workflow call, as these workflows are not failing; there are just “WARN” entries in the logs while they wait for their configuration.
2.4 As I have limited time to migrate my workflows to the new DB nodes, I also execute Database (legacy) nodes (with MS SQL Server) simultaneously. Maybe that is something that is not really considered in the memory design.
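To make the chunked fallback from 2.2 concrete, here is a rough Python sketch of the same idea: try the full export first and, only if that fails, export the data in smaller pieces. The function name `export_with_chunk_fallback`, the `write(rows, part)` callback, and the default `chunk_size` are hypothetical; the real setup uses KNIME Try-Catch nodes around the Excel export node, not this code.

```python
def export_with_chunk_fallback(rows, write, chunk_size=100_000):
    """Try to write all rows at once; on failure, write them in chunks.

    `write(rows, part)` is a hypothetical callback that performs the export.
    `part` is None for the full export, or a 1-based chunk number.
    Returns the number of write calls that succeeded.
    """
    try:
        write(rows, None)
        return 1
    except Exception:
        pass  # fall through to the chunked export

    written = 0
    for part, start in enumerate(range(0, len(rows), chunk_size), start=1):
        # An error inside a chunk is not caught here and propagates to the caller.
        write(rows[start:start + chunk_size], part)
        written += 1
    return written
```

Note that the sketch does not clean up a partially written full export before falling back; whether that matters depends on the writer.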

From my understanding, I am actually using the same method as in the workflow you provided. The only difference: I do not pre-define the workflows to be executed. They are identified by the (failing) workflow that is called before the execution of the main workflow.

Maybe it is of importance that the workflows are executed on a Xeon E5-2640 v4 with 10 Cores / 20 Threads and 32 GB RAM.

linkm · October 23, 2019, 8:22am

I just observed a “near crash” scenario like the one I described, and I think it really is due to the Excel Exporter nodes. Is there any workaround for this?