Call Workflow (Table Based) failed

Hi,

We are using a Parallel Chunk Loop (with the automatic chunk count option) to call workflows via the “Call Workflow (Table Based)” node:
Table Creator (6 rows) -> Parallel Chunk Start -> Call Workflow (Table Based) -> Parallel Chunk End
※ The CPU has 4 cores and 8 logical processors (according to msinfo32).

This workflow worked well for months, but today it failed with an error:
ERROR Call Workflow (Table Based) 0:18: Execute failed: Read timed out

How can we solve this problem? Do we need to increase a time-out value?
Thanks in advance.
Ryu

Hi,
Are you calling the workflow remotely from a local workflow? Could it have to do with this issue?

Kind regards
Alexander

Hi @AlexanderFillbrunn

Thank you very much for your quick response.
Yes, we are calling a local workflow.
However, the Java version is 1.8.0_152 and the OS is Windows Server 2016 in our environment, so the error seems to have nothing to do with that issue.
The workflow ran fine this morning but failed in the afternoon.

BTW:
KNIME Server: 4.8.2
KNIME Analytics Platform: 3.7.2

Ryu

Hi Ryu,
Thank you for that information. Would you mind also sharing your KNIME log? Do you use the KNIME AP locally and as the server’s executor? I am wondering whether the actual invocation of the workflow fails, or whether the workflow itself fails because some node throws the error. Can you trigger the workflow you are trying to call manually, e.g. from the AP, and find out which node gives you problems?
Kind regards
Alexander

Hi @AlexanderFillbrunn

We use the AP as the server’s executor.
The error is not easy to reproduce, because the workflow ran fine for months and also ran fine yesterday evening and this morning.

I suspect that the workflows processed in parallel put heavy load on system resources, so the call to the other workflow timed out.

BTW: the node ID of the “Call Workflow (Table Based)” node is 18, which is the ID printed in the error message.

Best regards.
Ryu

The things that come to mind are:

  • How stable is your remote connection, and is the machine running the KNIME Server virtual or physical hardware?
  • Does the server use SSD, HDD, or some shared storage accessed over the network?
  • Is an aggressive virus scanner running?
  • Are there any (third-party) database connections involved?

And indeed, a full log might give us more information.

Hi,
I got the log but unfortunately it does not give away more than the error message. You raise good questions. If the workflows are mounted on a network drive, this could cause performance issues. @laughsmile could you give us some more info on the setup?
Kind regards
Alexander

Hi @mlauber71 @AlexanderFillbrunn

The called workflow is located on the local server, so I think the connection is stable.
The server is a physical machine (an AWS Dedicated Instance) with good specs, and it uses SSD storage.

On the server there is antivirus software (ServerProtect by Trend Micro), and it was running while the KNIME workflow was executing.

The called workflow accesses Redshift via the “Amazon Redshift Connector” node.

The full log is very large, so I sent the logs from 3:00 pm until the time the error happened.
Which part of the logs, or what other information, do you need?

BTW:
six instances of the same workflow were called in parallel; five of them executed successfully and just one failed.

Best regards.
Ryu

Hi @mlauber71 @AlexanderFillbrunn

Would you please tell me how to increase the time-out value for calling a workflow?
Thanks in advance.
Ryu

Hi,
I don’t think this is possible, as it is not expected to take long to run a local workflow. I will talk to some colleagues and try to find out what’s up.
Kind regards
Alexander

I think you would have to take a step back and think about how to construct your workflow and at which point to call parallel workflows. Maybe you could provide us with an example or a screenshot of what the workflow does.

If it continues to be unstable, you might have to think about ‘communicating’ with the workflow(s): make them report back when they have finished successfully, perhaps with a Variable Condition Loop End at the end and maybe also some error catching (see the sketch below).
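
One possible construction (just a sketch of the idea, built from KNIME’s standard error-handling nodes; the exact wiring would need to be adapted to your setup):

Generic Loop Start → Try (Data Ports) → Call Workflow (Table Based) → Catch Errors (Data Ports) → Variable Condition Loop End

The Catch Errors node exposes the failing node’s message as a flow variable, which the Variable Condition Loop End can check to decide whether to retry the call or stop.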

Hi @mlauber71 @AlexanderFillbrunn

I checked all the log files and found no detailed stack trace.

So I have to reduce the number of parallel processes from 6 down to 3, as follows:

FROM:

Table Creator (6 rows) → Parallel Chunk Start (automatic chunk count) → Call Workflow (Table Based) → Parallel Chunk End

CHANGE TO:

Table Creator (6 rows) → Parallel Chunk Start (custom chunk count set to 3) → Chunk Loop Start → Call Workflow (Table Based) → Chunk Loop End → Parallel Chunk End

Best regards.
Ryu

Hi,
You can also try to give the KNIME Server’s executor more memory. This might help if you call the workflow using the knime:// protocol. Please see here for how to do that.
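
For the executor this is usually the -Xmx entry in its knime.ini, below the -vmargs line (a minimal sketch; the value is only a placeholder and should be chosen to fit your machine):

-vmargs
-Xmx4096m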
Kind regards
Alexander

Hi @AlexanderFillbrunn

Thank you so much.
I have changed the memory setting in knime.ini:
-Xmx8000m

Best regards.
Ryu

Hi,
And does that solve the problem? If you have the opportunity, you could also try 4 GB so that the executor does not take too many resources away from the server.
Kind regards
Alexander

Hi @AlexanderFillbrunn

The server has 16 GB of memory, so I changed the memory setting to 8 GB.
But today the error appeared again (the third time this month).
It seems the only option is to modify the workflow as I mentioned above.

Best regards
Ryu

Hi,
One more question: how did you configure the node? It has options for short- and long-running workflows. Did you specify that the workflow runs long? See here for more info: https://kni.me/n/M66aoaj-gqFbGR5j
Kind regards
Alexander

Hi @AlexanderFillbrunn

I specified the workflow as long-running in the Invocation settings of the Call Workflow (Table Based) node from the start.

Best regards.
Ryu