Executor hangs due to message size

My server is running a workflow that calls another workflow using the Call Workflow (Table based) node. I am sending rather large chunks of data, and my mistake was to add another column to the data earlier in the process and not remove it before sending the table to the Call Workflow node. As a consequence, the job crashes with the error ‘Message size of 105507027 greater than allowed maximum of 104857600’.

I think I know the likely cause and can now simply remove any unnecessary columns from the table sent to the other workflow. However, such an incident seems to make the Executor freeze, and none of my scheduled jobs can be started. As I am not the system admin, the only thing I can do is delete all the jobs I can see hanging in memory. This, however, doesn’t fix the issue, and it looks like I will be forced to ask our IT to restart the services once they are available.

Is this a known issue, and is there any way a content admin could manage it independently?

Hello,

Thank you for contacting KNIME regarding this issue.

My understanding of the issue is that a workflow called via the Call Workflow (Table based) node with large data sent in results in the error ‘Message size of 105507027 greater than allowed maximum of 104857600’.

Please provide the following additional information for troubleshooting:

  • What operating system is your KNIME server software running on?
  • What versions of KNIME Server (KS) and KNIME Executor (KE) are being used?
  • What version of KNIME Analytics Platform (AP) is being used?
  • Does the issue affect one user, multiple users, or all users?

I look forward to helping you resolve this issue!

Regards,
Nickolaus


KNIME Server is running on Windows. I don’t know how to check the server version because I am not an Admin. Both Executor and AP are 4.5.1. Other users haven’t tried to run the workflow.

I am now able to run the workflow after removing unnecessary columns from the table sent to the other workflow. But this has happened before with other Server, Executor, and AP versions for the same reason (my mistake was to enforce the excluded columns rather than the included ones in a Column Filter just before the Call Workflow node, meaning that any new column would be passed through and sent as well).

My actual problem is that the Executor freezes/crashes and won’t start any other jobs even after I delete all the jobs. The only thing I can do is call our IT and ask them to restart the Executor, which wastes everyone’s time.

Hello @pzkor,

If you are on AP 4.5.1/KE 4.5.1, then the KS version is probably 4.14.1.

There is an issue that originally came up as defect SRV-3352 in KS 4.11.5 when the Call Workflow node is used.

Overflowing the max message size used to close the communications channel for the specific user, requiring a server restart to resolve it.
This was fixed in KS 4.11.6/KS 4.12.2, which added configuration options that allow the maximum message size to be increased when larger messages are needed.

However, there is a sub-request, designated AP-16381, asking that the Call Workflow nodes support multi-part messages. This would allow large messages to be split into smaller chunks rather than requiring the message size limit to be raised as far. That request has not yet been completed, and I don’t know where it stands on the roadmap.
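To make the multi-part idea concrete, here is a minimal sketch of the general chunking technique. This is purely illustrative: the `chunk` helper is hypothetical and is not part of KNIME or what AP-16381 would implement.

```python
def chunk(payload: bytes, max_size: int) -> list[bytes]:
    """Split a payload into pieces no larger than max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return [payload[i:i + max_size] for i in range(0, len(payload), max_size)]

# A small demonstration: an 8-byte payload split at a 3-byte limit
# yields three pieces, none exceeding the limit.
parts = chunk(b"abcdefgh", 3)
print(parts)  # [b'abc', b'def', b'gh']
```

Under such a scheme, a 105507027-byte message would travel as two parts below the 104857600-byte limit instead of failing outright.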

So configuring the max message size is the current workaround. To apply it, two files need to be changed.

Server - catalina.properties - located under the apache-tomcat-9.0.XX/conf directory.
Append these two lines to the end of the file:
qpid.max_message_size=204800000
com.knime.enterprise.max-message-size=204800000

Executor - knime.ini - located in the executor directory, next to knime.exe.
Append this line to the bottom of the file:
-Dcom.knime.enterprise.max-message-size=204800000

This allows a message queue size of roughly 200 MB (204800000 bytes, about 195 MiB, versus the default of 104857600 bytes, exactly 100 MiB), and both the server and the executor will accept messages of that size. You can configure the max-message-size to be larger than 200 MB if needed. I believe the absolute limit is 512 MB, but I would probably try 128 MB for your use case and see whether anything tries to go larger than that. If a message does exceed whatever max-message-size is configured, an error will be thrown, similar to the one you originally saw.
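The arithmetic behind these numbers can be sketched as follows. The helper names are mine and not part of any KNIME API; the byte counts come from the error message and the settings above.

```python
DEFAULT_MAX = 104857600   # default limit: exactly 100 MiB
RAISED_MAX = 204800000    # suggested value: about 195 MiB

def bytes_to_mib(n: int) -> float:
    """Convert a byte count to mebibytes (MiB)."""
    return n / (1024 * 1024)

def fits(message_size: int, limit: int = DEFAULT_MAX) -> bool:
    """Would a message of this size be accepted under the given limit?"""
    return message_size <= limit

failing_size = 105507027  # the size reported in the original error
print(round(bytes_to_mib(DEFAULT_MAX), 1))   # 100.0
print(round(bytes_to_mib(failing_size), 1))  # 100.6
print(fits(failing_size))                    # False under the default limit
print(fits(failing_size, RAISED_MAX))        # True under the raised limit
```

In other words, the original message overshot the default ceiling by only about 0.6 MiB, so even a modest increase would have let it through.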

Performance may be impacted for the larger messages, but smaller messages should not be negatively affected, so your average workflow use cases should see little or no impact. Only the workflows sending larger messages will notice any effect.

Please let me know if this fails to resolve your issue.

Regards,
Nickolaus


Thank you very much for your response @NDekay.

In our case, the problem was triggered by a Column Filter node just before the call to the external workflow, which let through any new columns that had been created earlier in the workflow. As these columns were unnecessary for the call in the first place, I modified the Column Filter to keep a fixed set of columns, and we are good again. I will apply the max message size tweak you suggested if the problem reappears.