Workflow sometimes cannot be loaded after upgrading to KNIME Server 4.13

I have upgraded KNIME Server from 4.8 to 4.13.5.
Most of the time it seems to work well, but the following error occurred when I executed a workflow manually last night:
[image: error message]

knime.ini

Related log:

2022-03-17 21:44:10,692 : WARN  : pool-1-thread-7 :  : WorkMessageDispatcher :  :  : CPU usage (97.35842441745677) above threshold (90.0)
2022-03-17 21:44:10,705 : WARN  : Consumer reconnect poller :  : WorkMessageDispatcher :  :  : CPU usage (97.35842441745677) above threshold (90.0)
... (identical warnings omitted)
2022-03-17 21:49:23,445 : WARN  : Consumer reconnect poller :  : WorkMessageDispatcher :  :  : CPU usage (97.67813942752335) above threshold (90.0)
org.eclipse.core.runtime.CoreException: Error occurred while loading workflow into memory: An error occurred during the creation of a job for '/WF_019_Read_ResearchFile': Job wasn't loaded within 3 minutes. This usually happens if the executor is overloaded or if no executor is available at all. Please check back with your server administrator.

	at com.knime.explorer.server.rest.RestServerExplorerFileStore.throwCoreException(RestServerExplorerFileStore.java:472)
	at com.knime.explorer.server.rest.RestServerExplorerFileStore.loadWorkflow(RestServerExplorerFileStore.java:1522)
	at com.knime.explorer.server.internal.view.actions.rest.RestServerExecuteAction.execute(RestServerExecuteAction.java:262)
	at com.knime.explorer.server.internal.view.actions.rest.RestServerExecuteAction$2.run(RestServerExecuteAction.java:157)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: java.lang.RuntimeException: An error occurred during the creation of a job for '/WF_019_Read_ResearchFile': Job wasn't loaded within 3 minutes. This usually happens if the executor is overloaded or if no executor is available at all. Please check back with your server administrator.

	at com.knime.enterprise.client.rest.RestServerContent.mapToException(RestServerContent.java:429)
	at com.knime.enterprise.client.rest.RestServerContent.createWorkflowJob(RestServerContent.java:1979)
	at com.knime.explorer.server.rest.RestServerExplorerFileStore.loadWorkflow(RestServerExplorerFileStore.java:1514)
	... 3 more
17-Mar-2022 21:41:59.036 INFO [pool-12-thread-8] com.knime.enterprise.server.tokens.TokenServerImpl.acquireTokens Executor f3879a5a-9ba0-4fde-96e8-e2bd225c2060@AWGS072 acquired 4 core tokens. Now there are 0 available core tokens.
17-Mar-2022 21:44:10.681 INFO [http-nio-8080-exec-2] com.knime.enterprise.server.jobs.WorkflowJobManagerImpl.loadWorkflow Loading workflow '/WF_019_Read_ResearchFile' for user 'knimeadmin'
17-Mar-2022 21:44:10.684 INFO [http-nio-8080-exec-2] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.loadWorkflow Loading workflow '/WF_019_Read_ResearchFile (WF_019_Read_ResearchFile 2022-03-17 21.44.10; 1e3ac0ef-a351-4b5a-9381-3261202ede78)' via executor group 'knime-jobs'
17-Mar-2022 21:45:59.044 INFO [pool-12-thread-9] com.knime.enterprise.server.tokens.TokenServerImpl.acquireTokens Executor f3879a5a-9ba0-4fde-96e8-e2bd225c2060@AWGS072 acquired 4 core tokens. Now there are 0 available core tokens.
17-Mar-2022 21:47:10.689 INFO [http-nio-8080-exec-2] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.unload Unloading job '/WF_019_Read_ResearchFile (WF_019_Read_ResearchFile 2022-03-17 21.44.10; 1e3ac0ef-a351-4b5a-9381-3261202ede78)' via executor group 'knime-jobs', executor 'null'
17-Mar-2022 21:49:02.237 INFO [http-nio-8080-exec-9] com.knime.enterprise.server.jobs.WorkflowJobManagerImpl.swapOutJob Swapping job '/WF_019_Read_ResearchFile (WF_019_Read_ResearchFile 2022-03-17 20.56.16; 0522b8c3-85bb-4688-866d-e44c2d0c9fe3)' (of user knimeadmin, last activity: 2022-03-17T20:56:38.363+09:00[Asia/Tokyo])
17-Mar-2022 21:49:02.237 INFO [http-nio-8080-exec-9] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.swap Swapping job '/WF_019_Read_ResearchFile (WF_019_Read_ResearchFile 2022-03-17 20.56.16; 0522b8c3-85bb-4688-866d-e44c2d0c9fe3)' via executor group 'knime-jobs', executor 'f3879a5a-9ba0-4fde-96e8-e2bd225c2060@AWGS072'
17-Mar-2022 21:49:28.272 INFO [http-nio-8080-exec-9] com.knime.enterprise.server.executor.msgq.RabbitMQExecutorImpl.unload Unloading job '/WF_019_Read_ResearchFile (WF_019_Read_ResearchFile 2022-03-17 20.56.16; 0522b8c3-85bb-4688-866d-e44c2d0c9fe3)' via executor group 'knime-jobs', executor 'f3879a5a-9ba0-4fde-96e8-e2bd225c2060@AWGS072'
17-Mar-2022 21:49:28.286 INFO [http-nio-8080-exec-9] com.knime.enterprise.server.jobs.WorkflowJobManagerImpl.swapOutJob Successfully swapped job to disc and released from executor ('/WF_019_Read_ResearchFile (WF_019_Read_ResearchFile 2022-03-17 20.56.16; 0522b8c3-85bb-4688-866d-e44c2d0c9fe3)')
17-Mar-2022 21:49:28.314 INFO [pool-12-thread-4] com.knime.enterprise.server.tokens.TokenServerImpl.releaseTokens Executor f3879a5a-9ba0-4fde-96e8-e2bd225c2060@AWGS072 released all its core tokens. Now there are 4 available core tokens.
17-Mar-2022 21:49:56.533 INFO [pool-12-thread-5] com.knime.enterprise.server.tokens.TokenServerImpl.acquireTokens Executor a2fc5405-1359-4506-8cb0-4be9f722bb9e@AWGS072 acquired 4 core tokens. Now there are 0 available core tokens.

BTW:
After I restarted the KNIME Executor Windows service, it worked well again.

Thanks in advance
s-ryuu

Hi @laughsmile,

Have you updated your Java installation as well? Release 4.13 needs a Java 11 installation.
And how did you update your executor?

Best,
Sven

Edit: Several weeks ago I had a similar problem, and after hours of research I just installed the server from scratch.

Thanks @sven-abx

Yes, I also installed the KNIME Executor from scratch.
I have just found the cause: the error was due to insufficient system resources, which were being consumed by antivirus software.

Hi @laughsmile,

Thank you for providing your solution.

I just want to add some background on what might have caused your issue besides the virus scanner:

Your knime.ini shows that you have configured 8 GB of executor heap space. That can be enough in some cases, but the executor may run into memory issues, especially when running many workflows in parallel or workflows that process large amounts of data.

The first related log shows that the executor was above its 90% load threshold. The warning itself reports CPU usage, but a nearly full heap can produce exactly this pattern: when the used heap approaches the configured maximum, the JVM triggers a garbage collection (GC), and if that GC cannot free enough memory, the next one follows right away.

On systems with a small heap this can cause many GCs in a row. While a GC is running, little or no other processing happens on the executor, so loading the workflow ran into the 3-minute timeout. The virus scanner may have contributed to this as well.
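
To see how long the executor stayed above the threshold, the warnings can be summarized with a short script. This is only a sketch: the regular expression is derived from the two log lines quoted in this thread, the function name `summarize_cpu_warnings` is made up for illustration, and other KNIME versions may format the message differently.

```python
import re
from datetime import datetime

# Matches executor warnings of the form quoted in this thread, e.g.
# 2022-03-17 21:44:10,692 : WARN  : ... : CPU usage (97.35) above threshold (90.0)
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} : WARN .*"
    r"CPU usage \(([\d.]+)\) above threshold \(([\d.]+)\)"
)

def summarize_cpu_warnings(lines):
    """Return (first_seen, last_seen, peak_usage, count), or None if nothing matched."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            timestamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            hits.append((timestamp, float(match.group(2))))
    if not hits:
        return None
    times = [t for t, _ in hits]
    return min(times), max(times), max(u for _, u in hits), len(hits)

# The first and last warning from the executor log in the original post.
sample = [
    "2022-03-17 21:44:10,692 : WARN  : pool-1-thread-7 :  : WorkMessageDispatcher :  :  : CPU usage (97.35842441745677) above threshold (90.0)",
    "2022-03-17 21:49:23,445 : WARN  : Consumer reconnect poller :  : WorkMessageDispatcher :  :  : CPU usage (97.67813942752335) above threshold (90.0)",
]
first, last, peak, count = summarize_cpu_warnings(sample)
print(f"{count} warnings between {first} and {last}, peak {peak:.1f}%")
```

For a real check, pass the lines of the full executor log file instead of `sample`. In the posted excerpt the warnings span more than five minutes, which is longer than the 3-minute load timeout.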

This timeout can be raised with the following server option (set in knime-server.config):

com.knime.server.job.default_load_timeout=<duration with unit, e.g. 60m, 36h, or 2d> [RT]
Specifies how long to wait for a job to get loaded by an Executor. If the job does not get loaded within the timeout, the operation is canceled. The default is 3m. This timeout is only applied if no explicit timeout has been passed with the call.
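
The default of 3m is visible in the server log from the original post: the load request and the unload with executor 'null' are almost exactly three minutes apart. A small sketch to confirm this from the two timestamps follows. The helper name `elapsed_seconds` is made up for illustration, and the `%b` format code assumes English month abbreviations.

```python
from datetime import datetime

# Timestamp layout used in the server log above, e.g. 17-Mar-2022 21:44:10.684
FMT = "%d-%b-%Y %H:%M:%S.%f"

def elapsed_seconds(start, end):
    """Seconds between two log timestamps given in the FMT layout."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

load_requested = "17-Mar-2022 21:44:10.684"  # RabbitMQExecutorImpl.loadWorkflow
job_unloaded = "17-Mar-2022 21:47:10.689"    # RabbitMQExecutorImpl.unload, executor 'null'

waited = elapsed_seconds(load_requested, job_unloaded)
print(f"server waited {waited:.3f} s before giving up")
```

The result is about 180 seconds, so no executor picked up the job within the timeout. Raising the timeout gives a busy executor more time, but it does not remove the underlying load.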

I would recommend raising the heap space configuration to at least 16 GB. The right value depends heavily on the total amount of memory on the machine; 16 GB should be a safe setting on a machine that meets our minimum memory requirement for KNIME Server of 32 GB.
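
The heap size is set by the `-Xmx` JVM option in the executor's knime.ini, so changing it means editing one line and restarting the executor. The sketch below does that edit on the file's text. The helper name `set_max_heap` and the sample file content are assumptions for illustration, since the actual knime.ini from the post is not shown.

```python
import re

def set_max_heap(ini_text, new_size):
    """Return ini_text with its single -Xmx option replaced by -Xmx<new_size>."""
    if not re.fullmatch(r"\d+[kKmMgG]", new_size):
        raise ValueError(f"unexpected heap size: {new_size!r}")
    # Replace only the option token so Windows line endings are left untouched.
    new_text, replaced = re.subn(r"(?m)^-Xmx\S+", f"-Xmx{new_size}", ini_text)
    if replaced != 1:
        raise ValueError(f"expected exactly one -Xmx line, found {replaced}")
    return new_text

# Hypothetical knime.ini excerpt with the 8 GB heap mentioned in this thread.
example = "-vmargs\n-server\n-Xmx8g\n-Dsome.option=value\n"
print(set_max_heap(example, "16g"))
```

Editing the line by hand in a text editor works just as well. The script is mainly useful when several executors need the same change.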

Best,
Michael

@MichaelRespondek
Thank you so much.
I will raise the heap space configuration to 16GB.
Best,
Ryuu
