KNIME Server not working properly after a server crash

Hi,

After our server crashed, KNIME Server has not been working properly. It sends this message for every job:

“… was unexpectedly removed from the KNIME Server executor. This can happen if an executor is forcibly restarted, or crashes while a job is running. Re-executing the job should result in the expected output. If you find repeated issues, you may need to investigate the workflow that is being executed for issues.”

We thought it was a RAM issue, so we increased the memory size (72/96).

I have also read in forum topics the suggestion to set max_lifetime = -1 (it is currently 7d, the default).

It didn't change anything. I am still getting these messages.

KNIME Server seems to work, but there is one job that consumes a lot of RAM. So another question: is it possible to reduce or restrict RAM usage per job?

log detail:
11-May-2023 23:06:36.587 SEVERE [http-nio-8080-exec-6] org.apache.cxf.jaxrs.utils.JAXRSUtils.logMessageHandlerProblem Problem with writing the data, class com.knime.enterprise.server.rest.api.v4.jobs.ent.JobList, ContentType: application/vnd.mason+json

knime.ini
-startup
plugins/org.eclipse.equinox.launcher_1.5.700.v20200207-2156.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.1100.v20190907-0426
-vm
plugins/org.knime.binary.jre.linux.x86_64_1.8.0.252-b09/jre/bin
-vmargs
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass

-XX:+UseG1GC

-XX:+UseG1GC -XX:+DisableExplicitGC
-XX:MaxGCPauseMillis=500
-XX:+UseStringDeduplication
-XX:+ParallelRefProcEnabled
-XX:MaxTenuringThreshold=1

-XX:+UseG1GC

-Dsun.net.client.defaultReadTimeout=0
-XX:CompileCommand=exclude,javax/swing/text/GlyphView,getBreakSpot
-Xmx72G
-Dorg.eclipse.swt.internal.gtk.disablePrinting
-Darrow.enable_unsafe_memory_access=true
-Darrow.memory.debug.allocator=false
-Darrow.enable_null_check_for_get=false
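
The -vmargs section above repeats -XX:+UseG1GC several times. A minimal Python sketch for spotting repeated options and reading the heap setting from such a file is shown here. The function names (parse_vmargs, duplicated, max_heap) are hypothetical helpers, not part of KNIME, and splitting a line on whitespace is an assumption made only to handle lines like the one that carries two options.

```python
from collections import Counter


def parse_vmargs(ini_text):
    """Return the options listed after the -vmargs marker."""
    lines = [ln.strip() for ln in ini_text.splitlines()]
    if "-vmargs" not in lines:
        return []
    start = lines.index("-vmargs") + 1
    args = []
    for ln in lines[start:]:
        # Assumption: treat whitespace-separated tokens as separate options.
        # Blank lines yield no tokens and are skipped automatically.
        args.extend(ln.split())
    return args


def duplicated(args):
    """Return the options that appear more than once, sorted."""
    counts = Counter(args)
    return sorted(a for a, n in counts.items() if n > 1)


def max_heap(args):
    """Return the value of the last -Xmx option in the list, or None."""
    value = None
    for a in args:
        if a.startswith("-Xmx"):
            value = a[len("-Xmx"):]
    return value


# Shortened excerpt of the knime.ini posted above.
sample = """-vmargs
-server
-XX:+UseG1GC
-XX:+UseG1GC -XX:+DisableExplicitGC
-XX:+UseG1GC
-Xmx72G
"""

args = parse_vmargs(sample)
print(duplicated(args))
print(max_heap(args))
```

This only reports what is written in the file; it does not say how the launcher or the JVM will interpret repeated or combined options.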

Any advice?

Thanks,

Yasin SARI

Hi Yasin,

It seems that your executor is not available to process workflows. Have you already tried to restart the executor service?
Could you also please mention the KNIME Server version you are currently using?

Thanks,
Michael

Hi Michael,

Our KNIME Server version is 4.12, running on Linux.
I restarted the server, which didn't help, and I also stopped and restarted the service.
We have also increased the memory.

The only thing that seems to resolve the issue is disabling the most demanding job. But that doesn't answer the question of why the server is not working after the crash. That job also worked before we increased the memory.

After reading forum answers, trying many different things, and talking with our system administrators, we decided to move on and upgrade KNIME Server.

Thanks

Hi,

Our problem is similar to the one described in this link:

The server works until it reaches a point where there is not enough memory left. At that point it throws an “out of memory” error and drops the workflows that are actively running.
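
To confirm when the “out of memory” errors occur, one option is to scan the executor log for java.lang.OutOfMemoryError entries. The following is a rough Python sketch under stated assumptions: find_oom is a hypothetical helper, and the sample lines are invented for illustration rather than copied from a real KNIME log. The location of the log file depends on the installation and is not assumed here.

```python
import re

# Matches the standard Java error name, optionally followed by ": <reason>".
OOM = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(.*))?")


def find_oom(lines):
    """Return (line_number, reason) for every line mentioning an OutOfMemoryError."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = OOM.search(line)
        if match:
            reason = (match.group(1) or "").strip()
            hits.append((number, reason))
    return hits


# Invented sample lines, for illustration only.
sample_log = [
    "INFO  job started",
    "ERROR java.lang.OutOfMemoryError: Java heap space",
    "WARN  job was unexpectedly removed",
    "ERROR java.lang.OutOfMemoryError: GC overhead limit exceeded",
]

for number, reason in find_oom(sample_log):
    print(number, reason)
```

For a real file, the same function can be fed an open file handle, since it only iterates over lines.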

Hi Yasin,

Sorry, it seems I overlooked your last reply. The thread you mention relates to an old KNIME Server version with a different architecture from the current one, so the two cases are different.

What is the current state of your KNIME Server? Were you able to update it to a recent version, and do you still experience issues?

Thanks,
Michael
