I have a question about the ‘vanished’ status of the KNIME Server.
I restart the knime-executor service every two weeks to reset the server's memory.
After each restart, some scheduled jobs that had already completed with a finished status are changed to vanished status.
I looked at the ‘KNIME Server Administration Guide’ and there is an option that appears to be related to this.
Yes, this setting is related. The Executor Watchdog runs on the KNIME Server and listens for regular check-ins from executors to confirm that they are still running. If an executor crashes (or vanishes for some other reason), the watchdog marks all jobs that were loaded in that executor as vanished. This includes jobs that had already completed but were still being kept on the executor. How long an executor keeps a job in memory after it has finished executing nodes can be configured with these parameters:
com.knime.server.job.max_time_in_memory
com.knime.server.job.webportal.max_time_in_memory
Note that the WebPortal parameter should not be set too short, as execution is also paused while waiting for user interaction in the WebPortal.
When the executor is shut down (via Windows services or systemctl stop knime-executor), it should swap all of its still-open jobs back to the server, but it seems this didn’t work in this case. If the swap timeout is currently set to the default of 1 minute, it may make sense to increase it slightly so that jobs have enough time to be swapped to disk:
com.knime.server.job.default_swap_timeout
Lastly, the watchdog is a completely optional feature. While it is nice to know when jobs have vanished due to a crash, the status is purely informational. If you don’t need it, or if unwanted false positives come up, you can fully disable the watchdog by setting com.knime.server.executor.watchdog.interval to 0s.
Would changing the value of the option below be the best way to go?
If so, what value would you suggest?
(Restarting the knime-executor usually takes less than 3-5 minutes.)
com.knime.server.job.default_swap_timeout
Am I correct that setting com.knime.server.executor.watchdog.interval to 0s means that a job will no longer change to vanished status, even if it disappears?
Yes, I would recommend these changes in most cases. The swap timeout applies per job, so in theory a shutdown can take up to swap_timeout * number_of_loaded_jobs, but that is an extreme case. I’d say it is safe to set com.knime.server.job.default_swap_timeout to 5 or 10 minutes, or even longer if you have individual, very large jobs.
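To get a feel for that worst case, here is a small sketch of the swap_timeout * number_of_loaded_jobs estimate. The function name and the example numbers (10 minutes, 6 loaded jobs) are made up for illustration and are not taken from any KNIME API; it only does the arithmetic.

```python
from datetime import timedelta


def worst_case_shutdown(swap_timeout: timedelta, loaded_jobs: int) -> timedelta:
    """Upper bound on shutdown time if every loaded job uses its full swap timeout."""
    if loaded_jobs < 0:
        raise ValueError("loaded_jobs must be non-negative")
    return swap_timeout * loaded_jobs


# Hypothetical example: 10-minute swap timeout, 6 jobs still loaded in the executor.
bound = worst_case_shutdown(timedelta(minutes=10), 6)
print(bound)  # 1:00:00
```

In practice most jobs swap much faster than the timeout, so the real shutdown time should stay well below this bound.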
Yes, without the watchdog active (set to 0s), jobs will not be marked as vanished.
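Put together, the two settings discussed in this thread might look like this in the server configuration file. Treat it as a sketch: the 10m value is just one of the values suggested above, and the duration-with-unit notation (10m, 0s) and the use of knime-server.config are assumptions to check against the Administration Guide for your server version.

```
# Give each job up to 10 minutes to swap back to the server on shutdown
com.knime.server.job.default_swap_timeout=10m
# Disable the Executor Watchdog; jobs are then no longer marked as vanished
com.knime.server.executor.watchdog.interval=0s
```

You only need the second line if you decide to turn the watchdog off entirely; raising the swap timeout alone may already be enough.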