We have experienced some issues with the KNIME scheduler recently. When we schedule a KNIME job, we select the option to “Skip the execution if previous job is still running”.
With this option set, the scheduler does skip the execution if the previous job is still running, but it does NOT enable the schedule again afterwards (or it disables it completely) and keeps skipping every scheduled execution. Is this the expected behavior of the scheduler? A sample excerpt from the log is below.
21-Jun-2018 14:14:00.527 INFO [KNIME-Job-Scheduler_1] com.knime.enterprise.server.jobs.ScheduledJobManagerImpl.loadAndExecute Scheduled job for ‘Workflow ‘/WorkflowCategory/WorkflowJobName’; resetting; discarding after execution; target name = Workflowname; next execution at 2018-06-21T14:15-07:00[America/Los_Angeles], repeating every minute; User: Username; ID: d20fb4e3-03a1-4b21-9b81-ce5cdc43c5d5’ is still running, skipping current execution.
This log entry is written every time the workflow is due to execute, even if the previous execution has already completed.
In our use case, certain workflows are scheduled to execute every 5 minutes. If an execution takes more than 5 minutes, the next execution does not start (as expected). However, even after the workflow has fully finished, the following scheduled executions are not triggered (not expected), and the log shows the message quoted above.
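To see how often a schedule hits this message, the skip entries can be counted per schedule ID. The following is a minimal sketch: the regular expression is modeled only on the single excerpt above, the function name `count_skips` is made up for illustration, and other KNIME Server versions may format this line differently.

```python
import re
from collections import Counter

# Assumption: the pattern is derived from the one log excerpt in this thread.
# It captures the 36-character ID that precedes the "still running" message.
SKIP_PATTERN = re.compile(
    r"ID: (?P<schedule_id>[0-9a-f-]{36}).*is still running, skipping current execution"
)


def count_skips(lines):
    """Return a Counter mapping ID -> number of 'skipping' log entries."""
    counts = Counter()
    for line in lines:
        match = SKIP_PATTERN.search(line)
        if match:
            counts[match.group("schedule_id")] += 1
    return counts


# Shortened version of the log line quoted above.
sample = (
    "21-Jun-2018 14:14:00.527 INFO [KNIME-Job-Scheduler_1] ... User: Username; "
    "ID: d20fb4e3-03a1-4b21-9b81-ce5cdc43c5d5’ is still running, skipping current execution."
)
print(dict(count_skips([sample, sample, "some unrelated log line"])))
# {'d20fb4e3-03a1-4b21-9b81-ce5cdc43c5d5': 2}
```

A count that keeps growing long after the last job finished would match the symptom described here.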
Any help in understanding this behavior is appreciated.
I just tested the scheduler on my KNIME Server and it works as expected for me. Here’s what I did:
- Create a workflow with only a Wait node, that waits for 90 seconds.
- Upload it to the server and set it to run on a schedule every 60 seconds, with the ‘Skip execution if previous job still running’ option enabled.
I’d expect to see the workflow execute every second minute (since it takes 90 seconds to run), which it does.
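The expected pattern for this test can be worked out with a small simulation. This is only a sketch of the "skip if previous job is still running" rule as described in this thread, not KNIME Server code; the function name and its parameters are made up for illustration.

```python
def simulate_schedule(interval_s, job_duration_s, total_s):
    """Simulate a fixed-interval schedule that skips a trigger while the
    previous job is still running. Returns (start_times, skipped_times)."""
    starts, skipped = [], []
    running_until = 0  # point in time at which the current job finishes
    for t in range(0, total_s, interval_s):
        if t < running_until:
            skipped.append(t)  # previous job still running: skip this trigger
        else:
            starts.append(t)   # no job running: start a new one
            running_until = t + job_duration_s
    return starts, skipped


# 60-second schedule, 90-second Wait node, observed for 5 minutes.
starts, skipped = simulate_schedule(60, 90, 300)
print(starts)   # [0, 120, 240]
print(skipped)  # [60, 180]
```

Under these assumptions a job starts every second minute and every other trigger is skipped, which is the healthy behavior. A schedule that skips every trigger after some point is not following this rule.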
I’ve attached the workflow, can you test it on your server and take a screenshot of the workflow repository after 5 minutes?
Can you also share the KNIME Server version that you’re using?
waitandschedule.knwf (4.5 KB)
Just a short update here. In the split post you noted that you are using KNIME Server 4.6.2.
While trying to reproduce the issue, I discovered a bug that causes the schedule to fail in the case where a job is manually cancelled while running. We’ve already made a fix and will release that as part of a patch release (4.6.4) in the near future. In any case, it would still be great to see the log files (send via DM) so that I can check into the cause of the issue for you.
Hi @jonfuller. We are working with @jeffgullick-knime to get the issue resolved.
Just following up here. The fix is available in the 4.6.4 patch release. I believe that Jeff already helped you to get this patched, and verified that it solved the issue.
Hi, I want to re-open this topic as we are facing a similar issue. We are using KNIME Server 4.12. When a workflow is scheduled with the option “skip if previous job is still running” and one job fails, no further jobs are started by the scheduler. Any ideas? Maybe @jeffgullick-knime?
Thanks in advance.
Sorry to hear you’re also encountering issues.
Maybe the Job that fails still exists on the server (Monitoring → Jobs). If that is the case, what state is it in?
Could you export the KNIME Server logs (in the WebPortal → Monitoring → Logs) for a timeframe where this issue occurred and send them to email@example.com (with a reference to this forum thread)? Ideally, also let us know what the name of the Workflow was, or maybe even the Schedule or Job ID, if you happen to know that.
We would be happy to investigate what is causing this problem.
Follow-up after Marvin had a look into the logs:
"I can see a Schedule for [workflow XY] having problems. The Schedule still exists, but the Server thinks that the Job from the 17th is still running and hence does not start a new Job (as set by the user when the Schedule was created).
I think this is most likely a bug we have in versions prior to 4.12.4: in some (not quite clear) cases when a Job fails before its proper execution begins, the server does not properly register the Job as failed, which keeps further Jobs from starting."
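The mechanism Marvin describes can be illustrated with a toy model. This is purely hypothetical and says nothing about how KNIME Server is actually implemented: the class `ToySchedule`, its flag `register_early_failures`, and the "running" marker are all invented here to show why one unregistered failure blocks every later job when "skip if previous job is still running" is set.

```python
class ToySchedule:
    """Hypothetical model of a skip-if-running schedule (not KNIME code)."""

    def __init__(self, register_early_failures):
        # Assumption: False mimics the described bug, True the fixed behavior.
        self.register_early_failures = register_early_failures
        self.job_running = False

    def trigger(self, fails_before_execution=False):
        if self.job_running:
            return "skipped"  # previous job is believed to be still running
        self.job_running = True
        if fails_before_execution:
            if self.register_early_failures:
                self.job_running = False  # failure is recorded, marker cleared
            return "failed"  # with the bug, the marker stays set forever
        self.job_running = False  # job ran and finished normally
        return "finished"


buggy = ToySchedule(register_early_failures=False)
buggy.trigger(fails_before_execution=True)
print([buggy.trigger() for _ in range(3)])  # ['skipped', 'skipped', 'skipped']

fixed = ToySchedule(register_early_failures=True)
fixed.trigger(fails_before_execution=True)
print([fixed.trigger() for _ in range(3)])  # ['finished', 'finished', 'finished']
```

In this model the only difference between the two runs is whether the early failure clears the "running" marker, which matches the symptom of a schedule that still exists but never starts a new job.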