When we start a job using the workflow APIs, it runs properly at first but then suddenly gets discarded after a few minutes, without any errors in the workflow.
What could be the cause of this issue? Under what circumstances do jobs get discarded automatically? How can we fix this?
No, it’s not the job max lifetime. The issue is that jobs get discarded while they are active and running.
Anyway, the configuration you mentioned is set to 7 days.
KNIME Executor version: 4.2.4
KNIME Server version: 4.11.4.0153-b0c4cc5ef
The max_execution_time is not set. discard_after_timeout is true.
This happens to different workflows, but currently I have one I’m sure about: it typically takes more than 10 minutes, and the issue occurs after roughly that much time (though it varies, 10 to 15 minutes).
Also, it seems the issue happens when we run jobs via the workflow APIs.
I see we have recently had some similar questions about this topic, so perhaps I can help with some clarification.
When using the :execution endpoint, as documented in the Swagger page, KNIME Server will create a job, execute it, and discard it. However, there is one very important aspect: if the job does not finish executing within a certain timeout (the default is ten minutes), it will be cancelled and discarded. This timeout is a call timeout, not a loading timeout, and it cannot be made infinite.
So there are two things you can do:
a) increase your timeout (see the sketch below)
b) perform separate calls to create a job and to execute it, like I do in this example: https://kni.me/w/WK4ocXripq5o9quY
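For option (a), this is a minimal sketch of what the single :execution call could look like from Python. The base URL, workflow path, and credentials are placeholders, and the exact path should be taken from your server’s Swagger page:

import requests

# Hypothetical placeholders: adjust the base URL, workflow path, and credentials for your server.
BASE = "https://your-knime-server/knime/rest/v4"
WORKFLOW_PATH = "/Your/Workflow/Path"

# One call that creates, executes, and discards the job, with a larger call timeout (in milliseconds).
resp = requests.post(
    f"{BASE}/repository{WORKFLOW_PATH}:execution",
    params={"timeout": 900_000},  # e.g. 15 minutes instead of the 10-minute default
    auth=("user", "password"),
)
resp.raise_for_status()
print(resp.json())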
Hi @ana_ved ,
The workflow gets loaded and executed, but it gets discarded in the middle of execution. I can track its progress by opening the job in KNIME AP, but suddenly (after about 10 minutes) it gets discarded automatically.
What I suspect is that this is related to our last issue, where we updated KNIME Server to 4.11.4 and the workflows then stopped loading. That issue was fixed more or less by chance, but now I guess the executor and KNIME Server are not communicating well, so although the workflows get loaded, the server doesn’t know about it and discards the jobs.
Are you using the :execution endpoint with the standard timeout? If so, the job is expected to be discarded after 10 minutes even if the workflow execution has not finished. The timeout is a call timeout, not a loading timeout.
If the above is not the case, we should investigate your hypothesis. Let me know.
I don’t think I’m able to send workflow parameters using “jobs” instead of “execution”, since I’m using Container Input nodes. The request body contains the workflow variables, not the parameters I have created in the JSON Container Input node.
I think I talked about this in one of the Summit sessions and it was agreed that I need to use “execution” to be able to send parameters. Am I missing something?
Is this the correct format for sending the request with the “timeout” parameter? When I use Swagger, the timeout works, but a request like the one above from an external app doesn’t.
I can see the request returns a 500 (Internal Server Error) response code. Whenever the code is anything other than 2xx, there should be a response body containing the specific error message. Does the external tool you are using to call the workflow allow you to inspect the response?
If not, you could try calling this workflow with Postman and see what kind of error you get there.
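If the external tool can’t show the body, a quick way to see it is a small script like this sketch (placeholder URL and credentials; replicate whatever call your external app makes):

import requests

# Placeholder URL and credentials; use the same call your external app sends.
resp = requests.post(
    "https://your-knime-server/knime/rest/v4/repository/Your/Workflow/Path:execution",
    params={"timeout": 600_000},
    auth=("user", "password"),
)
if not resp.ok:
    # Any non-2xx response should carry the specific error message in its body.
    print("Status:", resp.status_code)  # e.g. 500
    print("Body:", resp.text)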
There is also an issue that a defined parameter (db-schema) cannot be used, since there is no node that makes use of it. Can you remove it if it’s not needed and try again?
For this call you can specify the parameters in the request body like this:
{
"json-input" : {"param": "value"}
}
“json-input” refers to the parameter name of the Container Input (JSON) node and might be different in your case.
Ana’s workflow shows how to do this two-step approach with a KNIME workflow: create_job_and_execute.knwf (24.4 KB)
If you want to do it with an external tool, you need to take the same steps as described (see the sketch below). You might need to wait a little between the calls, because you can’t execute the workflow if it has not been successfully loaded yet.
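For reference, here is a rough Python sketch of those two calls, assuming the v4 “jobs” endpoints shown in the Swagger page. The base URL, workflow path, credentials, the job-id field name, and the “json-input” parameter name are all placeholders to adapt:

import time
import requests

# Hypothetical placeholders: adjust the base URL, workflow path, and credentials for your server.
BASE = "https://your-knime-server/knime/rest/v4"
WORKFLOW_PATH = "/Your/Workflow/Path"
AUTH = ("user", "password")

# Step 1: create (load) a job from the workflow.
create = requests.post(f"{BASE}/repository{WORKFLOW_PATH}:jobs", auth=AUTH)
create.raise_for_status()
job_id = create.json()["id"]  # the field holding the job id may differ; check the response in Swagger

# Wait a little so the job is fully loaded before trying to execute it.
time.sleep(5)

# Step 2: execute the job, sending the Container Input (JSON) parameter in the request body.
# "json-input" must match the parameter name configured in your Container Input (JSON) node.
execute = requests.post(
    f"{BASE}/jobs/{job_id}",
    json={"json-input": {"param": "value"}},
    auth=AUTH,
)
execute.raise_for_status()
print(execute.json())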
It seems that the timeout parameter has a maximum value of less than 30 minutes. I tested in Swagger, and even with the timeout set to 1800000 (30 minutes) the job gets discarded before it finishes. The test workflow runs a Wait node set to 16 minutes, and the job gets discarded before it completes successfully.
So it seems that we have to use the “jobs” endpoint instead of “execution”. I will try your solution and get back to you.
Thank you for taking the time to solve this issue.
Thank you so much, guys. The two-step method of running the jobs works perfectly. We could send parameters to the job in the second request, and the workflow did not get discarded.