I’ve noticed that after upgrading to 4.10.x my server began to respond with a 404 status on the jobs REST API endpoint, and I can’t see the jobs of workflows in the AP either. I started with a direct upgrade from 4.9.2, and I also tried a fresh installation with the old workspace, with the same result.
curl -X GET "https://hostname:8443/knime/rest/v4/jobs" -H "accept: application/vnd.mason+json"
returns 404 now:
date: Thu, 02 Jan 2020 07:46:03 GMT
expires: Thu, 01 Jan 1970 00:00:00 GMT
server: Apache Tomcat
Is it just me, or does somebody else have the same issue?
Is it exclusively the jobs endpoint? Do REST calls to the repository work fine?
It is for jobs only.
This call works fine:
curl -k -u $BASE_AUTH -X GET "https://hostname:8443/knime/rest/v4/repository/Users?deep=false" -H "accept: application/vnd.mason+json"
This call returns 404:
curl -k -u $BASE_AUTH -X GET "https://hostname:8443/knime/rest/v4/jobs" -H "accept: application/vnd.mason+json"
I get the same results via the Swagger interface as well.
On the server side, I can see the same in its localhost_access_log:
Andriy_Dmytrenko [03/Jan/2020:11:43:44 +0200] “GET /knime/rest/v4/repository/Users?deep=false HTTP/1.1” 200 3554
Andriy_Dmytrenko [03/Jan/2020:11:44:12 +0200] “GET /knime/rest/v4/repository/Users?deep=false HTTP/1.1” 200 3554
Andriy_Dmytrenko [03/Jan/2020:11:45:07 +0200] “GET /knime/rest/v4/jobs HTTP/1.1” 404 17
Andriy_Dmytrenko [03/Jan/2020:11:45:20 +0200] “GET /knime/rest/v4/jobs HTTP/1.1” 404 17
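The comparison above (repository returns 200, jobs returns 404) can be scripted so it is easy to re-check after an upgrade. This is only a sketch: `hostname:8443` and `$BASE_AUTH` are placeholders from the thread, and `status_of` uses curl's `-w "%{http_code}"` to print just the status code.

```shell
#!/bin/sh
# Placeholders from this thread; adjust to your server and credentials.
BASE="https://hostname:8443/knime/rest/v4"

status_of() {
  # Print only the HTTP status code for the given endpoint path.
  curl -k -s -u "$BASE_AUTH" -o /dev/null -w "%{http_code}" \
    -H "accept: application/vnd.mason+json" "$BASE$1"
}

diagnose() {
  # $1 = status of /repository/Users, $2 = status of /jobs.
  # Report whether the failure is isolated to the jobs endpoint.
  if [ "$1" = "200" ] && [ "$2" = "404" ]; then
    echo "jobs-endpoint-only failure"
  elif [ "$1" != "200" ]; then
    echo "general failure"
  else
    echo "ok"
  fi
}

# Against a live server you would run:
# diagnose "$(status_of "/repository/Users?deep=false")" "$(status_of /jobs)"
```

If this prints "jobs-endpoint-only failure", the symptom matches the one described in this thread rather than a general connectivity or authentication problem.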
We have a strong suspicion about what is causing the problem and are working on it (though it’s very hard to reproduce).
It is very likely that just one job (or very few) causes the entire jobs endpoint to return the error, and only jobs that were started from within the WebPortal (i.e. in wizard execution mode) are affected. That is, if you delete all jobs (or at least the ones you started via the WebPortal), everything should be back to normal. You can delete jobs either via the WebPortal or by accessing the server from the AP without using REST (i.e. uncheck ‘Use REST’ in the mountpoint’s configuration).
Hope that helps at least a bit!
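For completeness, deleting a job can also be done over REST once you know its UUID. This is a hedged sketch, not confirmed by this thread: it assumes the server exposes `DELETE /jobs/{id}` under the same v4 base path (check your server’s Swagger UI first), and `hostname:8443` and `$BASE_AUTH` are placeholders.

```shell
#!/bin/sh
# Placeholder base URL from this thread.
BASE="https://hostname:8443/knime/rest/v4"

delete_job() {
  # $1 is the job UUID. Print the URL that would be called, then
  # (against a live server) issue the DELETE request.
  url="$BASE/jobs/$1"
  echo "$url"
  # curl -k -u "$BASE_AUTH" -X DELETE "$url"   # uncomment for a live server
}
```

Since the GET on the jobs endpoint is the broken call here, the WebPortal or the AP mountpoint (with ‘Use REST’ unchecked) may be the more reliable way to find the job’s ID in the first place.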
It helped, I was able to identify and remove the job causing the issue. Now it works.
I have a workspace backup taken earlier, and I think I’ve identified the job, but simply copying it from the workspace backup back into the current workspace didn’t break anything, as I had expected it would.
Nevertheless, thanks for your help.
In order to prove our suspicion, it would be very helpful to inspect your server logs. Do you have access to the server logs, and could you provide them to us? (You can download them as follows: log in as admin to the WebPortal > Administration > Download Server Log Files.)
You can, e.g., mail them to me: email@example.com
Thanks a lot!
Did the problem occur a second time?
If so, what exactly fixed the faulty state: was a server restart enough, or did you have to explicitly delete the faulty job?
Could you also provide your log files as described by Martin?
Hi Moritz, Martin,
I will provide the logs a bit later; I just did a few upgrades and have to find the appropriate set of logs.
I explicitly deleted a single job, and after that it was fixed and now works as expected.
Today we provided a bugfix release of KNIME Server (4.10.2) in which we also addressed this problem.
Could you perform an update and try again?
Thank you! Iris