Server's REST API returns 404 for the jobs endpoint

Hi all,

I’ve noticed that after upgrading to 4.10.0.3fa2e72f6 my server began to respond with a 404 status on the jobs REST API endpoint, and I can’t see the jobs of workflows in the AP either. I started with a direct upgrade from 4.9.2 and also tried a fresh installation with the old workspace, with the same result.
Simple call
curl -X GET "https://hostname:8443/knime/rest/v4/jobs" -H "accept: application/vnd.mason+json"
returns 404 now:

cache-control: private,no-cache
content-length: 17
content-type: text/plain
date: Thu, 02 Jan 2020 07:46:03 GMT
expires: Thu, 01 Jan 1970 00:00:00 GMT
knime-exception-class: java.util.NoSuchElementException
server: Apache Tomcat
x-content-type-options: nosniff

Is it just me, or does somebody else have the same issue?

Thanks.

Hi @AndriyDmytrenko,

is it exclusively the jobs endpoint? Do REST calls to the repository work fine?

Best,
Marten

Hi Marten,
It is for jobs only.
For example:
This call works fine:

curl -k -u $BASE_AUTH -X GET "https://hostname:8443/knime/rest/v4/repository/Users?deep=false" -H "accept: application/vnd.mason+json"

This call returns 404:

curl -k -u $BASE_AUTH -X GET "https://hostname:8443/knime/rest/v4/jobs" -H "accept: application/vnd.mason+json"

I get the same results from the Swagger interface as well.
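
For anyone who wants to script the same comparison, here is a minimal Python sketch of the two GET requests. It is only an illustration: the function names `build_request` and `probe` are made up for this example, `hostname:8443` is the placeholder from the calls above, and the credentials are whatever you would pass to `curl -u`. Only the Python standard library is used.

```python
import base64
import ssl
import urllib.error
import urllib.request


def build_request(url, user=None, password=None):
    """Build a GET request with the same accept header as the curl calls."""
    req = urllib.request.Request(
        url, headers={"accept": "application/vnd.mason+json"}, method="GET"
    )
    if user is not None:
        # Equivalent of curl's -u user:password (HTTP Basic authentication).
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
    return req


def probe(url, user=None, password=None, verify_tls=True):
    """Return (HTTP status, value of the knime-exception-class header or None)."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Equivalent of curl's -k: skip certificate checks (test systems only).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    req = build_request(url, user, password)
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
            return resp.status, resp.headers.get("knime-exception-class")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status and headers are still available.
        return err.code, err.headers.get("knime-exception-class")
```

Calling `probe("https://hostname:8443/knime/rest/v4/jobs", "user", "password", verify_tls=False)` and then the same for `/repository/Users?deep=false` should make the difference between the two endpoints visible, including the exception class header on the failing one.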

On the server side, the localhost_access_log shows the same:

Andriy_Dmytrenko [03/Jan/2020:11:43:44 +0200] "GET /knime/rest/v4/repository/Users?deep=false HTTP/1.1" 200 3554
Andriy_Dmytrenko [03/Jan/2020:11:44:12 +0200] "GET /knime/rest/v4/repository/Users?deep=false HTTP/1.1" 200 3554
Andriy_Dmytrenko [03/Jan/2020:11:45:07 +0200] "GET /knime/rest/v4/jobs HTTP/1.1" 404 17
Andriy_Dmytrenko [03/Jan/2020:11:45:20 +0200] "GET /knime/rest/v4/jobs HTTP/1.1" 404 17
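
With a longer access log, it can help to tally status codes per endpoint rather than read the lines by eye. The following is a rough Python sketch that assumes every line has exactly the shape of the excerpt above (user, timestamp in brackets, quoted request line, status, size); the names `LINE_RE`, `tally_statuses` and `SAMPLE` are invented for this example, and other Tomcat access log patterns would need a different regular expression.

```python
import re
from collections import Counter

# One log line: user [timestamp] "METHOD path HTTP/x.y" status size
# Curly quotes are accepted too, in case the log was pasted through a forum.
LINE_RE = re.compile(
    r'^(?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'["“](?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+["”] '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def tally_statuses(lines):
    """Count requests per (path without query string, HTTP status)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        path = match.group("path").split("?", 1)[0]
        counts[(path, int(match.group("status")))] += 1
    return counts


# The four lines from the excerpt above.
SAMPLE = [
    'Andriy_Dmytrenko [03/Jan/2020:11:43:44 +0200] "GET /knime/rest/v4/repository/Users?deep=false HTTP/1.1" 200 3554',
    'Andriy_Dmytrenko [03/Jan/2020:11:44:12 +0200] "GET /knime/rest/v4/repository/Users?deep=false HTTP/1.1" 200 3554',
    'Andriy_Dmytrenko [03/Jan/2020:11:45:07 +0200] "GET /knime/rest/v4/jobs HTTP/1.1" 404 17',
    'Andriy_Dmytrenko [03/Jan/2020:11:45:20 +0200] "GET /knime/rest/v4/jobs HTTP/1.1" 404 17',
]
```

Running `tally_statuses(SAMPLE)` groups the excerpt into two entries, one for the repository path with status 200 and one for the jobs path with status 404.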

Hi @AndriyDmytrenko,

we have a strong suspicion about what is causing the problem and are working on it (though it’s very hard to reproduce).
It is very likely that just one job (or very few) causes the entire jobs endpoint to return the error, and only jobs that have been started from within the WebPortal (i.e. in wizard execution mode) are affected. That is, if you delete all jobs (or at least the ones you started via the WebPortal), everything should be back to normal. You can delete jobs either via the WebPortal or by accessing the server from the AP without using REST (i.e. uncheck ‘Use REST’ in the mountpoint’s configuration).

Hope that helps at least a bit!

Best,
Martin

Hi Martin,

It helped, I was able to identify and remove the job causing the issue. Now it works.
I have a workspace backup taken earlier, and I think I’ve identified the job, but simply copying it from the workspace backup into the current workspace didn’t break anything, although I expected it would.
Nevertheless, thanks for your help.

Thanks
Andriy.

Hi Andriy,

in order to confirm our suspicion, it would be very helpful to be able to inspect your server logs. Do you have access to the server logs, and can you provide them to us? (You can download them as follows: log in as admin to the WebPortal > Administration > Download Server Log Files)

You can, e.g., mail them to me: martin.horn@knime.com

Thanks a lot!

Best,
Martin

Hi Andriy,

did the problem occur a second time?
If so, what exactly fixed the faulty state? Was a server restart enough, or did you have to explicitly delete the faulty job?
Could you also provide your log files as described by Martin?

Cheers,
Moritz

Hi Moritz, Martin,

I will provide the logs a bit later; I just did a few upgrades and have to find the appropriate set of logs.
I explicitly deleted a single job; after that the problem was fixed, and it now works as expected.

Thanks.
Andriy

Hi @AndriyDmytrenko

today we published a bugfix release of KNIME Server (4.10.2) in which we also addressed this problem.

Could you perform an update and try again?

Thank you! Iris
