Unable to open jobs of a workflow run on the server

Hi KNIME Support Team.

After a workflow runs successfully on the server, I get the following error when I open the job a few hours later.

If I click on the job immediately after it succeeds, the workflow opens, but not a few hours later.

Any idea why?

The workflow under test contains a node that creates a Spark session with Livy, and a Destroy Spark Context node that terminates that session. Could the terminated Spark session be what prevents me from opening the job?

I would be grateful for your answer.

Hi @JaeHwanChoi,
I am not a KNIME Team member but…
Based on my experience, this could be due to one of two reasons:

  • Swapping of the job fails. Some of the potential reasons are explained here. Is your workflow very large when executed (several GB)? If so, increasing the swap timeout in the server settings could help (default is 1 min).

com.knime.server.job.default_swap_timeout=<duration with unit, e.g. 60m, 36h, or 2d> [RT]
Specifies how long to wait for a job to be swapped to disk. If the job is not swapped within the timeout, the operation is canceled. The default is 1m. This timeout is only applied if no explicit timeout has been passed with the call (e.g. during server shutdown).

  • The job is swapped correctly, but the request times out when you try to open it again. If so, increasing the timeout of requests from KNIME could help (default is also 1 min).

com.knime.server.gateway.timeout=<duration with unit, e.g. 30s, 1m> [RT]
Specifies the timeout used internally for gateway requests coming from the KNIME Analytics Platform Remote Job View or from KNIME WebPortal. Default value is 1m.
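
For illustration, both timeouts could be raised together in the server configuration. This is only a sketch: the property names are the ones quoted above, but the values (10m and 5m) are arbitrary examples rather than recommendations, and the file name and location (knime-server.config in the server repository's config folder) are assumptions to verify against your installation.

```properties
# knime-server.config (file name and location assumed; check your installation)

# Allow more time for a large job to be swapped to disk (default: 1m).
# 10m is an example value, not a recommendation.
com.knime.server.job.default_swap_timeout=10m

# Allow more time for gateway requests, e.g. opening a swapped job
# from the Remote Job View (default: 1m). 5m is an example value.
com.knime.server.gateway.timeout=5m
```

Raising one setting at a time and retrying would help show which of the two cases applies.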

The fact that you can open the job a few minutes after execution indicates that it opens successfully as long as the job is still in the executor's memory. Once it is swapped, something goes wrong.

I hope this can help to shed some light!

Hello,

That workflow looks like it has a mixture of status indicators on it. [1]

  • Not successfully finished
  • successfully executed
  • job of overwritten workflow

I’d need to conduct more testing to determine whether it cannot be opened because of the overwritten workflow status (for example, if you had updated the workflow and then run the newer version on 10-13).

Other settings [2] that could come into play are com.knime.server.job.discard_after_timeout (default true, discards jobs that exceed the max execution time), com.knime.server.job.max_execution_time (default unlimited), and possibly com.knime.server.job.max_lifetime (default 7d). If any of these caused the job to be discarded, the server may not be able to open it on request.
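
As a rough sketch, these settings would look like this in the server configuration. The values shown are simply the defaults mentioned above, written out explicitly. The duration format (7d) is assumed to match the other timeout options, and com.knime.server.job.max_execution_time is left commented out because the syntax for "unlimited" is not given here.

```properties
# Discard jobs that exceed the maximum execution time (default: true).
com.knime.server.job.discard_after_timeout=true

# Job lifetime setting (default: 7d, as noted above).
com.knime.server.job.max_lifetime=7d

# Default is unlimited; set a duration only if a limit is wanted.
# com.knime.server.job.max_execution_time=<duration with unit>
```

Checking whether any of these has been changed from its default on the server would be a quick way to rule discarding in or out.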

TL;DR: I’d look at either

  • can’t open because of the overwritten workflow indicator; or
  • can’t open because it’s been somehow discarded.

Regards,
Nickolaus

[1] KNIME Server User Guide
[2] KNIME Server Administration Guide
