Currently executing jobs on the KNIME Server

Hi Guys,

I sometimes experience issues with the KNIME Server where a job causes CPU usage to spike above 90% for an extended period, which then makes the server unresponsive.

I’d like to see which jobs are running at that time so I can narrow down the workflows that need to be checked.
I’ve thought about pulling all the jobs that were scheduled in the hour before the CPU spike, but realise a job could have been running on the server for 2 - 3 hours before causing the spike.

I’ve looked at the API documentation (http://knime.hq.takealot.com:8080/knime/rest/doc/index.html#/Jobs), but can’t find anything that tells me which jobs are executing at the moment.

Ideally, I’d take snapshots of the executing jobs. If the server becomes unresponsive, I’d know exactly which jobs may have caused the problem and could investigate them.

Is there a way to get this information?

Hi @Albert123,

I think you can filter for all running jobs in the Jobs overview of the Web Portal:
(screenshot of the WebPortal jobs overview filtered for running jobs)


Hi @AnotherFraudUser,

Thanks for the suggestion. I’m trying to get this via the KNIME API because the jobs sometimes fail at odd hours, like 1am, and by the time I check they no longer show up as executing.

I’m planning to pull the executing jobs from the API and write snapshots to a database at 20 - 30 minute intervals, roughly as sketched below. If failures happen at those odd hours, I’ll know exactly which workflows to investigate.

Doing these steps manually won’t work when the server goes down after work hours, since I’d have no traceability when checking it later.
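
A minimal sketch of such a snapshot script in Python, assuming the jobs REST endpoint described in the reply below; the credentials, database file, table layout and polling interval are placeholders, not anything prescribed by KNIME:

    import sqlite3
    import time

    import requests

    # Assumptions: basic-auth credentials for a monitoring account and a local SQLite file.
    JOBS_URL = "http://knime.hq.takealot.com:8080/knime/rest/v4/jobs"
    AUTH = ("monitoring_user", "secret")      # hypothetical service account
    SNAPSHOT_INTERVAL_SECONDS = 30 * 60       # 20 - 30 min, as described above

    conn = sqlite3.connect("job_snapshots.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_snapshots (
               snapshot_at TEXT, job_id TEXT, workflow TEXT,
               state TEXT, started_execution_at TEXT)"""
    )

    while True:
        snapshot_at = time.strftime("%Y-%m-%dT%H:%M:%S")
        try:
            response = requests.get(JOBS_URL, auth=AUTH, timeout=60)
            response.raise_for_status()
            # Assumption: the response wraps the individual job objects in a "jobs" array.
            jobs = response.json().get("jobs", [])
        except requests.RequestException:
            jobs = []  # server unresponsive -- an empty snapshot is itself a useful signal
        for job in jobs:
            conn.execute(
                "INSERT INTO job_snapshots VALUES (?, ?, ?, ?, ?)",
                (snapshot_at, job.get("id"), job.get("workflow"),
                 job.get("state"), job.get("startedExecutionAt")),
            )
        conn.commit()
        time.sleep(SNAPSHOT_INTERVAL_SECONDS)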


Hi @Albert123,

You can use the jobs endpoint via a REST GET call to http://knime.hq.takealot.com:8080/knime/rest/v4/jobs (also possible via a web browser) and read the state parameter of each job, which gives you its current state, for example:

"id" : "0df269ec-b6de-4a21-a37d-3782158e4dc0",
    "discardAfterSuccessfulExec" : false,
    "discardAfterFailedExec" : false,
    "actions" : [ ],
    "configuration" : { },
    "executorName" : "knimemr",
    "executorIPs" : [ "10.0.0.73" ],
    "executorID" : "36a53e86-f161-4ba0-a21d-dcd7348384e8",
    "createdVia" : "Webportal",
    "state" : "EXECUTION_FINISHED",
    "owner" : "knimeadmin",
    "isOutdated" : false,
    "createdAt" : "2021-10-27T13:15:03.450754+02:00[Europe/Berlin]",
    "startedExecutionAt" : "2021-10-27T13:15:31.534793+02:00[Europe/Berlin]",
    "notifications" : { },
    "finishedExecutionAt" : "2021-10-27T13:15:35.604651+02:00[Europe/Berlin]",
    "workflow" : "/Examples/01 - WebPortal/01 - General/02 - Using the Sunburst Chart for Titanic",
    "hasReport" : false,
    "isSwapped" : false,
    "name" : "02 - Using the Sunburst Chart for Titanic 2021-10-27 13.15.03",
    "properties" : {
      "com.knime.enterprise.server.executor.requirements" : "",
      "com.knime.enterprise.server.jobpool.size" : "0"

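For instance, a small Python sketch that lists only the jobs still executing, based on the state field shown above (the credentials are placeholders, and it assumes the response wraps the job objects in a "jobs" array; check the API docs for the exact set of state values):

    import requests

    JOBS_URL = "http://knime.hq.takealot.com:8080/knime/rest/v4/jobs"
    AUTH = ("knimeadmin", "password")  # placeholder credentials

    response = requests.get(JOBS_URL, auth=AUTH, timeout=30)
    response.raise_for_status()

    for job in response.json().get("jobs", []):
        # Finished jobs report e.g. "EXECUTION_FINISHED" (see the sample above);
        # anything without FINISHED/FAILED in its state is still worth a closer look.
        state = job.get("state", "")
        if "FINISHED" not in state and "FAILED" not in state:
            print(state, job.get("workflow"), job.get("startedExecutionAt"))
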
You will need a KNIME Server Medium or Large license to access the REST API.

Best,
Michael


Thanks @MichaelRespondek! Works like a charm!


Just a small addition that came to mind regarding the memory peaks: you may want to check whether the heap space setting of the KNIME Executor (the -Xmx setting in its knime.ini file) is adjusted properly to the total memory available on your server.
E.g. setting -Xmx32G for a KNIME Server with 64GB of total memory.
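
For illustration, the relevant lines in the Executor's knime.ini would look roughly like this; the -Xmx entry must appear below the -vmargs line, and 32G is just an example value to adjust to your own machine:

    -vmargs
    -Xmx32G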
Best,
Michael


Hi @MichaelRespondek,

Thank you. The issue isn’t so much with memory, but rather with CPU processing. I’m not sure if there’s a way to limit CPU resources.

Ah, I understand now. No, there is no way besides limiting the number of cores/threads in general for a specific executor, or pinning a specific workflow to be executed on a dedicated executor in a distributed-executor setup on KNIME Server Large. The options are described here, under the topic Workflow Pinning and the following sections.
Best,
Michael

Thanks @MichaelRespondek! I’ll go read up on it :slight_smile:

