I sometimes experience issues with KNIME Server where a job causes CPU usage to spike above 90% for an extended period. The server then becomes unresponsive.
I’d like to see which jobs are running at the time to narrow down the flows which need to be checked.
I’ve thought about getting all the jobs that were scheduled in, say, the hour before the CPU spike, but I realise a job could have been running on the server for 2-3 hours before causing it.
Ideally, I’d take snapshots of the executing jobs. If the server becomes unresponsive, I’d know exactly which jobs may have caused the problem and investigate them.
Thanks for the suggestion. I’m trying to get this via the KNIME API since the jobs sometimes fail at odd hours, like 1 am. They don’t show up as executing.
I’m planning to get the executing jobs from the API and write snapshots to a database at 20-30 minute intervals. That way, if failures happen at those odd hours, I’ll know exactly which flows to investigate.
Doing the steps manually won’t work when the server goes down after work hours. I’ve got no traceability when checking it later.
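A rough sketch of the snapshot idea described above, using only the Python standard library and SQLite. The table name `job_snapshots`, the helper names `save_snapshot` and `jobs_at`, and the job fields `id`, `workflow` and `state` are assumptions made for illustration, not something confirmed for your server's API response:

```python
import sqlite3
from datetime import datetime, timezone


def save_snapshot(conn, jobs, taken_at=None):
    """Write one row per job for a single snapshot and return the row count.

    `jobs` is a list of dicts; the keys "id", "workflow" and "state" are
    assumed field names and may differ in the real API response.
    """
    taken_at = taken_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_snapshots ("
        "taken_at TEXT, job_id TEXT, workflow TEXT, state TEXT)"
    )
    rows = [
        (taken_at, job.get("id"), job.get("workflow"), job.get("state"))
        for job in jobs
    ]
    conn.executemany("INSERT INTO job_snapshots VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def jobs_at(conn, taken_at):
    """Return (job_id, workflow, state) tuples stored for one snapshot time."""
    cur = conn.execute(
        "SELECT job_id, workflow, state FROM job_snapshots "
        "WHERE taken_at = ? ORDER BY job_id",
        (taken_at,),
    )
    return cur.fetchall()
```

Something like cron or a scheduled task could run this every 20-30 minutes. After an outage, querying the last snapshots taken before the server stopped responding would show which jobs were active at that point.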
You can call the jobs endpoint with a REST GET request to http://knime.hq.takealot.com:8080/knime/rest/v4/jobs (this also works in an internet browser) and use the state parameter of each job, which gives you that job's current state, for example:
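A minimal Python sketch of such a call, in case it helps. It assumes the server accepts HTTP basic authentication, that the response is JSON with a top-level `jobs` list, and that each job carries a `state` field with a value such as `EXECUTING`. All three are assumptions to verify against your server's actual response, and the function names are made up for this example:

```python
import base64
import json
import urllib.request

# URL taken from the post above.
JOBS_URL = "http://knime.hq.takealot.com:8080/knime/rest/v4/jobs"


def fetch_jobs(url, user, password):
    """GET the jobs endpoint and return the parsed JSON body.

    Basic auth is an assumption; adjust to whatever your server uses.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def jobs_in_state(payload, state="EXECUTING"):
    """Return the jobs whose "state" equals `state`.

    Assumes the payload looks like {"jobs": [{"state": ...}, ...]}.
    """
    return [job for job in payload.get("jobs", []) if job.get("state") == state]
```

Usage would be along the lines of `jobs_in_state(fetch_jobs(JOBS_URL, "user", "password"))`, with real credentials in place of the placeholders.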
Just a small addition that came to mind regarding the memory peaks: you may want to check whether the heap space size of the KNIME Executor (the -Xmx setting in its knime.ini file) is set appropriately for the total memory available on your server.
E.g. setting -Xmx32G for a KNIME Server with 64GB total memory.
Best,
Michael
Ah, I understand now. No, there is no way besides limiting the number of cores/threads in general for a specific executor, or pinning a specific workflow to a dedicated executor in a distributed executor setup on KNIME Server Large. The options are described here, under the topics Workflow Pinning and following.
Best,
Michael