Batch mode job occasionally never finishes

I run a KNIME job in batch mode. It usually completes properly, but occasionally it keeps executing and never ends, so I have to kill it manually.


Here is the log:
23-02-2022 13:20:13 CST goods_detail_pc INFO - Starting job goods_detail_pc at 1645593613300
23-02-2022 13:20:13 CST goods_detail_pc INFO - azkaban.webserver.url property was not set
23-02-2022 13:20:13 CST goods_detail_pc INFO - job JVM args: -Dazkaban.flowid=goods_detail_pc -Dazkaban.execid=23227 -Dazkaban.jobid=goods_detail_pc
23-02-2022 13:20:13 CST goods_detail_pc INFO - user.to.proxy property was not set, defaulting to submit user admin
23-02-2022 13:20:13 CST goods_detail_pc INFO - Building command job executor.
23-02-2022 13:20:13 CST goods_detail_pc INFO - Memory granted for job goods_detail_pc
23-02-2022 13:20:13 CST goods_detail_pc INFO - 1 commands to execute.
23-02-2022 13:20:13 CST goods_detail_pc INFO - cwd=/home/Azkaban-exec-server/bin/executions/23227
23-02-2022 13:20:13 CST goods_detail_pc INFO - effective user is: admin
23-02-2022 13:20:13 CST goods_detail_pc INFO - Command: sh /home/knime_jobs/job_sh/goods_detail_pc.sh
23-02-2022 13:20:13 CST goods_detail_pc INFO - Environment variables: {JOB_OUTPUT_PROP_FILE=/home/Azkaban-exec-server/bin/executions/23227/goods_detail_pc_output_189837063191003575_tmp, JOB_PROP_FILE=/home/Azkaban-exec-server/bin/executions/23227/goods_detail_pc_props_541198755904495448_tmp, KRB5CCNAME=/tmp/krb5cc__goods_detail_pc__goods_detail_pc__goods_detail_pc__23227__admin, JOB_NAME=goods_detail_pc}
23-02-2022 13:20:13 CST goods_detail_pc INFO - Working directory: /home/Azkaban-exec-server/bin/executions/23227
23-02-2022 13:20:13 CST goods_detail_pc DEBUG - Spawned thread with process id 1403
23-02-2022 13:20:15 CST goods_detail_pc INFO - Feb 23, 2022 1:20:15 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
23-02-2022 13:20:15 CST goods_detail_pc INFO - INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-frontend-jaxrs (176) [org.apache.cxf.jaxrs.JAXRSBindingFactory]
23-02-2022 13:20:15 CST goods_detail_pc INFO - Feb 23, 2022 1:20:15 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
23-02-2022 13:20:15 CST goods_detail_pc INFO - INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-transports-http (179) [org.apache.cxf.transport.http.HTTPTransportFactory, org.apache.cxf.transport.http.HTTPWSDLExtensionLoader, org.apache.cxf.transport.http.policy.HTTPClientAssertionBuilder, org.apache.cxf.transport.http.policy.HTTPServerAssertionBuilder, org.apache.cxf.transport.http.policy.NoOpPolicyInterceptorProvider]
23-02-2022 13:20:15 CST goods_detail_pc INFO - Feb 23, 2022 1:20:15 PM org.apache.cxf.bus.osgi.CXFExtensionBundleListener addExtensions
23-02-2022 13:20:15 CST goods_detail_pc INFO - INFO: Adding the extensions from bundle org.apache.cxf.cxf-rt-transports-http-hc (180) [org.apache.cxf.transport.http.HTTPConduitFactory, org.apache.cxf.transport.ConduitInitiator]
23-02-2022 22:11:28 CST goods_detail_pc ERROR - Kill has been called.
23-02-2022 22:11:33 CST goods_detail_pc INFO - Process completed unsuccessfully in 31879 seconds.
23-02-2022 22:11:33 CST goods_detail_pc ERROR - Job run killed!
java.lang.RuntimeException: azkaban.jobExecutor.utils.process.ProcessFailureException
at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:304)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:784)
at azkaban.execapp.JobRunner.doRun(JobRunner.java:600)
at azkaban.execapp.JobRunner.run(JobRunner.java:561)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: azkaban.jobExecutor.utils.process.ProcessFailureException
at azkaban.jobExecutor.utils.process.AzkabanProcess.run(AzkabanProcess.java:130)
at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:296)
… 8 more
23-02-2022 22:11:33 CST goods_detail_pc ERROR - azkaban.jobExecutor.utils.process.ProcessFailureException cause: azkaban.jobExecutor.utils.process.ProcessFailureException
23-02-2022 22:11:33 CST goods_detail_pc INFO - Finishing job goods_detail_pc at 1645625493189 with status KILLED

Can someone tell me how to resolve it? Thanks!

Hi @jimozq , how long does it take to run if you run it via the UI? And when it completes properly via the scheduler, how long does it usually take?

It takes about 10 seconds via the UI, and between 24 seconds and 2 minutes via the scheduler.
In general, it runs quickly.

I know it’s off topic but I want to know how to make a chart like this to generate an alert on whether it was effective or not

Hi @Jalvear , if I’m not mistaken, this is just how the interface of that scheduler looks. It’s not a chart generated by KNIME.

@jimozq , can you show us what the workflow does? Is there any potential bottleneck? (Table lock if you are doing db operations for example)

It’s a chart in Azkaban, we use Azkaban to schedule the shell command.

The workflow gets data from a remote MongoDB, inserts it into a ClickHouse temp table and a local MongoDB backup table, then reads the data from the ClickHouse temp table and, after processing, writes it to the ClickHouse fact table.

I guess the reason is not in the job itself, because the same situation occurs in 5 or 6 other jobs.
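Since a normal run finishes within a couple of minutes, one stopgap for all of these jobs would be to put a hard time limit on the scheduled command so a hung run gets killed automatically instead of by hand. A minimal sketch, assuming GNU coreutils `timeout` is available on the executor host; the function name `run_with_limit`, the 600-second limit and the 30-second grace period are my own hypothetical choices, and this only contains the hang, it does not explain it:

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a hard time limit.
# Assumes GNU coreutils `timeout` is installed.
run_with_limit() {
    limit="$1"
    shift
    status=0
    # Send TERM after $limit seconds; if the process is still alive
    # 30 seconds later, send KILL (-k 30).
    timeout -k 30 "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout exits with 124 when the time limit was reached.
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example use in the scheduled command (600 s is an assumed limit,
# well above the observed 24 s to 2 min runtime):
#   run_with_limit 600 sh /home/knime_jobs/job_sh/goods_detail_pc.sh
```

With this in place the scheduler would see a non-zero exit code shortly after the limit is reached, rather than a job that runs for hours.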

Hi @jimozq

Can you tell me a little more about Azkaban?
Is it free?
Is it easy to set up?

Thanks in advance for your help.

It is free, but I think DolphinScheduler is better than Azkaban.
DolphinScheduler is free too, and is easier to deploy.

