It's hard to know. We're monitoring the VM the KNIME Server is running on to see if we can narrow it down. I have a feeling it may be a memory issue: when one job takes up a lot of memory, other jobs get held up. We use the server for a variety of tasks, including Twitter scraping, live port analysis, and some large data table ETL. Because of this, there are times when quite a few jobs are scheduled to run at the same time.
We had a problem before, and we found that several runtime directories had been created and that the jobs directory also had a very large number of entries. It happened again a few days ago, and we had a hard time recovering. We're not sure how to remove them safely.
What does the com.knime.server.job.max_lifetime flag do? I just came across it in the knime-server.config file.
All server configuration options are described in the administration manual. com.knime.server.job.max_lifetime specifies after how many hours/days a non-executing job is automatically discarded.
If you don't need the actual workflow for your scheduled jobs, e.g. when the results are written somewhere else by the workflow, or a report is sent by mail, then you should always enable the option to discard the job right after execution. Otherwise it will use resources until it is manually discarded or the above duration has expired.
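To get a feel for how many old jobs have piled up before changing any setting, a small read-only script can help. This is only a sketch: the function name, the path you pass in, and the assumption that each job is one subdirectory of the jobs directory are mine, not taken from the KNIME documentation. It uses the directory modification time as a rough proxy for job age and does not delete anything.

```python
import sys
import time
from pathlib import Path


def stale_job_dirs(jobs_dir, max_age_days, now=None):
    """Return (path, age_in_days) for subdirectories older than max_age_days.

    Read-only: nothing is deleted or modified. Assumes (hypothetically) that
    every job is stored as one subdirectory of jobs_dir.
    """
    now = time.time() if now is None else now
    cutoff_seconds = max_age_days * 86400
    stale = []
    for entry in Path(jobs_dir).iterdir():
        if not entry.is_dir():
            continue  # skip plain files
        age_seconds = now - entry.stat().st_mtime
        if age_seconds > cutoff_seconds:
            stale.append((entry, age_seconds / 86400))
    # Oldest entries first
    return sorted(stale, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python stale_jobs.py <path-to-jobs-dir> [max-age-in-days]
    days = float(sys.argv[2]) if len(sys.argv) > 2 else 7.0
    for path, age in stale_job_dirs(sys.argv[1], days):
        print(f"{age:7.1f} days  {path}")
```

Comparing this list against the jobs the server still shows gives a rough idea of which ones the max_lifetime setting or the discard-after-execution option would clean up. Discarding through the server itself is likely safer than removing directories by hand.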