Trouble with API responses

Sergey_Bazov · March 23, 2020, 11:48am

We have a workflow API that runs on KNIME Server Medium 4.10.2. Periodically, when the load is high, our API stops responding. This lasts from 5 to 20 minutes, after which performance recovers.
I have attached the log.
Do you have any idea what the problem is?
localhost_cutted.txt (3.2 MB)

thor · March 23, 2020, 12:40pm

The server waits about 1 minute for a workflow to load. After that it cancels and discards the job. This is in line with the “ProgressMonitor is canceled” message that you get. You can increase the load timeout by passing the “timeout” parameter when loading a job.

Sergey_Bazov · March 23, 2020, 12:44pm

It’s not a job. It’s an API workflow, deployed via KNIME Analytics Platform. How can we pass the “timeout” parameter in this case?

Sergey_Bazov · March 23, 2020, 1:01pm

Config Info

com.knime.server.server_admin_groups: admin
com.knime.server.webportal.debug: false
com.knime.server.login.jwt-lifetime: 30d
com.knime.enterprise.executor.embedded-broker: false
com.knime.server.csp-report-only: false
com.knime.server.job.default_load_timeout: 1m
com.knime.server.job.swap_check_interval: 1m
com.knime.server.executor.knime_exe: /opt/knime/knime-latest/knime
com.knime.server.executor.start_port: 50100
com.knime.server.job.default_swap_timeout: 1m
com.knime.server.executor.max_instances: 200
com.knime.server.repository_path: /srv/knime_server
com.knime.server.job.max_execution_time:
com.knime.server.executor.update_metanodelinks_on_load: false
com.knime.server.executor.skip_teamspace_mount: false
com.knime.server.webportal.title_label: WebPortal
com.knime.server.job.discard_after_timeout: true
com.knime.server.job.max_lifetime: 7d
com.knime.server.default_mount_id: knime-server
com.knime.server.webportal.sketcher_page: VAADIN/sketcher/sketcher.html
com.knime.server.executor.reject_future_workflows: true
com.knime.server.server_admin_users: knimeadmin
com.knime.server.job.status_update_interval: 60s
com.knime.server.job.default_report_timeout: 1m
com.knime.server.webportal.hide_version: false
com.knime.server.executor.max_lifetime: -1
com.knime.server.webportal.sketcher_size: 300.0x300.0
com.knime.server.config.watch: true
com.knime.server.executor.prestart: true
com.knime.server.webportal.csp: default-src 'self'; script-src 'unsafe-inline' 'unsafe-eval' 'self'; style-src 'unsafe-inline' 'self';img-src 'self' data:;
com.knime.server.webportal.disable_warning_messages: true
com.knime.server.job.max_time_in_memory: 60m
com.knime.server.webportal.disable_report_preview: false

AlexanderFillbrunn · March 23, 2020, 1:04pm

Hello Sergey,
the sixth row from the top in your config is: com.knime.server.job.default_load_timeout: 1m. You can set the default timeout to something higher here. Additionally, when making a call to the workflow you can specify the query parameter timeout=<time> to specify the timeout in milliseconds for that particular call.
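
As a minimal sketch of the per-call option, the snippet below builds a request URL that carries a 2-minute timeout. The host name, the workflow path, and the `/rest/v4/repository/<path>:execution` layout are assumptions for illustration, so check them against the REST documentation of your own server before relying on them.

```python
from urllib.parse import quote, urlencode


def build_execution_url(server_url: str, workflow_path: str, timeout_ms: int) -> str:
    """Build a workflow execution URL that carries a per-call timeout."""
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    # Assumed URL layout: <server>/rest/v4/repository/<workflow path>:execution
    path = quote(workflow_path.strip("/"))
    # The timeout query parameter is given in milliseconds.
    query = urlencode({"timeout": timeout_ms})
    return f"{server_url.rstrip('/')}/rest/v4/repository/{path}:execution?{query}"


# Hypothetical host and workflow path; 120000 ms = 2 minutes.
url = build_execution_url("https://knime.example.com/knime", "/APIs/scoring", 120_000)
print(url)
```

The resulting URL can then be passed to whatever HTTP client you already use for your API calls.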
Kind regards
Alexander

Sergey_Bazov · March 23, 2020, 1:10pm

Thanks a lot.
One small question remains. The average API response time is currently about 10 seconds. Why might 1 minute not be enough to load the workflow, and can using job pools help?

AlexanderFillbrunn · March 23, 2020, 1:14pm

Hi Sergey,
with a high-load scenario like yours Job Pools can certainly help. The workflows will not be loaded for every call and it seems like you are mostly running one particular workflow, so it should have a significant effect on response time. Please let us know of your findings!
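
A rough sketch of what changes on the client side: as far as I recall from the KNIME Server documentation, pooled jobs are requested through a `:job-pool` resource instead of `:execution`. Treat both resource names and the workflow path below as assumptions to verify for your server version, and note that the pool size itself has to be configured on the workflow separately.

```python
from urllib.parse import quote


def execution_resource(workflow_path: str, use_job_pool: bool) -> str:
    """Return the REST resource path for running a workflow.

    Assumption: pooled execution uses the ':job-pool' suffix and
    regular execution uses ':execution'.
    """
    suffix = "job-pool" if use_job_pool else "execution"
    return f"/rest/v4/repository/{quote(workflow_path.strip('/'))}:{suffix}"


# Hypothetical workflow path, shown with and without the pool.
pooled = execution_resource("/APIs/scoring", use_job_pool=True)
regular = execution_resource("/APIs/scoring", use_job_pool=False)
print(pooled)
print(regular)
```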
Kind regards
Alexander

Sergey_Bazov · March 24, 2020, 8:06am

Hi. We changed the settings: we set 20 job pools and default_load_timeout=2m.
After that, the server worked normally for 6 hours instead of 1-1.5 hours, but then the same problem arose.
Today we set 50 job pools and will see how it goes.

Sergey_Bazov · March 31, 2020, 8:53am

Last update: we still have the issue, but we found a workaround: automatically restarting KNIME Server every hour.

AlexanderFillbrunn · March 31, 2020, 9:36am

Hi @Sergey_Bazov,
KNIME Server periodically (by default every 24 hours) recycles its executor. Maybe it is enough for you to set that time to 1h instead, so you don’t have to reboot the whole server. In your knime-server.config file, change the following setting:

com.knime.server.executor.max_lifetime=<duration with unit, e.g. 60m, 36h, or 2d>
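
If you want to script the change, here is a minimal sketch that sets a key in `key=value` style config text. It assumes one setting per line with `=` as the separator; the helper name `set_config_value` and the sample content are made up for illustration, and it works on a string so you can try it before touching the real file.

```python
def set_config_value(config_text: str, key: str, value: str) -> str:
    """Replace the value of `key`, or append the key if it is missing."""
    lines = config_text.splitlines()
    new_line = f"{key}={value}"
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comment lines untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Sample content only; the real file contains many more settings.
sample = (
    "com.knime.server.executor.prestart=true\n"
    "com.knime.server.executor.max_lifetime=-1\n"
)
updated = set_config_value(sample, "com.knime.server.executor.max_lifetime", "1h")
print(updated)
```

Keep a backup of the original file before writing the result back.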

https://docs.knime.com/2018-12/server_admin_guide/index.html#knime-server-configuration-file
Kind regards
Alexander