Workflow Performance Metrics

Hello,

I am currently developing a solution aimed at providing us with comprehensive statistical insights related to our workflow. Our focus lies on acquiring information regarding:

  • Historical counts of unsuccessful workflow runs, particularly those occurring within the last month.
  • The percentage of unsuccessful runs in relation to the total number of workflow executions.

In essence, we seek historical data on both successful and unsuccessful runs.
I kindly request your suggestions or advice on how to proceed. While I am aware of the option involving email notifications, this is not a suitable solution for our needs.

regards
Patrycja

@patpie welcome to the KNIME forum. The KNIME Server collects this kind of information, and you can access it through the REST API nodes:

Hello,

I believe that using the “02 - Discard failed jobs” action could be beneficial for our situation. Could you please correct me if I’m mistaken? I propose the following steps:

  1. The initial step would be obtaining the necessary credentials for our current workflow (the one from which we intend to list jobs).
  2. Next, using a GET Request node, we would retrieve all jobs stored on the server so that we can list every available job.
  3. Then, we can apply a filter to isolate the jobs that have failed.
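
As a rough illustration of step 3 and of the percentage asked for in the first post, here is a minimal Python sketch. It assumes the job list has already been fetched and parsed from JSON, that each job is a dictionary with a `state` field, and that failed jobs carry the value `EXECUTION_FAILED`. These names are assumptions and should be checked against what your server actually returns.

```python
from collections import Counter

# Assumption: failed jobs are marked with this state value in the job list.
FAILED_STATES = {"EXECUTION_FAILED"}


def failure_stats(jobs, failed_states=FAILED_STATES):
    """Return (failed_count, total_count, failed_percentage) for a list of jobs."""
    # Count jobs per state; jobs without a "state" field are counted as UNKNOWN.
    states = Counter(job.get("state", "UNKNOWN") for job in jobs)
    total = sum(states.values())
    failed = sum(states[s] for s in failed_states)
    # Guard against division by zero when no jobs are stored at all.
    percentage = 100.0 * failed / total if total else 0.0
    return failed, total, percentage


if __name__ == "__main__":
    # Hypothetical sample data standing in for the server response.
    sample_jobs = [
        {"id": "a", "state": "EXECUTED"},
        {"id": "b", "state": "EXECUTION_FAILED"},
        {"id": "c", "state": "EXECUTED"},
        {"id": "d", "state": "EXECUTED"},
    ]
    failed, total, pct = failure_stats(sample_jobs)
    print(f"{failed} of {total} jobs failed ({pct:.1f}%)")
```

Note that this only covers jobs still stored on the server, so the result depends on how long jobs are kept before they are discarded.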

Am I on the right track? I have concerns about the current state of jobs on the server. How much historical data is available at this moment? For instance, can we access information from today, or even from yesterday? I’m essentially inquiring about the extent of the server’s historical data.

Any update, please? :slight_smile:

Hi @patpie -

Let me move this to our server forum for further assistance from our support team.

Hi @patpie,

How far back historical data is available on the KNIME Server depends on the data you’re interested in. There are multiple configuration options to consider. For example, how long jobs are stored is controlled by:

com.knime.server.job.max_lifetime=<e.g. 36h, or 2d>
Specifies the time of inactivity, before a job gets discarded (defaults to 7d).
Negative numbers disable forced auto-discard.

You can set this and other timeouts in the WebPortal under Administration > Configuration.

Personally, I’d try to get the information you’re looking for from the server’s log files. Specifically, you can enable a job logger (during a server maintenance window) that stores information about job executions in a convenient JSON format:
https://docs.knime.com/latest/server_admin_guide/index.html#_job_tracing
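
To give an idea of how such a JSON log could be evaluated for the last month, here is a minimal Python sketch. It assumes one JSON object per line, an ISO 8601 timestamp with a UTC offset in a field called `finishedAt`, and a `state` field with the value `EXECUTION_FAILED` for failed jobs. All of these names are hypothetical, so please compare them with the actual log format described in the admin guide before relying on them.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_records(lines):
    """Parse one JSON object per non-empty line, skipping anything else."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict):
            records.append(record)
    return records


def failure_rate(records, now, days=30, time_key="finishedAt",
                 state_key="state", failed_value="EXECUTION_FAILED"):
    """Return (failed, total, percentage) for records within the last `days`."""
    cutoff = now - timedelta(days=days)
    # Assumption: timestamps are ISO 8601 strings that include a UTC offset.
    recent = [
        r for r in records
        if time_key in r and datetime.fromisoformat(r[time_key]) >= cutoff
    ]
    failed = sum(1 for r in recent if r.get(state_key) == failed_value)
    total = len(recent)
    percentage = 100.0 * failed / total if total else 0.0
    return failed, total, percentage


if __name__ == "__main__":
    # Hypothetical log lines; in practice these would be read from the log file.
    log_lines = [
        '{"state": "EXECUTION_FAILED", "finishedAt": "2024-01-20T10:00:00+00:00"}',
        '{"state": "EXECUTED", "finishedAt": "2024-01-25T10:00:00+00:00"}',
    ]
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    print(failure_rate(parse_records(log_lines), now))
```

Passing `now` explicitly keeps the time window reproducible; for a live report you could pass `datetime.now(timezone.utc)` instead.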

Maybe this workflow could also help as a starting point to analyze log files:

Kind regards
Marvin
