KNIME as a Service

AndriyDmytrenko · November 27, 2019, 10:49am

Hi Team,
We’ve got an idea to build something like KNIME as a Service. In general, it should look like a cloud-based infrastructure designed to provide a “workflow running service”. Let’s say we have different levels of service: Platinum, Gold, Bronze. Each user is assigned to one of them, and Bronze is the default, as is usual in other *aaS offerings.
Each service level has its own SLAs and targets, so we have to provide some workflow runtime statistics, troubleshooting services etc. So the questions are:

Is there a way to distinguish users’ workflows and jobs, so that we can provide a higher level of logging and monitoring of workflows for different service levels?
How can we get proper runtime statistics and notifications (started, finished, failed, stages of execution etc.) from a workflow? I have an idea to ask users to inject some preconfigured metanodes at the start, at the finish and at some important stages of the workflow, which use the Timer Info node and push its data along with other stats via a REST API call to some statistics aggregator…
Is there a way to prioritize some users over others? We can use workflow pinning with distributed executors, so we can assign labels to executors and provide them to the users, but how can we prevent some users from using some labels?

Any other ideas regarding this hypothetical service are welcome, of course.

Thanks.
Andriy

RolandBurger · November 28, 2019, 1:43pm

Hi Andriy,

Thanks for reaching out! While we, as of now, don’t support everything you’d need right out of the box, I do see some alignment with our future planning. I think it makes sense to schedule a call with your side so we can make sure that we’re on the same page.

But to answer some stuff right away:

At present, jobs are treated the same in terms of logging and monitoring. What would a “higher level of logging” need to look like from your perspective? Is this literally a higher log level, or something else?
The dates of creation, start, and finishing of a job are all part of that job’s API endpoint. Therefore, if you’re purely looking for the total execution time, no additional steps are necessary, as you have that information in any case. If, instead, you are looking for more segmented statistics (which section of my job takes the longest), then your proposed idea of using Timer nodes is the way to go.
Currently, it is not possible to guarantee exclusive use of an executor. That being said, this is not the first time this idea has come up, and I see great value in it. I’ll add your request to the (already existing) ticket.
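
To illustrate the second point, total execution time and queue time could be derived from a job's timestamps roughly as follows. This is only a minimal sketch: the field names `createdAt`, `startedAt` and `finishedAt` and the ISO-8601 timestamp format are assumptions made for illustration, not confirmed names from the job endpoint, so check them against the job representation your server actually returns.

```python
from datetime import datetime

# Hypothetical field names (assumption): adjust to the real job representation.
CREATED, STARTED, FINISHED = "createdAt", "startedAt", "finishedAt"


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def job_timings(job: dict) -> dict:
    """Return queue time and run time in seconds for one job record."""
    created = parse_ts(job[CREATED])
    started = parse_ts(job[STARTED])
    finished = parse_ts(job[FINISHED])
    return {
        "queued_seconds": (started - created).total_seconds(),
        "run_seconds": (finished - started).total_seconds(),
    }


# Made-up example record, standing in for a parsed JSON response.
example_job = {
    "createdAt": "2019-11-28T13:00:00Z",
    "startedAt": "2019-11-28T13:00:05Z",
    "finishedAt": "2019-11-28T13:02:35Z",
}
timings = job_timings(example_job)
print(timings)
```

Per-stage timings inside a workflow would still need the Timer-node approach, since these timestamps only describe the job as a whole.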

Cheers,
Roland