Hi dear KNIMErs,
I recently had the rare luxury of half a day on my hands and a problem with KNIME that had been bothering me for some time.
I love KNIME and use it in my daily work. Sometimes I develop a small workflow and ask myself: how do I put it into production without using KNIME Server?
Why without KNIME Server? Because I’m the only one using KNIME, and I can’t get management to spend a larger sum just for one or two workflows. Also, as far as I’m aware, if I want REST capabilities I have to go with the full KNIME Server.
So I was asking myself, what are my options to run a workflow and integrate it via REST?
Not really a lot.
I could run a server with KNIME Analytics Platform on it and trigger the workflow via a cron job, but that only works for periodic workloads, and the integration has to come from the workflow itself without parameters (so it has to fetch its inputs from somewhere else).
To integrate it via REST, I would have to run a webserver with some logic and let that logic run the workflow in batch mode.
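That batch call is really just a command line. As a sketch, the webserver logic could assemble it like this; the flags are KNIME’s documented batch-mode options, but the installation path and workflow directory below are placeholders I made up:

```python
def knime_batch_command(knime_exe, workflow_dir, variables=None):
    """Build the argument list for a headless KNIME batch-mode run."""
    cmd = [
        knime_exe,
        "-nosplash", "-consoleLog", "-reset", "-nosave",
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]
    # Parameters go in as workflow variables: -workflow.variable=name,value,type
    for name, value, vtype in (variables or []):
        cmd.append(f"-workflow.variable={name},{value},{vtype}")
    return cmd  # run it with subprocess.run(cmd, check=True)


cmd = knime_batch_command(
    "/opt/knime/knime",                 # placeholder install path
    "/data/workspace/MyFlow",           # placeholder workflow directory
    variables=[("rows", "100", "int")],
)
```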
Or (what I did before) translate the workflow into another language/tool…
These options are not really great and have a lot of drawbacks (also my solution…).
A server has to be maintained, and running the thing is not really cost efficient (depending on the number of executions per day). Rewriting it as code is viable, but a pain in the ass (also, I’m not a coder anymore).
So I started thinking about FaaS (Function as a Service), because I get a REST interface for free, and it is (normally) cheap and easy to scale.
I decided on Azure Functions with a Python function because of knimepy (GitHub - knime/knimepy), which makes it easy to call a KNIME workflow in batch mode.
My setup is the following:
- Set up a Linux consumption plan with a Python function
- Upload a fresh KNIME installation and the workflow to an Azure storage account (file share)
- Mount the Azure file share with KNIME on it into the function
- Call KNIME with knimepy
So in the end the pipeline is the following:
- REST call to the Azure function (with data in the body)
- The Python function calls KNIME on Azure Files
- The workflow runs with parameters from the Python function
- Results are returned to the function and passed back to the caller
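To make the pipeline above concrete, here is a minimal sketch of what the Python function’s core could look like. The mount paths and workflow name are made-up examples, the actual Azure Functions HTTP binding (request/response objects) is omitted, and the knimepy call is isolated in its own function so it can be stubbed out for testing:

```python
import json


def run_workflow(payload):
    """Run the KNIME workflow in batch mode via knimepy.

    Requires the KNIME installation on the mounted file share;
    all paths below are illustrative assumptions.
    """
    import knime        # lazy import: knimepy, only needed on a real run
    import pandas as pd

    knime.executable_path = "/mounts/knimeshare/knime/knime"
    with knime.Workflow(workflow_path="/RestWorkflow",
                        workspace_path="/mounts/knimeshare/workspace") as wf:
        wf.data_table_inputs[0] = pd.DataFrame([payload])   # Container Input (Table) node
        wf.execute()
        return wf.data_table_outputs[0].to_dict(orient="records")


def handle_request(body, runner=run_workflow):
    """Parse the REST body, run the workflow, and build the response."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    return 200, {"rows": runner(payload)}
```

The `runner` parameter is just a seam for testing without a KNIME installation; in the deployed function the default `run_workflow` is used.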
After some tinkering it really worked!
So lets talk about the benefits and the drawbacks.
First of all, KNIME Server is a wayyyyyyyy better alternative than this if you can afford it! Why? The biggest issue is the startup time of the KNIME Analytics Platform (about 10 seconds, with no plugins) on every function call, because you can’t interact with a running KNIME instance. There are ways around it, but they are not pretty.
There is also the convenience of deployment, monitoring, debugging, and having different KNIME installations if workflows use different plugins (or one big installation, which makes startup slower for all functions).
The benefit is that you get a webserver, authentication, monitoring and scalability for very little money.
I have to mention that there are some limitations for this solution!
- With the consumption plan, workflows can run for at most 10 minutes
- The maximum memory is 1.5 GB
But you can lift those limits (time and memory) by moving to another hosting plan (Premium, Dedicated, Custom), though that is a bit more expensive.
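For reference, the function timeout is set in the function app’s host.json; on the consumption plan it can be raised to at most the 10-minute cap mentioned above (a minimal sketch):

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
```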
I did some testing: running my test workflow (3 minutes without startup) 30 times on a Linux consumption plan cost about 0.25 €. I’m not sure if I’m missing some cost, but that is what the Azure interface tells me.
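As a back-of-envelope check of what that means per call:

```python
total_eur = 0.25        # billed for the test batch, per the Azure cost view
runs = 30               # number of executions
cost_per_run = total_eur / runs
print(f"{cost_per_run:.4f} € per execution")   # roughly 0.0083 €
```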
To conclude:
I would love to use KNIME Server with its REST capabilities, but for my few workflows that is not an option. KNIME as FaaS works, but it has (a lot of) limitations and drawbacks.
So if you have only one or two workflows that are not super memory-heavy, are called infrequently, and are not mission critical, try it out.
Thank you for reading my thoughts, and I really hope I have not violated any of KNIME’s terms and conditions; if so, please remove my post.
Best regards,
Paul