What approaches do folks take on promoting workflows from dev to production?
Interested in knowing what advice the community has for these types of operations.
Also, is there for example a way to use the “KNIME Integrated Deployment” nodes to upload an entire workflow to production? The provided example workflow shows “Capture Workflow Start” and “Capture Workflow End” nodes and then a “Deploy Workflow to Server” node to capture “segments” of a workflow. In our case we don’t need to upload a model or data to our production server. We need to upload the entire workflow, which fetches its data externally (based on config) during its run. Is there a way we can just connect the entire workflow to a “Deploy Workflow to Server” node to deploy to production? We need something user friendly, as we have dozens of these workflows and the users are non-engineers.
Hi @kalimist
this is a super interesting topic you’re asking about. I’m afraid I don’t come from a workflow development / productionization side, so I don’t have first-hand experience to share.
However, we’ve built and are extending a huge feature called Continuous Deployment of Data Science (CDDS) into KNIME Business Hub. You’ll find many details and a live demo recording below. Even if you’re sticking to KNIME Server for now, some of the ideas may still be applicable.
That just as a side note – I’ll eagerly follow along here to see what cool solutions other members of the community have employed.
Kind regards
Marvin
Not many replies here @marvin.kickuth. Can you please elaborate on how we can use KNIME Hub to actually promote from a dev server environment to a production server environment?
Well, to jump in then: the way we do it is not very exciting. A user uploads a component/workflow to our development folder, it goes for QA by someone else, who then signs it off and releases it to production.
@ArjenEX, I appreciate your response. Do you mind if I ask a few questions?
This is using KNIME Hub, I take it? So you don’t have separate dev/prod servers as such. What are “development” and “production” in your scenario? Are these folders in the same server/environment? Does QA involve someone else downloading the workflow from the “development” folder and running it locally? What does “release to production” look like?
Sure @kalimist
This is still on KNIME Server. We are planning to move to the Hub whenever it meets our business requirements and we can move across seamlessly. We have one server with a Large instance of KNIME where we host workflows and components that are both in development and in production, managed just by using different folders.
We do have a small-scale standalone dev server, but this is mainly used for software testing: trying out new client updates first to ensure compatibility of all our previous work before we deploy them to all employees, playing around with the new Hub, testing new extensions, etc.
QA is indeed that. Often someone develops a workflow or component based on a particular use case. Our goal is to have a standardized solution that works in almost all cases. So QA could involve not just connecting to one database type but to three, to make sure that a SQL query, for example, is compatible in all cases and the component can be used throughout. And in general, just looking for potential flaws in the logic.
Release to production is normally an announcement in our internal communication channel and the publication of a knowledge base article that outlines what the workflow/component is about and how it should be used.
Hope this helps
I appreciate that @ArjenEX, thank you. I am particularly interested to hear if anyone has separate environments for dev and prod, and what they do to promote between environments.
Hi @kalimist,
sorry for the late reply, but I hope I can (start to) clarify some of your questions and continue the discussion. There is no single way to model dev/test/prod setups using KNIME AP and KNIME Business Hub; rather, we want to enable our customers to set up their own customized processes. @SimonS presented one reference implementation of our CDDS framework (see @marvin.kickuth's link) at our summit, and there is some more getting-started material and documentation available on our Community Hub (Getting started with Continuous Deployment for Data Science (CDDS) for Business Hub – knime – KNIME Community Hub).
Let me give you two examples of how you can set it up:
- Single Business Hub instance: On KNIME Business Hub, access permissions on workflows, components, and files are granted through so-called Spaces (very similar to what you call folders in your post). You can have, for example, one space for dev (or multiple spaces for dev), one for test, and one for prod, where only certain users have access to the workflows within the space. You can set up automation through schedules or triggered workflow execution to react to certain events in each space (e.g. a new workflow was moved to “test” – let’s run some automated workflows to verify the workflow or deploy it into production). In addition, you can have multiple, isolated so-called Execution Contexts within a single KNIME Business Hub installation, which are used to run workflows. Execution Contexts are separate from each other in terms of resource utilization, Analytics Platform configuration (e.g. AP version), other settings, and permissions. What we oftentimes see is that a separate Execution Context is used for each step (dev/test/prod) to ensure isolation.
- Multiple Business Hub instances: If you want to physically separate dev/test/prod from each other (e.g. due to internal IT requirements), then you can also use multiple Hub instances to model dev/test/prod.
Both options (and many others) can be automated using KNIME workflows (e.g. via integrated deployment capabilities). You can move workflows around (I think this is your use case) from space to space or from one Hub instance to another. You can automate even more when it comes to model management, model monitoring, and model updating (also through integrated deployment), but I think that’s not your use case?
I know that’s a lot of text and maybe here and there too technical. I’m more than happy to continue the discussion here and clarify your questions. Super interesting topic!
Best,
Christian
Not too technical at all @christian.birkhold. We are in the middle of things and there is a reluctance to move to the Hub at this time. We are looking at using Python scripts, triggered during CI/CD, that utilise REST calls to the KNIME Server API to download and upload workflows between environments, but it is early days. Will let folks know how it goes. Interested in how other folks are approaching this problem.
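For anyone curious, here is a minimal sketch of what such a promotion script could look like, using only the Python standard library. The endpoint path (`/knime/rest/v4/repository/{path}:data`), the content type, and the idea of downloading a workflow as a `.knwf` archive from dev and PUT-ting it to prod are assumptions based on the KNIME Server REST API; check them against your server version before relying on this.

```python
import base64
import urllib.request
from urllib.parse import quote


def repo_data_url(base_url: str, item_path: str) -> str:
    """Build the (assumed) repository ':data' URL for a workflow item.

    The '/knime/rest/v4/repository/...:data' path is an assumption about
    the KNIME Server REST API; adjust it to match your deployment.
    """
    # quote() keeps '/' intact by default, so nested repository paths survive.
    return (f"{base_url.rstrip('/')}/knime/rest/v4/repository/"
            f"{quote(item_path.strip('/'))}:data")


def basic_auth_header(user: str, password: str) -> str:
    """Encode user:password as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def promote(dev_base, prod_base, item_path, dev_auth, prod_auth):
    """Download a workflow (.knwf archive) from dev and upload it to prod.

    dev_auth / prod_auth are (user, password) tuples. Hypothetical helper,
    not an official KNIME client.
    """
    # Fetch the workflow archive from the development server.
    req = urllib.request.Request(
        repo_data_url(dev_base, item_path),
        headers={"Authorization": basic_auth_header(*dev_auth)},
    )
    with urllib.request.urlopen(req) as resp:
        knwf_bytes = resp.read()

    # Upload the same archive to the same path on the production server.
    # The content type is an assumption; some setups may differ.
    req = urllib.request.Request(
        repo_data_url(prod_base, item_path),
        data=knwf_bytes,
        method="PUT",
        headers={
            "Authorization": basic_auth_header(*prod_auth),
            "Content-Type": "application/vnd.knime.workflow+zip",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A CI/CD job could then call `promote("https://dev.example.com", "https://prod.example.com", "/Team/my_workflow", dev_creds, prod_creds)` for each workflow in a list, which keeps the promotion step scriptable without any KNIME-side changes.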
Ah, I understand. We’ve basically abstracted similar capabilities into KNIME nodes. Meaning, instead of having to involve Python folks, you can also build workflows that do the very same thing (e.g. the file handling nodes, the Server/Hub connector nodes, the workflow loader nodes, etc.). This also works with KNIME Server.
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.