I am currently managing a KNIME Business Hub with numerous workflows that make extensive use of Python nodes. To enhance organization and monitoring, I need to classify these workflows into two categories:
Workflows that utilize machine learning (ML) libraries (e.g., scikit-learn, TensorFlow, PyTorch, etc.) in their Python nodes.
Workflows that do not involve ML libraries in their Python nodes.
I want to develop an automated or semi-automated way to identify and segregate these workflows based on the presence of ML libraries in the Python scripts.
I am looking for suggestions, best practices, or examples from the community to efficiently manage this segregation process. Any guidance on tools, scripting approaches, or workflow design for this purpose would be greatly appreciated.
You can build a dedicated KNIME workflow to do this classification. The Workflow Reader and Workflow Summary Extractor nodes let you extract the Python scripts from your workflows. Once you have the scripts, you can check them for specific library imports and classify each workflow accordingly. @LukasS might have additional pointers or examples to help you get started.
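To illustrate the import check, here is a minimal sketch in Python. It assumes the script text of each Python node has already been extracted as plain strings (for example, from the output of the nodes mentioned above). The names `ML_MODULES`, `imported_modules`, and `classify_workflow` are hypothetical, and the list of ML libraries is only a starting point that you would adapt to your environment.

```python
import ast
import re

# Assumption: top-level import names you consider "ML". Note that
# scikit-learn is imported as "sklearn" and PyTorch as "torch".
ML_MODULES = {"sklearn", "tensorflow", "torch", "keras", "xgboost", "lightgbm"}

# Fallback pattern for scripts that do not parse as valid Python.
# It only captures the first module named on an import line.
_IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", re.MULTILINE)


def imported_modules(source):
    """Return the set of top-level module names imported by a script."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Script is not parseable; fall back to a line-based scan.
        return set(_IMPORT_RE.findall(source))

    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import torch.nn as nn" -> "torch"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import helpers".
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


def classify_workflow(scripts):
    """Classify one workflow from the scripts of all its Python nodes.

    Returns a (label, ml_libraries_found) tuple, where label is
    "ml" or "non-ml".
    """
    found = set()
    for script in scripts:
        found |= imported_modules(script) & ML_MODULES
    return ("ml" if found else "non-ml", found)
```

Parsing the script with `ast` rather than searching for the library name as plain text avoids false positives from comments or string literals that merely mention a library. The function could run inside a Python Script node in the classification workflow, with one call to `classify_workflow` per workflow.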
You can also automate this “classification workflow” by scheduling it or deploying it as a Trigger. For instance, you could configure the trigger to automatically run whenever a new workflow is uploaded to the Hub.
Could you share a bit more about the motivation behind classifying these workflows? Do you have an approach in mind for how you’d like to “mark” the classified workflows? It’s worth noting that the upcoming KNIME Business Hub 1.13 release will include a new feature to label workflow versions directly in the Hub. This functionality might fit your needs well.
I’d be happy to schedule a call in the new year to discuss your use case and potential solutions further. Let me know if that works for you!