My first impressions developing Pure Python Nodes for KNIME / Feature wishlist

PTDataScientist · October 4, 2022, 10:06pm

Hi everyone,

I’d also like to share my experience using the (somewhat new) ability to develop KNIME nodes in pure Python. The result of this effort can be found on GitHub for the moment.

TLDR: Developing KNIME nodes in Python requires much less boilerplate code than the Java counterpart. At the moment this “simplified” interface comes with some limitations, which I hope will be addressed in future versions.

My general impression of the new functionality is quite good. Although the API documentation could be more verbose in places, using the Python interface was fairly straightforward. Implementing a node purely in Python required much less boilerplate code than in Java. However, this abstraction comes at the cost of reduced flexibility (and functionality): there is little to no option to adjust details of the configuration interface (besides grouping), and support for FlowVariables is not completely implemented (only variables within data connections can be used, not the dedicated FlowVariable ports). Also, as of now it is not possible to exchange data other than tables between the Python nodes and the “regular” nodes.

Don’t get me wrong. While that might sound like a lot of drawbacks and limitations, this new integration opens up the large ecosystem of Python libraries for data wrangling, which can easily be exploited for KNIME with a few lines of Python code. I am really looking forward to the future development of this interface.

This brings me to my feature wishlist:

I miss the “OTHER” category which would be perfect for the Apprise notification nodes
It would be great if there was an option to specify a multiline text input field
Some more details in the documentation would be helpful, but I could figure out everything I needed for my rather simple nodes. This might automatically get better as more open-source Python nodes become available to learn from
An interface between the various KNIME port types and Python would be great. I’d like to be able to transfer images from Java to Python for further processing (or am I missing something here?)
Fully functional FlowVariable ports
Last but not least, and fairly obvious: a better (full) integration of FlowVariables for configuring nodes would be great for more versatile configuration options

carstenhaubold · October 5, 2022, 12:07pm

Hi @PTDataScientist,

Thanks for the feedback and thanks for sharing your node implementations! That’s great to hear! Let me address your wishlist:

I miss the “OTHER” category which would be perfect for the Apprise notification nodes

Good point, we’ll add that soon!

It would be great if there was an option to specify a multiline text input field

Very good idea!

Some more details in the documentation would be helpful, but I could figure out everything I needed for my rather simple nodes. This might automatically get better as more open-source Python nodes become available to learn from

Just to make sure, did you see both ReadTheDocs and the Python node development guide at docs.knime.com? We’re continuously trying to improve the documentation. Could you tell us which parts exactly you think need refinement?

An interface between the various KNIME port types and Python would be great. I’d like to be able to transfer images from Java to Python for further processing (or am I missing something here?)

Could you tell us a little more what use case exactly you are envisioning? Here are two answers to my interpretations:

Adding user-defined port types that can be used on both the Java and Python side is something that would be nice to have, but is rather involved. It would probably mean that a developer who wants to provide a certain port type has to implement serializers and deserializers for Java and Python to be able to use the port data on both sides. This is something we’d love to have, but is not very high up on our priority list.
Supporting more data types that are stored inside the input/output tables is something we are currently working on, and images are next up on our list. So image output from Python nodes will be possible soon, but not as a dedicated port type (I guess you are thinking along the lines of the Python scripting node’s “image output port”?).
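
To make the first interpretation a bit more concrete, here is a minimal sketch of what the Python half of such a serializer/deserializer pair could look like. Everything in it is an assumption for illustration: ImagePortData, the byte layout, and the function names are hypothetical and not part of the KNIME API. A matching Java implementation would have to read and write the same layout.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagePortData:
    """Hypothetical payload for a user-defined port: a MIME type plus raw bytes."""
    mime_type: str
    payload: bytes


def serialize(data: ImagePortData) -> bytes:
    """Encode as: 4-byte big-endian MIME-type length, MIME type (UTF-8), payload."""
    mime = data.mime_type.encode("utf-8")
    return struct.pack(">I", len(mime)) + mime + data.payload


def deserialize(blob: bytes) -> ImagePortData:
    """Inverse of serialize(); raises ValueError on truncated input."""
    if len(blob) < 4:
        raise ValueError("blob too short for length header")
    (mime_len,) = struct.unpack(">I", blob[:4])
    if len(blob) < 4 + mime_len:
        raise ValueError("blob too short for MIME type")
    mime = blob[4:4 + mime_len].decode("utf-8")
    return ImagePortData(mime_type=mime, payload=blob[4 + mime_len:])


# Round trip with a made-up payload (just the PNG signature bytes).
original = ImagePortData("image/png", b"\x89PNG\r\n\x1a\n")
round_tripped = deserialize(serialize(original))
```

The length header is big-endian here only because that is what Java's DataInputStream reads by default; any layout works as long as both sides agree on it.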

Fully functional FlowVariable ports

Last but not least, and fairly obvious: a better (full) integration of FlowVariables for configuring nodes would be great for more versatile configuration options

What exactly are you missing here? We did not add an explicit FlowVariable port on the Python side because all nodes do have an implicit FlowVariable port, and each Python node provides the incoming flow variables in the flow_variables dict. But you are right, there is no way yet to enforce the visibility of the FlowVariable port of the node.

Looking at your nodes – nice work! – and seeing the stub input tables, I assume you were trying to make sure your Python node can be connected to upstream nodes so that it is executed after them. To do that, you could simply connect them via the (hidden) flow variable ports and omit the stub input table. Does that already do the trick for what you were trying to do?

If you are looking for ways to override variables of your Python nodes via flow variables, this is possible by right-clicking the node and selecting the context menu item “Configure Flow Variables”.
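
As a rough illustration of the flow_variables dict mentioned above, here is a minimal sketch of how a node's execute step might pick up a setting from incoming flow variables and fall back to a default. It deliberately does not import the KNIME API so that it runs standalone: StubExecContext, resolve_setting, and the variable name "apprise_config" are hypothetical stand-ins, and only the idea of a flow_variables dict comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StubExecContext:
    """Hypothetical stand-in for the context object a Python node receives.

    Only the ``flow_variables`` dict mirrors what is described above.
    """
    flow_variables: Dict[str, Any] = field(default_factory=dict)


def resolve_setting(ctx: StubExecContext, name: str, default: str) -> str:
    """Return the flow variable ``name`` as a string, or ``default`` if absent."""
    value = ctx.flow_variables.get(name)
    if value is None:
        return default
    return str(value)


# A context with one incoming flow variable (name and value are made up).
ctx = StubExecContext(flow_variables={"apprise_config": "/tmp/apprise.yml"})
config_path = resolve_setting(ctx, "apprise_config", default="apprise.yml")

# With no incoming flow variables, the default is used instead.
missing = resolve_setting(StubExecContext(), "apprise_config", default="apprise.yml")
```

Looking the name up with a default keeps the node usable both when an upstream node supplies the variable and when nothing is connected.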

Hope this helps, we really appreciate your constructive feedback! Keep it coming!
Carsten

PTDataScientist · October 5, 2022, 11:05pm

Hi Carsten,

An interface between the various KNIME port types and Python would be great. I’d like to be able to transfer images from Java to Python for further processing (or am I missing something here?)

Could you tell us a little more what use case exactly you are envisioning? Here are two answers to my interpretations:

Adding user-defined port types that can be used on both the Java and Python side is something that would be nice to have, […]

Supporting more data types that are stored inside the input/output tables is something we are currently working on, and images are next up on our list. So image output from Python nodes will be possible soon […]

I think you mainly understood what I was looking for. As Apprise supports attaching/embedding images for all notification services that can handle images, I thought about delivering an image over an image (e.g. PNG) port. Then it would be possible to embed images generated by other nodes and further improve the usability of the push notifications. Since you mention image output from Python nodes: does that include image input to Python nodes, or will this be a one-way road (for the moment)?

Fully functional FlowVariable ports

What exactly are you missing here? We did not add an explicit FlowVariable port on the Python side because all nodes do have an implicit FlowVariable port, and each Python node provides the incoming flow variables in the flow_variables dict. But you are right, there is no way yet to enforce the visibility of the FlowVariable port of the node.

At least I obviously misunderstood this sentence from the documentation:

“Currently, a Python node is only able to access flow variables that have been propagated to it via a table/binary input port, as opposed to a dedicated flow variable port.” (Source)

I read it as saying that the only way to get FlowVariables into a node is via the table/binary input ports. Of course I am aware of the implicit FlowVariable ports, but I assumed they were non-functional for the Python nodes.

Last but not least, and fairly obvious: a better (full) integration of FlowVariables for configuring nodes would be great for more versatile configuration options

[…]
If you are looking for ways to override variables of your Python nodes via flow variables, this is possible by right-clicking the node and selecting the context menu item “Configure Flow Variables”.

That’s exactly what I was looking for. To be honest, although I have used KNIME for more than 10 years now, I never noticed, let alone used, the “Configure Flow Variables” context menu, as the FlowVariables are typically exposed in the configuration dialogue.

As I wasn’t aware of this possibility, I hard-coded the name of the variable that holds the config. With this available, it might be possible to merge three nodes into one in the future (provided a multi-line input field becomes available…).

IMHO at least the first point could/should be clarified in the documentation; about the latter I am not sure. How long has this context menu option been available?

As I said, I am looking forward to the future development of this exciting new ability. For the moment I am pretty happy with the Apprise nodes, which I had envisioned for almost two years. A library such as Apprise is quite unique as far as I know, and I am pretty sure the same holds for (many) other libraries of the Python ecosystem, which can now be made available to average (i.e. non-scripting) users.

Cheers
Lars

carstenhaubold · October 6, 2022, 6:52am

Hi Lars,

I’m glad to hear that I could answer some of your questions already.

Since you mention image output from Python nodes: does that include image input to Python nodes, or will this be a one-way road (for the moment)?

We are working on full type support for the Python nodes, which means there can be columns with images in input and output tables of Python nodes and they are readable/writable in Python. This also applies to the Python Scripting (Labs) nodes.

IMHO at least the first point could/should be clarified in the documentation; about the latter I am not sure. How long has this context menu option been available?

Agreed, we could explain a little better where the flow variables come from and how they can be passed to Python. However, this is actually nothing specific to Python nodes: the flow variable port behavior is the same for native and Python KNIME nodes.

This “Configure Flow Variables” context menu entry is rather new; it was introduced when adding the View (Labs) nodes with a modern UI. As the Python nodes have a modern UI as well, they share this context menu entry. We’ll mention this in the docs as well, thanks for bringing this to our attention.

As I said, I am looking forward to the future development of this exciting new ability. For the moment I am pretty happy with the Apprise nodes, which I had envisioned for almost two years. A library such as Apprise is quite unique as far as I know, and I am pretty sure the same holds for (many) other libraries of the Python ecosystem, which can now be made available to average (i.e. non-scripting) users.

That was exactly what we were hoping for when developing the infrastructure for Python KNIME nodes. It’s great to hear that you feel the same way, and even better that you were able to develop nodes that you had been envisioning for some time. We’re looking forward to more of the nodes you have been dreaming of.

Cheers,
Carsten

PTDataScientist · October 11, 2022, 1:35pm

Hi Carsten,

one last comment regarding the flow variables: yes, there is nothing special about how the flow variables get into the Python nodes, but the documentation at least suggests that the implicit flow variable ports are not working.

And a very minor addition to the documentation (my suggestion is to insert “or the implicit flow variable port”) should completely avoid the misunderstanding that I had:

Currently, a Python node is only able to access flow variables that have been propagated to it via a table/binary input [port] or the implicit flow variable port, as opposed to a dedicated flow variable port.

Cheers,
Lars
