Python Streaming

Hello!

So I’ve made a chain with multiple components that execute a python script node. What I’m doing is passing some numpy arrays and performing some operations on them from script to script.

For performance purposes I save these arrays in binary format because it is way faster than passing them as KNIME tables. I don’t mind the benefits of consistency checks on the data, I am just trying to increase the speed.

Are python nodes streamable or maybe is there a way to disable these consistency data checks or do something else to increase performance?
Loading and saving the numpy arrays seems like a bad practise to implement a whole chain, when for example you make care only for the resulting numpy array.

Thanks!

Hi vasichar11,

I am glad that you are working with these nodes! Unfortunately, Python nodes are not streamable. Is there a way to have the scripts only within one Python Script node?

Best regards
Steffen

No unfortunatelly my chain is made of many python sripts. But what you are saying is that grouping code into less python script nodes will have a noticeable performance improvement?

@vasichar11 you could try and use the new Columnar backend to faster transfer data between knime and python.

2 Likes

As you already pointed out, the data transfer between the nodes takes time. So yes, if there is less conversion and read/write between nodes, there should be some performance gain. Maybe you could test some truncated Python scripts in one node against their original part in the workfklow with the Timer Info to see performance changes.
Additionally, mlauber71 has a very valid point with the Columnar backend.

Yes I am already using the columnar back end and the performance gains are substantial, thanks.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.