KNIME "Python Script" Node slow compared to community node

I am finding that passing data from KNIME nodes into the "Python Script (1=>1)" node is very slow and scales very poorly beyond hundreds of thousands of rows.

There are at least two previous reports of the same issue, going back to older versions:

https://tech.knime.org/forum/scripting-integrations/is-there-a-way-to-speed-up-passing-data-tofrom-python-nodes

https://tech.knime.org/forum/knime-users/knime-data-transfer-to-python-seems-slow

Configuration:

KNIME 3.4.0

Anaconda Python 2.7

Concrete example:

Create data with a "Python Source" node and this code:

   from pandas import DataFrame
   import numpy as np

   output_table = DataFrame(np.ones((100000,3)))

Pass into a default "Python Script (1=>1)" node - run time = 20s

Pass into a default community "python snippet" node - run time = 2s

The KNIME Python nodes scale very poorly with data size. I suspect a data frame is being grown inefficiently via a series of concatenations.
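To illustrate the kind of pattern I suspect, here is a minimal sketch in plain pandas. It is not KNIME's actual transfer code (I have not looked at that); the function names and the chunk size are made up for the example. It compares growing a frame one chunk at a time with collecting the chunks and concatenating once.

```python
import time

import numpy as np
import pandas as pd


def build_by_concat(data, chunk_rows):
    """Grow a DataFrame chunk by chunk (hypothetical slow pattern).

    Every pd.concat call copies all rows accumulated so far, so the
    total work grows roughly quadratically with the number of chunks.
    """
    frame = None
    for start in range(0, len(data), chunk_rows):
        chunk = pd.DataFrame(data[start:start + chunk_rows])
        if frame is None:
            frame = chunk
        else:
            frame = pd.concat([frame, chunk], ignore_index=True)
    return frame


def build_at_once(data, chunk_rows):
    """Collect all chunks in a list, then concatenate a single time."""
    chunks = [pd.DataFrame(data[start:start + chunk_rows])
              for start in range(0, len(data), chunk_rows)]
    return pd.concat(chunks, ignore_index=True)


if __name__ == "__main__":
    # Same shape as the table in the example above; chunk size is arbitrary.
    data = np.ones((100000, 3))
    for build in (build_by_concat, build_at_once):
        started = time.time()
        build(data, 100)
        print("%s: %.2f s" % (build.__name__, time.time() - started))
```

Both functions return the same table; only the way it is assembled differs. If the node's run time grows faster than linearly with row count, in the way the first function does, that would be consistent with this suspicion, but it would not prove it.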

Any ideas?  Really want this resolved.

Thanks

 


Hi there. We’re working on streamlining that process. You might be interested to try the new Python Labs extension which has made some initial speed improvements (with more being worked on currently).