Performance / speed issue with pandas 1.5.0

Hi,

I have a Python Script node with a fairly large input table.
When I ran this node, the run time was around 20 seconds, and most of that was the initialization (transferring the inputs from KNIME to the Python script), not the script itself.
Previously I have used pandas version 1.4.4.
But now I have updated pandas to the new 1.5.0 version, and the run time of this script has changed dramatically: it now takes more than 10 minutes!
(And again, from the logs it looks like most of the time is spent on the initialization; the script itself runs in around one second…)
Has anyone experienced a similar issue? Is there something I can do to fix this?
(Downgrading back to pandas 1.4.4 solves the problem, of course…)
I am using KNIME Analytics Platform 4.6.2 with all the latest updates (I have tested version 4.6.1 as well, and the issue is the same). The OS is Ubuntu 22.04.

Best regards,
Csaba

@kormoczi you might want to try the Columnar Storage to communicate between KNIME and Python.

Another option would be to use an external file (database) to bring data in and out (maybe not that elegant but still possible).
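
For what it's worth, here is a minimal sketch of the external-file route, assuming a SQLite file as the exchange format. The table name `input_table`, the columns `id` and `value`, and the helper names are made up for illustration; in a real workflow an upstream node would write the file and only `read_table` would run inside the Python script. It uses only the standard library, so it says nothing about how pandas 1.5.0 itself behaves.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def write_table(db_path, rows):
    """Stand-in for the upstream step that exports the table to a SQLite file.

    The table and column names are hypothetical.
    """
    with closing(sqlite3.connect(db_path)) as con:
        with con:  # commits on success, rolls back on error
            con.execute("DROP TABLE IF EXISTS input_table")
            con.execute("CREATE TABLE input_table (id INTEGER, value REAL)")
            con.executemany("INSERT INTO input_table VALUES (?, ?)", rows)


def read_table(db_path):
    """Load the rows inside the Python script, bypassing the node's input port."""
    with closing(sqlite3.connect(db_path)) as con:
        cur = con.execute("SELECT id, value FROM input_table ORDER BY id")
        return cur.fetchall()


if __name__ == "__main__":
    # Hypothetical location; any path both sides can reach would do.
    db_path = os.path.join(tempfile.mkdtemp(), "exchange.sqlite")
    write_table(db_path, [(1, 0.5), (2, 1.5), (3, 2.5)])
    print(read_table(db_path))
```

If you need a DataFrame, the same connection can be handed to `pandas.read_sql_query` instead of calling `fetchall()`.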
